LIN-SHANG / InstructERC

The official implementation of InstructERC

Hyperparameters for reproducing the results #8

Closed: La-SilverLand closed this issue 9 months ago

La-SilverLand commented 9 months ago

Hi, I've tried the script train_and_inference_Plain.sh to reproduce the result on MELD with LLaMA2. Weighted F1 reported in the paper vs. the weighted F1 I reproduced: 65.84 vs. 64.7.

Are the default hyperparameters in this script (and in all the other scripts) the ones you used for your reported results, or do I need to tune them further?

For your reference, I just followed what is in the script:

- num_train_epochs=6, BS=16, LR=2e-4
- LoRA settings kept as provided: lora_dim=16, lora_alpha=16, lora_dropout=0.05, lora_module_name='q_proj,k_proj,v_proj,query_key_value'
- the default linear LR scheduler and warmup ratio
- the default random seed, seed=42
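(Editor's note for readers following along: these LoRA flags map roughly onto a PEFT `LoraConfig`. This is a minimal sketch under that assumption; the repo's own training code may wire the adapter differently, and `query_key_value` only matches fused-QKV architectures such as BLOOM, not LLaMA.)

```python
# Sketch (assumption): the script's LoRA flags expressed as a PEFT LoraConfig for LLaMA2.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                # lora_dim in the script
    lora_alpha=16,
    lora_dropout=0.05,
    # For LLaMA, only the q/k/v projection modules below exist;
    # 'query_key_value' from the script applies to other model families.
    target_modules=["q_proj", "k_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```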

LIN-SHANG commented 9 months ago

Thanks for your attention. I think you should pay attention to BS (the batch size). Here is my result:

{"Acc_SA": 66.897, "F1_SA": 65.847, "mode": "test"}

|              | precision | recall    | f1-score  | support |
|--------------|-----------|-----------|-----------|---------|
| neutral      | 0.7567185 | 0.8519108 | 0.8014981 | 1256    |
| surprise     | 0.5673759 | 0.5693950 | 0.5683837 | 281     |
| fear         | 0.3194444 | 0.4600000 | 0.3770492 | 50      |
| sad          | 0.5784314 | 0.2836538 | 0.3806452 | 208     |
| joyful       | 0.6695906 | 0.5696517 | 0.6155914 | 402     |
| disgust      | 0.5263158 | 0.2941176 | 0.3773585 | 68      |
| angry        | 0.5138889 | 0.5362319 | 0.5248227 | 345     |
| accuracy     |           |           | 0.6689655 | 2610    |
| macro avg    | 0.5616808 | 0.5092801 | 0.5207641 | 2610    |
| weighted avg | 0.6622274 | 0.6689655 | 0.6584736 | 2610    |
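(Editor's note: Acc_SA and F1_SA above correspond to the accuracy and weighted-average F1 rows of the report. A minimal sketch of how such a report is produced, assuming scikit-learn; the repo's actual evaluation code may differ. `y_true`/`y_pred` are hypothetical placeholders for the gold and predicted labels of the 2610 MELD test utterances.)

```python
# Sketch (assumption): accuracy, weighted F1 and the per-class report via scikit-learn.
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["neutral", "surprise", "fear", "sad", "joyful", "disgust", "angry"]

y_true = ["neutral", "angry", "joyful"]   # placeholder gold labels
y_pred = ["neutral", "sad", "joyful"]     # placeholder predictions

acc = accuracy_score(y_true, y_pred)                        # -> Acc_SA
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # -> F1_SA
print(classification_report(y_true, y_pred, labels=labels, digits=7))
```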

I found my results on my server; details as follows:

--------------config.json---------------------
{
  "architectures": ["LlamaForCausalLM"],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 32000
}

------------------deepspeed_config.json----------------------------
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 8,
  "wall_clock_breakdown": false,
  "gradient_clipping": 1.0,
  "steps_per_print": 100,
  "fp16": {
    "enabled": true,
    "auto_cast": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "amp": { "enabled": false, "opt_level": "O2" },
  "bfloat16": { "enabled": false },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 100000000.0,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 100000000.0,
    "contiguous_gradients": true,
    "sub_group_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1000000000.0,
    "stage3_max_reuse_distance": 1000000000.0,
    "stage3_gather_fp16_weights_on_model_save": true,
    "mics_shard_size": 8,
    "mics_hierarchical_params_gather": false,
    "offload_optimizer": { "device": "none" },
    "offload_param": { "device": "none" }
  },
  "zero_allow_untested_optimizer": true,
  "data_efficiency": {
    "enabled": true,
    "seed": 42,
    "data_sampling": { "curriculum_learning": {} },
    "data_routing": {}
  },
  "data_sampling": { "enabled": true, "num_workers": 8 }
}
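(Editor's note on the BS hint, my reading of the config above rather than anything confirmed by the maintainer: DeepSpeed ties the global batch size to the per-GPU micro batch, the gradient accumulation steps, and the number of GPUs, so the effective batch size of a reproduction run depends on the hardware. A minimal sanity-check sketch:)

```python
# Sketch (assumption): DeepSpeed's batch-size identity,
#   train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * world_size
# With train_batch_size=16 and gradient_accumulation_steps=8 as in the config above,
# a single-GPU run implies a micro batch of 2; a different GPU count changes the
# effective setup unless the other values are adjusted to keep the identity.
def micro_batch_per_gpu(train_batch_size: int, grad_accum_steps: int, world_size: int) -> int:
    assert train_batch_size % (grad_accum_steps * world_size) == 0, \
        "DeepSpeed requires these three values to be consistent"
    return train_batch_size // (grad_accum_steps * world_size)

print(micro_batch_per_gpu(16, 8, 1))  # -> 2 on one GPU
```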