LIN-SHANG / InstructERC

The official implementation of InstructERC

Hyperparameters for reproducing the results #8

Closed: La-SilverLand closed this issue 9 months ago

La-SilverLand commented 9 months ago

Hi, I've tried the script train_and_inference_Plain.sh to reproduce the result on MELD with LLaMA2. Weighted F1 reported in the paper vs. the weighted F1 I reproduced: 65.84 vs. 64.7.

Are the default hyperparameters in this script (and in all the other scripts) the ones you used for your reported results, or do I need to tune them further?

For your reference, I just followed what is in the script:

- num_train_epochs=6, BS=16, LR=2e-4
- LoRA settings kept as provided: lora_dim=16, lora_alpha=16, lora_dropout=0.05, lora_module_name='q_proj,k_proj,v_proj,query_key_value'
- the default linear LR scheduler and warmup ratio
- the default random seed, seed=42
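(Editor's note for readers following along: these LoRA flags map roughly onto a PEFT `LoraConfig`. This is a minimal sketch under that assumption; the repo's own training code may wire the adapter differently, and `query_key_value` only matches fused-QKV architectures such as BLOOM, not LLaMA.)

```python
# Sketch (assumption): the script's LoRA flags expressed as a PEFT LoraConfig for LLaMA2.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,                # lora_dim in the script
    lora_alpha=16,
    lora_dropout=0.05,
    # For LLaMA, only the q/k/v projection modules below exist;
    # 'query_key_value' from the script applies to other model families.
    target_modules=["q_proj", "k_proj", "v_proj"],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```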

LIN-SHANG commented 9 months ago

Thanks for your attention. I think you should pay attention to BS (the batch size). Here is my result:

{"Acc_SA": 66.897, "F1_SA": 65.847, "mode": "test"}

|              | precision | recall    | f1-score  | support |
|--------------|-----------|-----------|-----------|---------|
| neutral      | 0.7567185 | 0.8519108 | 0.8014981 | 1256    |
| surprise     | 0.5673759 | 0.5693950 | 0.5683837 | 281     |
| fear         | 0.3194444 | 0.4600000 | 0.3770492 | 50      |
| sad          | 0.5784314 | 0.2836538 | 0.3806452 | 208     |
| joyful       | 0.6695906 | 0.5696517 | 0.6155914 | 402     |
| disgust      | 0.5263158 | 0.2941176 | 0.3773585 | 68      |
| angry        | 0.5138889 | 0.5362319 | 0.5248227 | 345     |
| accuracy     |           |           | 0.6689655 | 2610    |
| macro avg    | 0.5616808 | 0.5092801 | 0.5207641 | 2610    |
| weighted avg | 0.6622274 | 0.6689655 | 0.6584736 | 2610    |
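(Editor's note: Acc_SA and F1_SA above correspond to the accuracy and weighted-average F1 rows of the report. A minimal sketch of how such a report is produced, assuming scikit-learn; the repo's actual evaluation code may differ. `y_true`/`y_pred` are hypothetical placeholders for the gold and predicted labels of the 2610 MELD test utterances.)

```python
# Sketch (assumption): accuracy, weighted F1 and the per-class report via scikit-learn.
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["neutral", "surprise", "fear", "sad", "joyful", "disgust", "angry"]

y_true = ["neutral", "angry", "joyful"]   # placeholder gold labels
y_pred = ["neutral", "sad", "joyful"]     # placeholder predictions

acc = accuracy_score(y_true, y_pred)                        # -> Acc_SA
weighted_f1 = f1_score(y_true, y_pred, average="weighted")  # -> F1_SA
print(classification_report(y_true, y_pred, labels=labels, digits=7))
```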

I found my results on my server; details as follows:

--------------config.json---------------------
{
  "architectures": ["LlamaForCausalLM"],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 32000
}

------------------deepspeed_config.json----------------------------
{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 8,
  "wall_clock_breakdown": false,
  "gradient_clipping": 1.0,
  "steps_per_print": 100,
  "fp16": {
    "enabled": true,
    "auto_cast": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "amp": { "enabled": false, "opt_level": "O2" },
  "bfloat16": { "enabled": false },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 100000000.0,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 100000000.0,
    "contiguous_gradients": true,
    "sub_group_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1000000000.0,
    "stage3_max_reuse_distance": 1000000000.0,
    "stage3_gather_fp16_weights_on_model_save": true,
    "mics_shard_size": 8,
    "mics_hierarchical_params_gather": false,
    "offload_optimizer": { "device": "none" },
    "offload_param": { "device": "none" }
  },
  "zero_allow_untested_optimizer": true,
  "data_efficiency": {
    "enabled": true,
    "seed": 42,
    "data_sampling": { "curriculum_learning": {} },
    "data_routing": {}
  },
  "data_sampling": { "enabled": true, "num_workers": 8 }
}
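(Editor's note on the BS hint, my reading of the config above rather than anything confirmed by the maintainer: DeepSpeed ties the global batch size to the per-GPU micro batch, the gradient accumulation steps, and the number of GPUs, so the effective batch size of a reproduction run depends on the hardware. A minimal sanity-check sketch:)

```python
# Sketch (assumption): DeepSpeed's batch-size identity,
#   train_batch_size = micro_batch_per_gpu * gradient_accumulation_steps * world_size
# With train_batch_size=16 and gradient_accumulation_steps=8 as in the config above,
# a single-GPU run implies a micro batch of 2; a different GPU count changes the
# effective setup unless the other values are adjusted to keep the identity.
def micro_batch_per_gpu(train_batch_size: int, grad_accum_steps: int, world_size: int) -> int:
    assert train_batch_size % (grad_accum_steps * world_size) == 0, \
        "DeepSpeed requires these three values to be consistent"
    return train_batch_size // (grad_accum_steps * world_size)

print(micro_batch_per_gpu(16, 8, 1))  # -> 2 on one GPU
```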