Thanks for your attention. I think the batch size (BS) is worth paying attention to. Here is my result: {"Acc_SA": 66.897, "F1_SA": 65.847, "mode": "test"}

              precision     recall     f1-score    support

     neutral   0.7567185  0.8519108  0.8014981       1256
    surprise   0.5673759  0.5693950  0.5683837        281
        fear   0.3194444  0.4600000  0.3770492         50
         sad   0.5784314  0.2836538  0.3806452        208
      joyful   0.6695906  0.5696517  0.6155914        402
     disgust   0.5263158  0.2941176  0.3773585         68
       angry   0.5138889  0.5362319  0.5248227        345

    accuracy                         0.6689655       2610
   macro avg   0.5616808  0.5092801  0.5207641       2610
weighted avg   0.6622274  0.6689655  0.6584736       2610
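For anyone comparing their own run against this table, a minimal sketch of how such a report is produced with scikit-learn; `y_true` / `y_pred` are placeholder example arrays, and I'm assuming "Acc_SA" / "F1_SA" correspond to plain accuracy and weighted F1 in percent (the key names come from the JSON above, not from any scikit-learn API):

```python
# Minimal sketch: computing the metrics above with scikit-learn.
# y_true / y_pred are placeholders for the gold and predicted emotion labels.
from sklearn.metrics import accuracy_score, classification_report, f1_score

labels = ["neutral", "surprise", "fear", "sad", "joyful", "disgust", "angry"]
y_true = ["neutral", "angry", "joyful"]   # gold labels (example values)
y_pred = ["neutral", "sad", "joyful"]     # model predictions (example values)

# Assumed mapping: "Acc_SA" = accuracy, "F1_SA" = weighted F1, both in percent.
result = {
    "Acc_SA": round(accuracy_score(y_true, y_pred) * 100, 3),
    "F1_SA": round(f1_score(y_true, y_pred, average="weighted") * 100, 3),
    "mode": "test",
}
print(result)

# Per-class precision/recall/F1 table in the same format as above.
print(classification_report(y_true, y_pred, labels=labels, digits=7))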
I found my configs on my server; details as follows:

--------------config.json---------------------

{
  "architectures": ["LlamaForCausalLM"],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "rms_norm_eps": 1e-05,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.30.2",
  "use_cache": true,
  "vocab_size": 32000
}

------------------deepspeed_config.json----------------------------

{
  "train_batch_size": 16,
  "gradient_accumulation_steps": 8,
  "wall_clock_breakdown": false,
  "gradient_clipping": 1.0,
  "steps_per_print": 100,
  "fp16": {
    "enabled": true,
    "auto_cast": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "amp": {
    "enabled": false,
    "opt_level": "O2"
  },
  "bfloat16": {
    "enabled": false
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 100000000.0,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 100000000.0,
    "contiguous_gradients": true,
    "sub_group_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_max_live_parameters": 1000000000.0,
    "stage3_max_reuse_distance": 1000000000.0,
    "stage3_gather_fp16_weights_on_model_save": true,
    "mics_shard_size": 8,
    "mics_hierarchical_params_gather": false,
    "offload_optimizer": {
      "device": "none"
    },
    "offload_param": {
      "device": "none"
    }
  },
  "zero_allow_untested_optimizer": true,
  "data_efficiency": {
    "enabled": true,
    "seed": 42,
    "data_sampling": {
      "curriculum_learning": {}
    },
    "data_routing": {}
  },
  "data_sampling": {
    "enabled": true,
    "num_workers": 8
  }
}
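As a sanity check on the configs, here is a minimal sketch (not the repo's actual launcher) of how JSON configs like these are typically wired together; the model path is a placeholder assumption:

```python
# Minimal sketch, assuming a standard transformers + DeepSpeed setup;
# "path/to/llama2-7b" is a placeholder, not the repo's actual checkpoint path.
import deepspeed
from transformers import AutoConfig, AutoModelForCausalLM

# config.json above is a standard LLaMA config consumable by transformers.
model_config = AutoConfig.from_pretrained("path/to/llama2-7b")
model = AutoModelForCausalLM.from_pretrained("path/to/llama2-7b", config=model_config)

# deepspeed_config.json above goes to deepspeed.initialize. Note that with
# "train_batch_size": 16 and "gradient_accumulation_steps": 8, the per-GPU
# micro-batch size works out to 16 / (8 * world_size).
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="deepspeed_config.json",
)
```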
Hi, I've tried the script train_and_inference_Plain.sh to reproduce the MELD result using LLaMA2. Weighted F1 reported in the paper vs. weighted F1 I reproduced: 65.84 vs. 64.7.
Are the default hyperparameters in that script (and in all the other scripts) the ones you used for your reported results, or do I still need to tune them?
For your reference, I just followed what's in the script (see the sketch after this list):
- num_train_epochs=6
- BS=16
- LR=2e-4
- the LoRA settings are kept as they are: lora_dim=16, lora_alpha=16, lora_dropout=0.05, lora_module_name='q_proj,k_proj,v_proj,query_key_value'
- the default linear LR scheduler and warmup ratio are used
- the random seed is the default, seed=42
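For concreteness, here are those LoRA hyperparameters expressed with Hugging Face peft. This is only a sketch under the assumption that the script's lora_dim maps to peft's r; the repo may well use its own LoRA implementation rather than peft:

```python
# Minimal sketch, assuming the script's LoRA settings map onto Hugging Face
# peft as follows (lora_dim -> r); the repo may implement LoRA differently.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

lora_config = LoraConfig(
    r=16,            # lora_dim=16
    lora_alpha=16,   # lora_alpha=16
    lora_dropout=0.05,
    # lora_module_name='q_proj,k_proj,v_proj,query_key_value'; query_key_value
    # only exists in GPT-NeoX/BLOOM-style models, so for LLaMA only
    # q_proj / k_proj / v_proj actually match.
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = AutoModelForCausalLM.from_pretrained("path/to/llama2-7b")  # placeholder path
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```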