Set `--dim = 0`:
```
----- Configure -----
cfg_ds: cola
stop_words: False
Vocab GCN_hidden_dim: vocab_size -> 128 -> 0
Learning_rate0: 8e-06
weight_decay: 0.01
Loss_criterion cle softmax_before_mse True
Dropout: 0.2
Run_adj: pmi
gcn_act_func: Relu
MAX_SEQ_LENGTH: 200
perform_metrics_str: ['weighted avg', 'f1-score']
model_file_save: VGCN_BERT0_model_cola_cle_sw0.pt
----- Prepare data set -----
Load/shuffle/seperate cola dataset, and vocabulary graph adjacent matrix
Zero ratio(?>66%) for vocab adj 0th: 99.60957849
Train_classes count: [2400, 5724]
Num examples for train = 8124 , after weight sample: 8128
Num examples for validate = 427
Batch size = 16
Num steps = 4572
--------------------------------------------------------------
Epoch:8 completed, Total Train Loss:23.38433140423149, Valid Loss:23.596614667214453, Spend 21.966997488339743m
**Optimization Finished!,Total spend: 21.966997877756754
**Valid weighted F1: 82.322 at 0 epoch.
**Test weighted F1 when valid best: 81.805
```
Set `--dim = 16`:
```
----- Configure -----
cfg_ds: cola
stop_words: False
Vocab GCN_hidden_dim: vocab_size -> 128 -> 16
Learning_rate0: 8e-06
weight_decay: 0.01
Loss_criterion cle softmax_before_mse True
Dropout: 0.2
Run_adj: pmi
gcn_act_func: Relu
MAX_SEQ_LENGTH: 216
perform_metrics_str: ['weighted avg', 'f1-score']
model_file_save: VGCN_BERT16_model_cola_cle_sw0.pt
----- Prepare data set -----
Load/shuffle/seperate cola dataset, and vocabulary graph adjacent matrix
Zero ratio(?>66%) for vocab adj 0th: 99.60957849
Train_classes count: [2400, 5724]
Num examples for train = 8124 , after weight sample: 8128
Num examples for validate = 427
Batch size = 16
Num steps = 4572
--------------------------------------------------------------
Epoch:8 completed, Total Train Loss:24.964138203300536, Valid Loss:23.089254705701023, Spend 35.125716678301494m
**Optimization Finished!,Total spend: 35.12571721076965
**Valid weighted F1: 82.051 at 1 epoch.
**Test weighted F1 when valid best: 81.760
```
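For context, the config line `Vocab GCN_hidden_dim: vocab_size -> 128 -> 16` together with `Run_adj: pmi` and `gcn_act_func: Relu` describes a two-layer graph convolution over the vocabulary whose output width is the `--dim` value; with `--dim = 0` that branch contributes nothing and the model apparently falls back to plain BERT (note also that `MAX_SEQ_LENGTH` grows from 200 to 216 in the second run, consistent with 16 extra graph-embedding positions). The sketch below is only a minimal illustration of that shape path under these assumptions, not the repository's actual implementation, and it omits how the resulting graph embeddings are merged with the BERT token embeddings.

```python
# Minimal sketch of a two-layer vocabulary GCN: vocab_size -> 128 -> gcn_dim.
# Names, shapes, and the dummy adjacency are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabGCN(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int = 128, gcn_dim: int = 16):
        super().__init__()
        self.W1 = nn.Linear(vocab_size, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, gcn_dim, bias=False)

    def forward(self, bow: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # bow: (batch, vocab_size) bag-of-words counts for each example
        # adj: (vocab_size, vocab_size) PMI-based vocabulary adjacency ("Run_adj: pmi")
        h = F.relu(self.W1(bow @ adj))  # step 1: vocab_size -> 128, ReLU activation
        return self.W2(h)               # step 2: 128 -> gcn_dim graph embeddings

vocab_size = 1000
model = VocabGCN(vocab_size, hidden_dim=128, gcn_dim=16)
bow = torch.rand(4, vocab_size)
adj = torch.eye(vocab_size)          # placeholder adjacency for illustration
graph_emb = model(bow, adj)          # shape: (4, 16); with --dim = 0 this branch is unused
```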
@Louis-udm @Xiang-Pan Thank you very much.
You can try changing some parameters. I am not sure our parameters are the best, and BERT's performance can vary greatly across environments even when using the same hyperparameters.
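As a starting point for that kind of tuning, one might sweep a small grid around the values printed in the configs above (learning rate 8e-06, dropout 0.2, weight decay 0.01). The ranges and the `train_and_evaluate` helper below are illustrative assumptions, not settings or code validated by the authors.

```python
# Illustrative hyperparameter grid around the logged configuration.
# Ranges are assumptions for demonstration, not recommended values.
from itertools import product

learning_rates = [5e-6, 8e-6, 1e-5, 2e-5]
dropouts = [0.1, 0.2, 0.3]
weight_decays = [0.0, 0.01]

for lr, dropout, wd in product(learning_rates, dropouts, weight_decays):
    config = {"learning_rate0": lr, "dropout": dropout, "weight_decay": wd}
    # A hypothetical train_and_evaluate(config) would run VGCN-BERT with this
    # config and record the validation weighted F1 for comparison.
    print(config)
```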