Set `--dim = 0`:
```
----- Configure -----
cfg_ds: cola
stop_words: False
Vocab GCN_hidden_dim: vocab_size -> 128 -> 0
Learning_rate0: 8e-06
weight_decay: 0.01
Loss_criterion cle softmax_before_mse True
Dropout: 0.2
Run_adj: pmi
gcn_act_func: Relu
MAX_SEQ_LENGTH: 200
perform_metrics_str: ['weighted avg', 'f1-score']
model_file_save: VGCN_BERT0_model_cola_cle_sw0.pt
----- Prepare data set -----
Load/shuffle/seperate cola dataset, and vocabulary graph adjacent matrix
Zero ratio(?>66%) for vocab adj 0th: 99.60957849
Train_classes count: [2400, 5724]
Num examples for train = 8124 , after weight sample: 8128
Num examples for validate = 427
Batch size = 16
Num steps = 4572
--------------------------------------------------------------
Epoch:8 completed, Total Train Loss:23.38433140423149, Valid Loss:23.596614667214453, Spend 21.966997488339743m
**Optimization Finished!,Total spend: 21.966997877756754
**Valid weighted F1: 82.322 at 0 epoch.
**Test weighted F1 when valid best: 81.805
```
Set `--dim = 16`:
```
----- Configure -----
cfg_ds: cola
stop_words: False
Vocab GCN_hidden_dim: vocab_size -> 128 -> 16
Learning_rate0: 8e-06
weight_decay: 0.01
Loss_criterion cle softmax_before_mse True
Dropout: 0.2
Run_adj: pmi
gcn_act_func: Relu
MAX_SEQ_LENGTH: 216
perform_metrics_str: ['weighted avg', 'f1-score']
model_file_save: VGCN_BERT16_model_cola_cle_sw0.pt
----- Prepare data set -----
Load/shuffle/seperate cola dataset, and vocabulary graph adjacent matrix
Zero ratio(?>66%) for vocab adj 0th: 99.60957849
Train_classes count: [2400, 5724]
Num examples for train = 8124 , after weight sample: 8128
Num examples for validate = 427
Batch size = 16
Num steps = 4572
--------------------------------------------------------------
Epoch:8 completed, Total Train Loss:24.964138203300536, Valid Loss:23.089254705701023, Spend 35.125716678301494m
**Optimization Finished!,Total spend: 35.12571721076965
**Valid weighted F1: 82.051 at 1 epoch.
**Test weighted F1 when valid best: 81.760
```
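For context, the config line `Vocab GCN_hidden_dim: vocab_size -> 128 -> 16` together with `Run_adj: pmi` and `gcn_act_func: Relu` describes a two-layer graph convolution over the vocabulary whose output width is the `--dim` value; with `--dim = 0` that branch contributes nothing and the model apparently falls back to plain BERT (note also that `MAX_SEQ_LENGTH` grows from 200 to 216 in the second run, consistent with 16 extra graph-embedding positions). The sketch below is only a minimal illustration of that shape path under these assumptions, not the repository's actual implementation, and it omits how the resulting graph embeddings are merged with the BERT token embeddings.

```python
# Minimal sketch of a two-layer vocabulary GCN: vocab_size -> 128 -> gcn_dim.
# Names, shapes, and the dummy adjacency are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VocabGCN(nn.Module):
    def __init__(self, vocab_size: int, hidden_dim: int = 128, gcn_dim: int = 16):
        super().__init__()
        self.W1 = nn.Linear(vocab_size, hidden_dim, bias=False)
        self.W2 = nn.Linear(hidden_dim, gcn_dim, bias=False)

    def forward(self, bow: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # bow: (batch, vocab_size) bag-of-words counts for each example
        # adj: (vocab_size, vocab_size) PMI-based vocabulary adjacency ("Run_adj: pmi")
        h = F.relu(self.W1(bow @ adj))  # step 1: vocab_size -> 128, ReLU activation
        return self.W2(h)               # step 2: 128 -> gcn_dim graph embeddings

vocab_size = 1000
model = VocabGCN(vocab_size, hidden_dim=128, gcn_dim=16)
bow = torch.rand(4, vocab_size)
adj = torch.eye(vocab_size)          # placeholder adjacency for illustration
graph_emb = model(bow, adj)          # shape: (4, 16); with --dim = 0 this branch is unused
```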
@Louis-udm @Xiang-Pan Thank you very much.
You can try changing some parameters. I am not sure our parameters are the best, and BERT's performance can vary greatly across environments even when using the same hyperparameters.
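As a starting point for that kind of tuning, one might sweep a small grid around the values printed in the configs above (learning rate 8e-06, dropout 0.2, weight decay 0.01). The ranges and the `train_and_evaluate` helper below are illustrative assumptions, not settings or code validated by the authors.

```python
# Illustrative hyperparameter grid around the logged configuration.
# Ranges are assumptions for demonstration, not recommended values.
from itertools import product

learning_rates = [5e-6, 8e-6, 1e-5, 2e-5]
dropouts = [0.1, 0.2, 0.3]
weight_decays = [0.0, 0.01]

for lr, dropout, wd in product(learning_rates, dropouts, weight_decays):
    config = {"learning_rate0": lr, "dropout": dropout, "weight_decay": wd}
    # A hypothetical train_and_evaluate(config) would run VGCN-BERT with this
    # config and record the validation weighted F1 for comparison.
    print(config)
```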