djz233 / D-DGCN

Source code of "Orders Are Unwanted: Dynamic Deep Graph Convolutional Network for Personality Detection" (AAAI 2023)

Can you specify how to use your checkpoint when testing? #4

Closed: qiweijian closed this issue 1 year ago

qiweijian commented 1 year ago

When I use your checkpoint best_f1_dggcn_kaggle_321.pth, there is a missing key 'pretrain_models.embeddings.position_ids', and there seems to be no code that computes F1.

So I changed Line #200 in scr/main.py to model.load_state_dict(new_state_dict, strict=False)
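(As a side note, with strict=False, load_state_dict returns the incompatible keys, so the mismatch can be inspected directly. A minimal sketch, assuming model and new_state_dict are built as in scr/main.py:)

    # Non-strict load: mismatched keys are reported instead of raising an error.
    load_result = model.load_state_dict(new_state_dict, strict=False)
    print('missing keys:', load_result.missing_keys)      # e.g. 'pretrain_models.embeddings.position_ids'
    print('unexpected keys:', load_result.unexpected_keys)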

And I added the following F1 computation at the end of the test loop:

    import numpy as np
    from sklearn.metrics import f1_score

    f1s = []
    labels_collection = [(out_label_ids1, preds1, 'I/E'),
                         (out_label_ids2, preds2, 'S/N'),
                         (out_label_ids3, preds3, 'T/F'),
                         (out_label_ids4, preds4, 'P/J')]

    # Compute F1 for each of the four MBTI dimensions, then average.
    for out_label_ids, preds, label_name in labels_collection:
        current_f1 = f1_score(out_label_ids, preds)
        print(f'{label_name} F1: {current_f1}')
        f1s.append(current_f1)

    print(f'Average F1: {np.mean(f1s)}')

The output I get is

I/E F1: 0.46200607902735563
S/N F1: 0.34715025906735747
T/F F1: 0.7782747603833866
P/J F1: 0.5531574740207833
Average F1: 0.5351471431247208
qiweijian commented 1 year ago

My bad, I forgot to set f1_score to macro F1:

f1_score(out_label_ids, preds, average='macro')

and the results are:

I/E F1: 0.6680585160428386
S/N F1: 0.6327190984052741
T/F F1: 0.7980612647061289
P/J F1: 0.6506210984344566
Average F1: 0.6873649943971746
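For context on the jump: scikit-learn's f1_score defaults to average='binary', which scores only the positive class, whereas average='macro' takes the unweighted mean of the per-class F1 scores. A minimal, self-contained illustration (toy labels, not the paper's data):

    from sklearn.metrics import f1_score

    y_true = [0, 0, 1, 1, 1]
    y_pred = [0, 1, 1, 1, 0]

    # Default 'binary' averaging scores only the positive class (label 1).
    print(f1_score(y_true, y_pred))                    # 0.667
    # 'macro' averages the F1 of class 0 (0.5) and class 1 (0.667).
    print(f1_score(y_true, y_pred, average='macro'))   # 0.583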
TaoYang225 commented 1 year ago

For the test step, you can modify main.py:

from trainer import train, get_labels, test

and add the following after line 203:

    new_state_dict = OrderedDict()
    for k, v in model_state_dict.items():
        name = k[7:]  # remove the 'module.' prefix
        new_state_dict[name] = v
    model.load_state_dict(new_state_dict)

    _, _, ave_f1, _ = test(args, test_dataset, model)
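(The k[7:] slice drops the leading 'module.' prefix, which nn.DataParallel prepends to every parameter name when a model is saved while wrapped; stripping it lets the unwrapped model load the checkpoint.)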

The test results are recorded in 'output1/test_result.txt'.

We get the results:

1:  acc = 0.8005763688760807
1:  f1 = 0.7022590833521805
2:  acc = 0.8564841498559078
2:  f1 = 0.6688497296849814
3:  acc = 0.813256484149856
3:  f1 = 0.8112473910275524
4:  acc = 0.7095100864553314
4:  f1 = 0.6812029154161818
test_ave_f1 = 0.715889779870224

Here are the hyper-parameter settings:

07/07/2023 07:32:57 - INFO - __main__ - hyperparameter = Namespace(adam_epsilon=1e-06, all_gpu_eval_batch_size=32, all_gpu_train_batch_size=8, alpha_learning_rate=0.01, d_model=768, device=device(type='cuda'), dropout=0.2, final_hidden_size=128, gcn_dropout=0.2, gcn_hidden_size=768, gcn_mem_dim=64, gcn_num_layers=2, gm_learning_rate=1e-05, gradient_accumulation_steps=1, l0=False, learning_rate=1e-05, logging_steps=25, max_alpha=100, max_grad_norm=1.0, max_len=70, max_post=50, max_steps=-1, model_dir='../scr4/bert-base-cased', no_dart=False, no_special_node=False, num_classes=2, num_mlps=2, num_train_epochs=30.0, option='test', other_learning_rate=0.001, output_dir='output1', pretrain_type='bert', seed=321, single_hop=False, task='kaggle')
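Note that option='test', task='kaggle', and seed=321 in this Namespace line up with the name of the released checkpoint, best_f1_dggcn_kaggle_321.pth.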