YujiaBao / Distributional-Signatures

"Few-shot Text Classification with Distributional Signatures" ICLR 2020
https://arxiv.org/abs/1908.06039
MIT License

Bert running command #32

Closed he159ok closed 2 years ago

he159ok commented 2 years ago

Could you please give an example of a working BERT command? I get abnormal results when I run the command below:

```
python main.py --bert --pretrained_bert bert-base-uncased --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target
```

YujiaBao commented 2 years ago

What do you mean by abnormal results? I modified your command by changing main.py to src/main.py and ran it successfully from the root of the repo.

he159ok commented 2 years ago

Hi, with your fastText embedding I get results similar to your Tab. 1. However, my BERT results on the HuffPost dataset, for both 5-way 1-shot and 5-way 5-shot, are highly different from your Tab. 2. For 5-way 1-shot, I use the command

```
python main.py --bert --pretrained_bert bert-base-uncased --cuda 2 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target
```

The results are below:

```
21/12/06 11:39:32, acc mean 0.2830, std 0.0528
21/12/06 11:40:22, acc mean 0.2808, std 0.0513
```

I also tried --way 5 --shot 5 with all other parameters the same as in the above command. The results are below:

```
21/12/06 10:48:35, acc mean 0.2899, std 0.0545
21/12/06 10:49:24, acc mean 0.2814, std 0.0517
```

I am now checking my code, and it would be greatly appreciated if you could share a BERT command for HuffPost.

he159ok commented 2 years ago

Also, when I run BERT on FewRel with the command

```
python main.py --bert --pretrained_bert bert-base-uncased --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset fewrel --data_path data/fewrel_bert_uncase.json --n_train_class 65 --n_val_class 5 --n_test_class 10 --auxiliary pos --meta_iwf --meta_w_target
```

I hit the bug below:

```
  File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 141, in forward
    scale = self.compute_score(data, ebd)
  File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 209, in compute_score
    hidden = self.rnn(x, data['text_len'])
  File "/home/jfhe/anaconda3/envs/fewdoc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 83, in forward
    sort_text, sort_len, invert_order, num_zero = self._sort_tensor(input=text, lengths=text_len)
  File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 39, in _sort_tensor
    nonzero_idx = sorted_lengths.nonzero()
RuntimeError: CUDA error: device-side assert triggered
Killed
```

If you can provide any hint about mistakes in the command or anything else, it would be greatly appreciated!
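(Aside: a CUDA "device-side assert" usually masks an ordinary indexing error. Re-running on CPU, or with the environment variable CUDA_LAUNCH_BLOCKING=1 set, surfaces the underlying exception. Below is a minimal sketch of this failure mode, with made-up sizes rather than this repo's actual code.)

```python
# Sketch only: an out-of-range index into nn.Embedding. On GPU this surfaces
# as "CUDA error: device-side assert triggered"; on CPU it raises a readable
# IndexError, which makes the root cause much easier to find.
import torch
import torch.nn as nn

pos_ebd = nn.Embedding(num_embeddings=50, embedding_dim=5)  # covers positions 0..49 only
positions = torch.arange(120)  # an input longer than the embedding table

try:
    pos_ebd(positions)
except IndexError as err:
    print("CPU run reveals the real error:", err)
```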

YujiaBao commented 2 years ago

I have run your command on HuffPost and the results look normal to me (I have attached the 1-shot exp below). What are your PyTorch and transformers versions?

```
(signature) ➜  Distributional-Signatures git:(master) ✗ python src/main.py --bert --pretrained_bert bert-base-uncased --cuda 6 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target --seed=330

Parameters:
        AUXILIARY=[]
        BERT=True
        BERT_CACHE_DIR=None
        CLASSIFIER=r2d2
        CLIP_GRAD=None
        CUDA=6
        DATA_PATH=data/huffpost_bert_uncase.json
        DATASET=huffpost
        DROPOUT=0.1
        EMBEDDING=meta
        FINETUNE_EBD=False
        FINETUNE_EPISODES=10
        FINETUNE_LOSS_TYPE=softmax
        FINETUNE_MAXEPOCHS=5000
        FINETUNE_SPLIT=0.8
        INDUCT_ATT_DIM=64
        INDUCT_HIDDEN_DIM=100
        INDUCT_ITER=3
        INDUCT_RNN_DIM=128
        LR=0.001
        LRD2_NUM_ITERS=5
        MAML=False
        META_EBD=False
        META_IDF=False
        META_IWF=True
        META_TARGET_ENTROPY=False
        META_W_TARGET=True
        META_W_TARGET_LAM=1
        MODE=train
        N_TEST_CLASS=16
        N_TRAIN_CLASS=20
        N_VAL_CLASS=5
        N_WORKERS=10
        NOTQDM=False
        PATIENCE=20
        PRETRAINED_BERT=bert-base-uncased
        QUERY=25
        RESULT_PATH=
        SAVE=False
        SEED=330
        SHOT=1
        SNAPSHOT=
        TEST_EPISODES=1000
        TRAIN_EPISODES=100
        TRAIN_EPOCHS=1000
        VAL_EPISODES=100
        WAY=5
        WORD_VECTOR=wiki.en.vec
        WV_PATH=./

    (Credit: Maija Haavisto)                        /
                                 _,.------....___,.' ',.-.
                              ,-'          _,.--'        |
                            ,'         _.-'              .
                           /   ,     ,'                   `
                          .   /     /                     ``.
                          |  |     .                       \.\
                ____      |___._.  |       __               \ `.
              .'    `---''       ``'-.--''`  \               .  \
             .  ,            __               `              |   .
             `,'         ,-''  .               \             |    L
            ,'          '    _.'                -._          /    |
           ,`-.    ,'.   `--'                      >.      ,'     |
          . .'\'   `-'       __    ,  ,-.         /  `.__.-      ,'
          ||:, .           ,'  ;  /  / \ `        `.    .      .'/
          j|:D  \          `--'  ' ,'_  . .         `.__, \   , /
         / L:_  |                 .  '' :_;                `.'.'
         .    '''                  ''''''                    V
          `.                                 .    `.   _,..  `
            `,_   .    .                _,-'/    .. `,'   __  `
             ) \`._        ___....----''  ,'   .'  \ |   '  \  .
            /   `. '`-.--''         _,' ,'     `---' |    `./  |
           .   _  `'''--.._____..--'   ,             '         |
           | .' `. `-.                /-.           /          ,
           | `._.'    `,_            ;  /         ,'          .
          .'          /| `-.        . ,'         ,           ,
          '-.__ __ _,','    '`-..___;-...__   ,.'\ ____.___.'
          `'^--'..'   '-`-^-''--    `-^-'`.'''''''`.,^.`.--' mh

21/12/06 14:54:59: Loading data
21/12/06 14:54:59: Class balance:
{19: 900, 4: 900, 5: 900, 8: 900, 1: 900, 13: 900, 31: 900, 16: 900, 36: 900, 39: 900, 14: 900, 11: 900, 23: 900, 17: 900, 7: 900, 21: 900, 26: 900, 12: 900, 18: 900, 37: 900, 6: 900, 22: 900, 40: 900, 15: 900, 29: 900, 10: 900, 35: 900, 38: 900, 9: 900, 25: 900, 30: 900, 20: 900, 3: 900, 27: 900, 24: 900, 34: 900, 33: 900, 32: 900, 0: 900, 2: 900, 28: 900}
21/12/06 14:54:59: Avg len: 13.077235772357724
21/12/06 14:54:59: Loading word vectors
21/12/06 14:55:00: Total num. of words: 9376, word vector dimension: 300
21/12/06 14:55:00: Num. of out-of-vocabulary words(they are initialized to zeros): 2496
21/12/06 14:55:00: #train 18000, #val 4500, #test 14400
load complete
pre compute complete
21/12/06 14:55:18, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
wordebd complete
avg ebd complete
cuda complete
meta w target complete
21/12/06 14:55:37, Building embedding
21/12/06 14:55:37, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
21/12/06 14:55:39, Building augmented embedding
21/12/06 14:55:39, Building embedding
21/12/06 14:55:41: Building classifier
21/12/06 14:55:41, Start training
21/12/06 14:56:23, ep  0, train acc: 0.3415 ± 0.0610
21/12/06 14:56:43, ep  0, val   acc: 0.3478 ± 0.0662, train stats ebd_grad: 0.0042, clf_grad: 0.0808
21/12/06 14:56:43, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/0
21/12/06 14:57:34, ep  1, val   acc: 0.3582 ± 0.0721, train stats ebd_grad: 0.0176, clf_grad: 0.0909
21/12/06 14:57:34, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/1
21/12/06 14:58:22, ep  2, val   acc: 0.3620 ± 0.0786, train stats ebd_grad: 0.0445, clf_grad: 0.1232
21/12/06 14:58:22, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/2
21/12/06 14:59:12, ep  3, val   acc: 0.3722 ± 0.0746, train stats ebd_grad: 0.0903, clf_grad: 0.1127
21/12/06 14:59:12, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/3
21/12/06 15:00:01, ep  4, val   acc: 0.3702 ± 0.0729, train stats ebd_grad: 0.1075, clf_grad: 0.1257
21/12/06 15:00:48, ep  5, val   acc: 0.3848 ± 0.0773, train stats ebd_grad: 0.1343, clf_grad: 0.1374
21/12/06 15:00:48, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/5
21/12/06 15:01:36, ep  6, val   acc: 0.3816 ± 0.0714, train stats ebd_grad: 0.0972, clf_grad: 0.1173
21/12/06 15:02:22, ep  7, val   acc: 0.3826 ± 0.0676, train stats ebd_grad: 0.0860, clf_grad: 0.1277
21/12/06 15:02:59, ep  8, val   acc: 0.3770 ± 0.0696, train stats ebd_grad: 0.0896, clf_grad: 0.1577
21/12/06 15:03:35, ep  9, val   acc: 0.3881 ± 0.0728, train stats ebd_grad: 0.1019, clf_grad: 0.1478
21/12/06 15:03:35, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/9
21/12/06 15:04:16, ep 10, train acc: 0.3588 ± 0.0736
21/12/06 15:04:36, ep 10, val   acc: 0.3789 ± 0.0705, train stats ebd_grad: 0.0970, clf_grad: 0.1476
21/12/06 15:05:18, ep 11, val   acc: 0.3936 ± 0.0734, train stats ebd_grad: 0.0925, clf_grad: 0.1372
21/12/06 15:05:18, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/11
21/12/06 15:06:07, ep 12, val   acc: 0.3806 ± 0.0716, train stats ebd_grad: 0.0954, clf_grad: 0.1458
21/12/06 15:06:52, ep 13, val   acc: 0.3746 ± 0.0720, train stats ebd_grad: 0.1013, clf_grad: 0.1549
21/12/06 15:07:36, ep 14, val   acc: 0.3960 ± 0.0682, train stats ebd_grad: 0.0992, clf_grad: 0.1657
21/12/06 15:07:36, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/14
21/12/06 15:08:24, ep 15, val   acc: 0.3891 ± 0.0696, train stats ebd_grad: 0.0996, clf_grad: 0.1708
21/12/06 15:09:09, ep 16, val   acc: 0.3871 ± 0.0660, train stats ebd_grad: 0.0936, clf_grad: 0.1548
21/12/06 15:09:56, ep 17, val   acc: 0.3785 ± 0.0699, train stats ebd_grad: 0.1122, clf_grad: 0.1609
21/12/06 15:10:40, ep 18, val   acc: 0.3890 ± 0.0723, train stats ebd_grad: 0.1035, clf_grad: 0.1297
21/12/06 15:11:21, ep 19, val   acc: 0.3794 ± 0.0626, train stats ebd_grad: 0.1239, clf_grad: 0.1440
21/12/06 15:11:58, ep 20, train acc: 0.3702 ± 0.0713
21/12/06 15:12:16, ep 20, val   acc: 0.3966 ± 0.0712, train stats ebd_grad: 0.1064, clf_grad: 0.1563
21/12/06 15:12:16, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/20
21/12/06 15:12:56, ep 21, val   acc: 0.3905 ± 0.0776, train stats ebd_grad: 0.1189, clf_grad: 0.1409
21/12/06 15:13:35, ep 22, val   acc: 0.3920 ± 0.0750, train stats ebd_grad: 0.1326, clf_grad: 0.1669
21/12/06 15:14:12, ep 23, val   acc: 0.3753 ± 0.0579, train stats ebd_grad: 0.1180, clf_grad: 0.1364
21/12/06 15:14:49, ep 24, val   acc: 0.3894 ± 0.0674, train stats ebd_grad: 0.1167, clf_grad: 0.1416
21/12/06 15:15:27, ep 25, val   acc: 0.3922 ± 0.0657, train stats ebd_grad: 0.1340, clf_grad: 0.1449
21/12/06 15:16:09, ep 26, val   acc: 0.3902 ± 0.0684, train stats ebd_grad: 0.1159, clf_grad: 0.1463
21/12/06 15:16:54, ep 27, val   acc: 0.3910 ± 0.0656, train stats ebd_grad: 0.1312, clf_grad: 0.1441
21/12/06 15:17:38, ep 28, val   acc: 0.4038 ± 0.0662, train stats ebd_grad: 0.1245, clf_grad: 0.1548
21/12/06 15:17:38, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/28
21/12/06 15:18:25, ep 29, val   acc: 0.3814 ± 0.0651, train stats ebd_grad: 0.1462, clf_grad: 0.1355
21/12/06 15:19:11, ep 30, train acc: 0.3717 ± 0.0698
21/12/06 15:19:32, ep 30, val   acc: 0.3909 ± 0.0702, train stats ebd_grad: 0.1387, clf_grad: 0.1573
21/12/06 15:20:17, ep 31, val   acc: 0.3894 ± 0.0743, train stats ebd_grad: 0.1211, clf_grad: 0.1370
21/12/06 15:21:02, ep 32, val   acc: 0.3869 ± 0.0660, train stats ebd_grad: 0.1346, clf_grad: 0.1571
21/12/06 15:21:47, ep 33, val   acc: 0.3946 ± 0.0669, train stats ebd_grad: 0.1273, clf_grad: 0.1450
21/12/06 15:22:24, ep 34, val   acc: 0.4006 ± 0.0780, train stats ebd_grad: 0.1420, clf_grad: 0.1509
21/12/06 15:23:06, ep 35, val   acc: 0.3990 ± 0.0664, train stats ebd_grad: 0.1579, clf_grad: 0.1534
21/12/06 15:23:49, ep 36, val   acc: 0.3943 ± 0.0688, train stats ebd_grad: 0.1749, clf_grad: 0.1552
21/12/06 15:24:31, ep 37, val   acc: 0.3918 ± 0.0709, train stats ebd_grad: 0.1825, clf_grad: 0.1556
21/12/06 15:25:14, ep 38, val   acc: 0.3995 ± 0.0720, train stats ebd_grad: 0.1879, clf_grad: 0.1406
21/12/06 15:25:58, ep 39, val   acc: 0.4014 ± 0.0723, train stats ebd_grad: 0.1808, clf_grad: 0.1652
21/12/06 15:26:40, ep 40, train acc: 0.3672 ± 0.0711
21/12/06 15:27:02, ep 40, val   acc: 0.3967 ± 0.0693, train stats ebd_grad: 0.1665, clf_grad: 0.1443
21/12/06 15:27:50, ep 41, val   acc: 0.4033 ± 0.0722, train stats ebd_grad: 0.1685, clf_grad: 0.1637
21/12/06 15:28:38, ep 42, val   acc: 0.4010 ± 0.0745, train stats ebd_grad: 0.1529, clf_grad: 0.1562
21/12/06 15:29:23, ep 43, val   acc: 0.4131 ± 0.0660, train stats ebd_grad: 0.1764, clf_grad: 0.1668
21/12/06 15:29:23, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/43
21/12/06 15:30:18, ep 44, val   acc: 0.3986 ± 0.0675, train stats ebd_grad: 0.1770, clf_grad: 0.1645
21/12/06 15:31:04, ep 45, val   acc: 0.3989 ± 0.0664, train stats ebd_grad: 0.1586, clf_grad: 0.1817
21/12/06 15:31:45, ep 46, val   acc: 0.3942 ± 0.0632, train stats ebd_grad: 0.1796, clf_grad: 0.1733
21/12/06 15:32:27, ep 47, val   acc: 0.3974 ± 0.0763, train stats ebd_grad: 0.1576, clf_grad: 0.1586
21/12/06 15:33:04, ep 48, val   acc: 0.4040 ± 0.0753, train stats ebd_grad: 0.2087, clf_grad: 0.1733
21/12/06 15:33:40, ep 49, val   acc: 0.4090 ± 0.0699, train stats ebd_grad: 0.1966, clf_grad: 0.1630
21/12/06 15:34:16, ep 50, train acc: 0.3844 ± 0.0695
21/12/06 15:34:36, ep 50, val   acc: 0.4122 ± 0.0694, train stats ebd_grad: 0.1784, clf_grad: 0.1618
21/12/06 15:35:22, ep 51, val   acc: 0.4117 ± 0.0699, train stats ebd_grad: 0.1993, clf_grad: 0.1581
21/12/06 15:36:09, ep 52, val   acc: 0.4019 ± 0.0732, train stats ebd_grad: 0.2204, clf_grad: 0.1500
21/12/06 15:36:55, ep 53, val   acc: 0.4110 ± 0.0613, train stats ebd_grad: 0.2330, clf_grad: 0.1697
21/12/06 15:37:37, ep 54, val   acc: 0.3985 ± 0.0745, train stats ebd_grad: 0.1669, clf_grad: 0.1831
21/12/06 15:38:24, ep 55, val   acc: 0.4121 ± 0.0632, train stats ebd_grad: 0.2189, clf_grad: 0.1468
21/12/06 15:39:11, ep 56, val   acc: 0.3960 ± 0.0736, train stats ebd_grad: 0.1914, clf_grad: 0.1578
21/12/06 15:39:55, ep 57, val   acc: 0.4097 ± 0.0655, train stats ebd_grad: 0.2203, clf_grad: 0.1586
21/12/06 15:40:43, ep 58, val   acc: 0.3933 ± 0.0676, train stats ebd_grad: 0.2062, clf_grad: 0.1673
21/12/06 15:41:28, ep 59, val   acc: 0.4187 ± 0.0730, train stats ebd_grad: 0.2370, clf_grad: 0.1709
21/12/06 15:41:28, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/59
21/12/06 15:42:18, ep 60, train acc: 0.3824 ± 0.0674
21/12/06 15:42:39, ep 60, val   acc: 0.4126 ± 0.0697, train stats ebd_grad: 0.2011, clf_grad: 0.1745
21/12/06 15:43:17, ep 61, val   acc: 0.4232 ± 0.0726, train stats ebd_grad: 0.2257, clf_grad: 0.1619
21/12/06 15:43:17, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/61
21/12/06 15:43:58, ep 62, val   acc: 0.4006 ± 0.0796, train stats ebd_grad: 0.2037, clf_grad: 0.1836
21/12/06 15:44:35, ep 63, val   acc: 0.4193 ± 0.0713, train stats ebd_grad: 0.1966, clf_grad: 0.1663
21/12/06 15:45:10, ep 64, val   acc: 0.4126 ± 0.0625, train stats ebd_grad: 0.2124, clf_grad: 0.1740
21/12/06 15:45:45, ep 65, val   acc: 0.4139 ± 0.0615, train stats ebd_grad: 0.2151, clf_grad: 0.1699
21/12/06 15:46:20, ep 66, val   acc: 0.4063 ± 0.0718, train stats ebd_grad: 0.2230, clf_grad: 0.1452
21/12/06 15:46:56, ep 67, val   acc: 0.3982 ± 0.0674, train stats ebd_grad: 0.3468, clf_grad: 0.1641
21/12/06 15:47:31, ep 68, val   acc: 0.3854 ± 0.0745, train stats ebd_grad: 0.2237, clf_grad: 0.1766
21/12/06 15:48:05, ep 69, val   acc: 0.4020 ± 0.0741, train stats ebd_grad: 0.2251, clf_grad: 0.1631
21/12/06 15:48:40, ep 70, train acc: 0.3848 ± 0.0730
21/12/06 15:48:56, ep 70, val   acc: 0.4230 ± 0.0716, train stats ebd_grad: 0.2938, clf_grad: 0.2011
21/12/06 15:49:31, ep 71, val   acc: 0.4070 ± 0.0673, train stats ebd_grad: 0.2547, clf_grad: 0.1610
21/12/06 15:50:07, ep 72, val   acc: 0.3917 ± 0.0759, train stats ebd_grad: 0.2113, clf_grad: 0.1771
21/12/06 15:50:42, ep 73, val   acc: 0.4164 ± 0.0712, train stats ebd_grad: 0.2110, clf_grad: 0.1754
21/12/06 15:51:24, ep 74, val   acc: 0.4017 ± 0.0714, train stats ebd_grad: 0.2340, clf_grad: 0.1472
21/12/06 15:52:10, ep 75, val   acc: 0.4163 ± 0.0656, train stats ebd_grad: 0.2432, clf_grad: 0.1800
21/12/06 15:52:58, ep 76, val   acc: 0.4064 ± 0.0713, train stats ebd_grad: 0.2991, clf_grad: 0.1894
21/12/06 15:53:39, ep 77, val   acc: 0.4104 ± 0.0608, train stats ebd_grad: 0.2814, clf_grad: 0.1593
21/12/06 15:54:24, ep 78, val   acc: 0.4149 ± 0.0689, train stats ebd_grad: 0.2390, clf_grad: 0.1909
21/12/06 15:55:08, ep 79, val   acc: 0.3998 ± 0.0786, train stats ebd_grad: 0.3145, clf_grad: 0.1773
21/12/06 15:55:51, ep 80, train acc: 0.3872 ± 0.0793
21/12/06 15:56:10, ep 80, val   acc: 0.4154 ± 0.0672, train stats ebd_grad: 0.2835, clf_grad: 0.1800
21/12/06 15:56:53, ep 81, val   acc: 0.3992 ± 0.0779, train stats ebd_grad: 0.2684, clf_grad: 0.1770
21/12/06 15:56:53, End of training. Restore the best weights
21/12/06 16:00:32, acc mean  0.4166, std  0.0835
(signature) ➜  Distributional-Signatures git:(master) ✗
```

he159ok commented 2 years ago

Hi, thanks a lot for your help.

My PyTorch version is 1.2.0 and my transformers version is 4.12.0. The full environment is listed below:

```
Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 openblas
blessings 1.7 pypi_0 pypi
boto3 1.19.6 pypi_0 pypi
botocore 1.22.6 pypi_0 pypi
brotlipy 0.7.0 py37h27cfd23_1003
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py37h89c1867_1 conda-forge
cffi 1.14.6 py37h400218f_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.0.3 pypi_0 pypi
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 35.0.0 py37hd23ed53_0
cudatoolkit 10.0.130 0
filelock 3.3.1 pypi_0 pypi
freetype 2.10.4 h5ab3b9f_0
giflib 5.2.1 h7b6447c_0
gpustat 0.6.0 pypi_0 pypi
huggingface-hub 0.0.19 pypi_0 pypi
idna 3.2 pyhd3eb1b0_0
importlib-metadata 4.8.1 pypi_0 pypi
intel-openmp 2021.3.0 h06a4308_3350
jmespath 0.10.0 pypi_0 pypi
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h7f8727e_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libblas 3.9.0 1_h6e990d7_netlib conda-forge
libcblas 3.9.0 3_h893e4fe_netlib conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
liblapack 3.9.0 3_h893e4fe_netlib conda-forge
libopenblas 0.3.13 h4367d64_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libwebp 1.2.0 h89dd481_0
libwebp-base 1.2.0 h27cfd23_0
lz4-c 1.9.3 h295c915_1
mkl 2021.3.0 h06a4308_520
mkl-service 2.4.0 py37h7f8727e_0
ncurses 6.2 he6710b0_1
ninja 1.10.2 hff7bd54_1
numpy 1.21.3 pypi_0 pypi
nvidia-ml-py3 7.352.0 pypi_0 pypi
olefile 0.46 py37_0
openssl 1.1.1l h7f8727e_0
packaging 21.0 pypi_0 pypi
pillow 8.4.0 py37h5aabda8_0
pip 21.2.2 py37h06a4308_0
psutil 5.8.0 pypi_0 pypi
pycparser 2.20 py_2
pyopenssl 21.0.0 pyhd3eb1b0_1
pyparsing 3.0.3 pypi_0 pypi
pysocks 1.7.1 py37_1
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.7 2_cp37m conda-forge
pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
pytorch-transformers 1.2.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1 h27cfd23_0
regex 2021.10.23 pypi_0 pypi
requests 2.26.0 pyhd3eb1b0_0
s3transfer 0.5.0 pypi_0 pypi
sacremoses 0.0.46 pypi_0 pypi
scikit-learn 0.24.2 py37h18a542f_0 conda-forge
scipy 1.5.3 py37h8911b10_0 conda-forge
sentencepiece 0.1.96 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
termcolor 1.1.0 py37h06a4308_1
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h1ccaba5_0
tokenizers 0.10.3 pypi_0 pypi
torchtext 0.4.0 pyhb384e40_1 pytorch
torchvision 0.4.0 py37_cu100 pytorch
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
transformers 4.12.0 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
urllib3 1.26.7 pyhd3eb1b0_0
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
```

Thanks a lot for your help again.

he159ok commented 2 years ago

Below is what I ran in my own environment, using a clean copy of the code recloned from your GitHub just a moment ago.

And may I know your environment? It seems that the issue is related to the environment.

```
(fewdoc2) jfhe@desktop:~/Documents/MountHe/jfhe/projects/Distributional-Signatures/src$ python main.py --bert --pretrained_bert bert-base-uncased --cuda 2 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target --seed=330

Parameters:
        AUXILIARY=[]
        BERT=True
        BERT_CACHE_DIR=None
        CLASSIFIER=r2d2
        CLIP_GRAD=None
        CUDA=2
        DATA_PATH=data/huffpost_bert_uncase.json
        DATASET=huffpost
        DROPOUT=0.1
        EMBEDDING=meta
        FINETUNE_EBD=False
        FINETUNE_EPISODES=10
        FINETUNE_LOSS_TYPE=softmax
        FINETUNE_MAXEPOCHS=5000
        FINETUNE_SPLIT=0.8
        INDUCT_ATT_DIM=64
        INDUCT_HIDDEN_DIM=100
        INDUCT_ITER=3
        INDUCT_RNN_DIM=128
        LR=0.001
        LRD2_NUM_ITERS=5
        MAML=False
        META_EBD=False
        META_IDF=False
        META_IWF=True
        META_TARGET_ENTROPY=False
        META_W_TARGET=True
        META_W_TARGET_LAM=1
        MODE=train
        N_TEST_CLASS=16
        N_TRAIN_CLASS=20
        N_VAL_CLASS=5
        N_WORKERS=10
        NOTQDM=False
        PATIENCE=20
        PRETRAINED_BERT=bert-base-uncased
        QUERY=25
        RESULT_PATH=
        SAVE=False
        SEED=330
        SHOT=1
        SNAPSHOT=
        TEST_EPISODES=1000
        TRAIN_EPISODES=100
        TRAIN_EPOCHS=1000
        VAL_EPISODES=100
        WAY=5
        WORD_VECTOR=wiki.en.vec
        WV_PATH=./


21/12/06 16:21:04: Loading data
21/12/06 16:21:04: Class balance:
{19: 900, 4: 900, 5: 900, 8: 900, 1: 900, 13: 900, 31: 900, 16: 900, 36: 900, 39: 900, 14: 900, 11: 900, 23: 900, 17: 900, 7: 900, 21: 900, 26: 900, 12: 900, 18: 900, 37: 900, 6: 900, 22: 900, 40: 900, 15: 900, 29: 900, 10: 900, 35: 900, 38: 900, 9: 900, 25: 900, 30: 900, 20: 900, 3: 900, 27: 900, 24: 900, 34: 900, 33: 900, 32: 900, 0: 900, 2: 900, 28: 900}
21/12/06 16:21:04: Avg len: 13.077235772357724
21/12/06 16:21:04: Loading word vectors
21/12/06 16:21:19: Total num. of words: 9376, word vector dimension: 300
21/12/06 16:21:19: Num. of out-of-vocabulary words(they are initialized to zeros): 1586
21/12/06 16:21:19: #train 18000, #val 4500, #test 14400
21/12/06 16:21:21, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
```

YujiaBao commented 2 years ago

For your FewRel exp with BERT, you need to specify --pos_max_len 300 in the command. This argument is used to initialize the positional embeddings in src/embedding/auxiliary/pos.py.

The previous failure was due to the fact that the positional embedding didn't cover enough position options for the input.
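(As a rough illustration of why this fails, here is a minimal sketch with assumed dimensions, not the repo's actual module; see src/embedding/auxiliary/pos.py for the real implementation:)

```python
import torch
import torch.nn as nn

class PosEbd(nn.Module):
    """Toy positional-embedding lookup; not this repo's actual module."""
    def __init__(self, pos_max_len=300, pos_ebd_dim=5):
        super().__init__()
        # one learned vector per position 0 .. pos_max_len-1
        self.ebd = nn.Embedding(pos_max_len, pos_ebd_dim)

    def forward(self, positions):
        # any index >= pos_max_len is out of range and, on GPU, shows up
        # as "CUDA error: device-side assert triggered"
        return self.ebd(positions)

pos = PosEbd(pos_max_len=300)
print(pos(torch.arange(250)).shape)  # fine: torch.Size([250, 5])
```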

he159ok commented 2 years ago

Hi, thanks a lot for your help. With your fix of adding --pos_max_len 300, I ran FewRel without any error reports. However, the FewRel results with BERT are also highly different from your Tab. 2. I believe the issue lies in something other than the code and commands, such as the Conda configuration? If you can provide a file with your Conda configuration via `conda env export > signature.yaml`, it will be greatly helpful for those of us interested in your work.

I also attach the results from FewRel with BERT below.

```
main.py --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset fewrel --data_path data/fewrel_bert_uncase.json --n_train_class 65 --n_val_class 5 --n_test_class 10 --meta_iwf --meta_w_target --auxiliary pos --bert --pretrained_bert bert-base-uncased --pos_max_len 100

Parameters:
        AUXILIARY=['pos']
        BERT=True
        BERT_CACHE_DIR=None
        CLASSIFIER=r2d2
        CLIP_GRAD=None
        CUDA=0
        DATA_PATH=data/fewrel_bert_uncase.json
        DATASET=fewrel
        DONE_QUEUE_LIMIT=100
        DROPOUT=0.1
        EMBEDDING=meta
        FINETUNE_EBD=False
        FINETUNE_EPISODES=10
        FINETUNE_LOSS_TYPE=softmax
        FINETUNE_MAXEPOCHS=5000
        FINETUNE_SPLIT=0.8
        INDUCT_ATT_DIM=64
        INDUCT_HIDDEN_DIM=100
        INDUCT_ITER=3
        INDUCT_RNN_DIM=128
        LR=0.001
        LRD2_NUM_ITERS=5
        MAML=False
        META_EBD=False
        META_IDF=False
        META_IWF=True
        META_TARGET_ENTROPY=False
        META_W_TARGET=True
        META_W_TARGET_LAM=1
        MODE=train
        N_TEST_CLASS=10
        N_TRAIN_CLASS=65
        N_VAL_CLASS=5
        N_WORKERS=10
        NOTQDM=False
        PATIENCE=20
        POS_EBD_DIM=5
        POS_MAX_LEN=100
        PRETRAINED_BERT=bert-base-uncased
        QUERY=25
        RESULT_PATH=
        SAVE=False
        SEED=330
        SHOT=1
        SLEEP_TIME=1
        SNAPSHOT=
        TEST_EPISODES=1000
        TRAIN_EPISODES=100
        TRAIN_EPOCHS=1000
        USE_DYNAMIC_CLASSIFIER=False
        VAL_EPISODES=100
        WAY=5
        WORD_VECTOR=wiki.en.vec
        WV_PATH=./


21/12/06 17:17:13: Loading data
21/12/06 17:17:13: Class balance:
{41: 700, 42: 700, 43: 700, 25: 700, 26: 700, 27: 700, 76: 700, 77: 700, 44: 700, 45: 700, 46: 700, 50: 700, 51: 700, 56: 700, 57: 700, 61: 700, 62: 700, 63: 700, 10: 700, 11: 700, 30: 700, 31: 700, 14: 700, 15: 700, 67: 700, 68: 700, 69: 700, 20: 700, 21: 700, 36: 700, 37: 700, 4: 700, 5: 700, 54: 700, 55: 700, 60: 700, 8: 700, 9: 700, 28: 700, 29: 700, 16: 700, 17: 700, 66: 700, 22: 700, 23: 700, 34: 700, 35: 700, 2: 700, 3: 700, 40: 700, 74: 700, 75: 700, 48: 700, 49: 700, 58: 700, 59: 700, 6: 700, 7: 700, 32: 700, 33: 700, 18: 700, 19: 700, 64: 700, 24: 700, 38: 700, 0: 700, 1: 700, 72: 700, 73: 700, 47: 700, 70: 700, 78: 700, 79: 700, 12: 700, 52: 700, 53: 700, 71: 700, 13: 700, 65: 700, 39: 700}
21/12/06 17:17:13: Avg len: 28.964017857142856
21/12/06 17:17:13: Loading word vectors
21/12/06 17:17:15: Total num. of words: 17835, word vector dimension: 300
21/12/06 17:17:15: Num. of out-of-vocabulary words(they are initialized to zeros): 4423
21/12/06 17:17:15: #train 45500, #val 3500, #test 7000
21/12/06 17:17:18: Convert everything into np array for fast data loading finished
21/12/06 17:17:19: precompute_stats finished
21/12/06 17:17:19: start meta_w_target
21/12/06 17:17:19, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']
```

YujiaBao commented 2 years ago

Yes, this result definitely does not make sense (it is even lower than the non-BERT performance).

We used IBM Cloud to run all our experiments three years ago. We could not export the environment information at the time (it was not available to us), but we do know that it was running PyTorch 1.0 and pytorch-transformers 1.1.0. It is a bit difficult to reinstall the exact same environment right now (we simply cannot run the old PyTorch version on our current machines). We will consider refactoring the code base for the new APIs in the future.

he159ok commented 2 years ago

Thanks a lot for your quick reply, and I understand the situation. I will try your PyTorch and pytorch-transformers versions to see whether I can get normal results for BERT. Thanks a lot for your help again.
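(For anyone reproducing this, a quick sanity check is to print the versions the environment actually resolves to; this is just a sketch and assumes the relevant packages are importable:)

```python
# Sanity-check sketch: print the library versions that matter for this issue.
import torch
print("torch:", torch.__version__)

try:
    # the library the original experiments were built against
    import pytorch_transformers
    print("pytorch-transformers:", pytorch_transformers.__version__)
except ImportError:
    print("pytorch-transformers: not installed")

try:
    # the newer HuggingFace package installed in the failing environment
    import transformers
    print("transformers:", transformers.__version__)
except ImportError:
    print("transformers: not installed")
```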

hflserdaniel commented 2 years ago

> I have run your command on HuffPost and the results look normal to me (I have attached the 1-shot exp below). What are your PyTorch and transformers versions?

Hi, I've also encountered the same issue as @he159ok on the HuffPost dataset, with best 1-shot performance only around 0.30. Could you please share your PyTorch and transformers versions? Thanks a lot!

YujiaBao commented 2 years ago

he159ok and I had an email discussion, and we think the reason is that the tokenization of the current BERT release from Hugging Face is different. Thus, the previous tokenizations in "huffpost_bert_uncase.json" are not applicable to the current BERT model.
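(A hedged sketch of how one might regenerate the cached tokenization with the tokenizer actually installed; the `text` field name, the one-JSON-object-per-line layout, and the raw-data path `data/huffpost.json` are assumptions about the dataset format, not verified against the repo:)

```python
import json
from transformers import BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-uncased")

# Re-tokenize the raw dataset so the cached wordpieces match the tokenizer
# in the current environment, instead of the one used when the json was built.
with open("data/huffpost.json") as fin, \
        open("data/huffpost_bert_uncase_retok.json", "w") as fout:
    for line in fin:
        row = json.loads(line)
        row["text"] = tok.tokenize(" ".join(row["text"]))
        fout.write(json.dumps(row) + "\n")
```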