What do you mean by abnormal results? I modified your command by changing main.py to src/main.py and managed to run it successfully from the root of the repo.
Hi, for your fastText embedding, I got results similar to your Table 1. However, when I run BERT on the HuffPost dataset, both the 5-way 1-shot and 5-way 5-shot results are very different from your Table 2.
For 5-way 1-shot, I use the command
python main.py --bert --pretrained_bert bert-base-uncased --cuda 2 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target
The results are below,
21/12/06 11:39:32, acc mean 0.2830, std 0.0528
21/12/06 11:40:22, acc mean 0.2808, std 0.0513
I also tried
--way 5 --shot 1
where all other parameters are the same as the above command.
The results are below,
21/12/06 10:48:35, acc mean 0.2899, std 0.0545
21/12/06 10:49:24, acc mean 0.2814, std 0.0517
I am now checking my code, and it would be greatly appreciated if you could share a BERT command for HuffPost.
Also, when I run BERT for FewRel with the command
python main.py --bert --pretrained_bert bert-base-uncased --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset fewrel --data_path data/fewrel_bert_uncase.json --n_train_class 65 --n_val_class 5 --n_test_class 10 --auxiliary pos --meta_iwf --meta_w_target
The following error occurred:
File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 141, in forward scale = self.compute_score(data, ebd) File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 209, in compute_score hidden = self.rnn(x, data['text_len']) File "/home/jfhe/anaconda3/envs/fewdoc2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl result = self.forward(*input, **kwargs) File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 83, in forward sort_text, sort_len, invert_order, num_zero = self._sort_tensor(input=text, lengths=text_len) File "/home/jfhe/Documents/MountHe/jfhe/projects/Distributional-Signatures/src/embedding/meta.py", line 39, in _sort_tensor nonzero_idx = sorted_lengths.nonzero() RuntimeError: CUDA error: device-side assert triggered Killed
If you can provide any hint about mistakes in the command or anything else, it would be greatly appreciated!
I have run your command on HuffPost and the results seem normal to me (I have attached the 5-shot exp below). What are your PyTorch and Transformers versions?
(signature) ➜ Distributional-Signatures git:(master) ✗ python src/main.py --bert --pretrained_bert bert-base-uncased --cuda 6 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target --seed=330
Parameters:
AUXILIARY=[]
BERT=True
BERT_CACHE_DIR=None
CLASSIFIER=r2d2
CLIP_GRAD=None
CUDA=6
DATA_PATH=data/huffpost_bert_uncase.json
DATASET=huffpost
DROPOUT=0.1
EMBEDDING=meta
FINETUNE_EBD=False
FINETUNE_EPISODES=10
FINETUNE_LOSS_TYPE=softmax
FINETUNE_MAXEPOCHS=5000
FINETUNE_SPLIT=0.8
INDUCT_ATT_DIM=64
INDUCT_HIDDEN_DIM=100
INDUCT_ITER=3
INDUCT_RNN_DIM=128
LR=0.001
LRD2_NUM_ITERS=5
MAML=False
META_EBD=False
META_IDF=False
META_IWF=True
META_TARGET_ENTROPY=False
META_W_TARGET=True
META_W_TARGET_LAM=1
MODE=train
N_TEST_CLASS=16
N_TRAIN_CLASS=20
N_VAL_CLASS=5
N_WORKERS=10
NOTQDM=False
PATIENCE=20
PRETRAINED_BERT=bert-base-uncased
QUERY=25
RESULT_PATH=
SAVE=False
SEED=330
SHOT=1
SNAPSHOT=
TEST_EPISODES=1000
TRAIN_EPISODES=100
TRAIN_EPOCHS=1000
VAL_EPISODES=100
WAY=5
WORD_VECTOR=wiki.en.vec
WV_PATH=./
[ASCII art banner omitted (Credit: Maija Haavisto)]
21/12/06 14:54:59: Loading data
21/12/06 14:54:59: Class balance:
{19: 900, 4: 900, 5: 900, 8: 900, 1: 900, 13: 900, 31: 900, 16: 900, 36: 900, 39: 900, 14: 900, 11: 900, 23: 900, 17: 900, 7: 900, 21: 900, 26: 900, 12: 900, 18: 900, 37: 900, 6: 900, 22: 900, 40: 900, 15: 900, 29: 900, 10: 900, 35: 900, 38: 900, 9: 900, 25: 900, 30: 900, 20: 900, 3: 900, 27: 900, 24: 900, 34: 900, 33: 900, 32: 900, 0: 900, 2: 900, 28: 900}
21/12/06 14:54:59: Avg len: 13.077235772357724
21/12/06 14:54:59: Loading word vectors
21/12/06 14:55:00: Total num. of words: 9376, word vector dimension: 300
21/12/06 14:55:00: Num. of out-of-vocabulary words(they are initialized to zeros): 2496
21/12/06 14:55:00: #train 18000, #val 4500, #test 14400
load complete
pre compute complete
21/12/06 14:55:18, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
wordebd complete
avg ebd complete
cuda complete
meta w target complete
21/12/06 14:55:37, Building embedding
21/12/06 14:55:37, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
21/12/06 14:55:39, Building augmented embedding
21/12/06 14:55:39, Building embedding
21/12/06 14:55:41: Building classifier
21/12/06 14:55:41, Start training
21/12/06 14:56:23, ep 0, train acc: 0.3415 ± 0.0610
21/12/06 14:56:43, ep 0, val acc: 0.3478 ± 0.0662, train stats ebd_grad: 0.0042, clf_grad: 0.0808
21/12/06 14:56:43, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/0
21/12/06 14:57:34, ep 1, val acc: 0.3582 ± 0.0721, train stats ebd_grad: 0.0176, clf_grad: 0.0909
21/12/06 14:57:34, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/1
21/12/06 14:58:22, ep 2, val acc: 0.3620 ± 0.0786, train stats ebd_grad: 0.0445, clf_grad: 0.1232
21/12/06 14:58:22, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/2
21/12/06 14:59:12, ep 3, val acc: 0.3722 ± 0.0746, train stats ebd_grad: 0.0903, clf_grad: 0.1127
21/12/06 14:59:12, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/3
21/12/06 15:00:01, ep 4, val acc: 0.3702 ± 0.0729, train stats ebd_grad: 0.1075, clf_grad: 0.1257
21/12/06 15:00:48, ep 5, val acc: 0.3848 ± 0.0773, train stats ebd_grad: 0.1343, clf_grad: 0.1374
21/12/06 15:00:48, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/5
21/12/06 15:01:36, ep 6, val acc: 0.3816 ± 0.0714, train stats ebd_grad: 0.0972, clf_grad: 0.1173
21/12/06 15:02:22, ep 7, val acc: 0.3826 ± 0.0676, train stats ebd_grad: 0.0860, clf_grad: 0.1277
21/12/06 15:02:59, ep 8, val acc: 0.3770 ± 0.0696, train stats ebd_grad: 0.0896, clf_grad: 0.1577
21/12/06 15:03:35, ep 9, val acc: 0.3881 ± 0.0728, train stats ebd_grad: 0.1019, clf_grad: 0.1478
21/12/06 15:03:35, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/9
21/12/06 15:04:16, ep 10, train acc: 0.3588 ± 0.0736
21/12/06 15:04:36, ep 10, val acc: 0.3789 ± 0.0705, train stats ebd_grad: 0.0970, clf_grad: 0.1476
21/12/06 15:05:18, ep 11, val acc: 0.3936 ± 0.0734, train stats ebd_grad: 0.0925, clf_grad: 0.1372
21/12/06 15:05:18, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/11
21/12/06 15:06:07, ep 12, val acc: 0.3806 ± 0.0716, train stats ebd_grad: 0.0954, clf_grad: 0.1458
21/12/06 15:06:52, ep 13, val acc: 0.3746 ± 0.0720, train stats ebd_grad: 0.1013, clf_grad: 0.1549
21/12/06 15:07:36, ep 14, val acc: 0.3960 ± 0.0682, train stats ebd_grad: 0.0992, clf_grad: 0.1657
21/12/06 15:07:36, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/14
21/12/06 15:08:24, ep 15, val acc: 0.3891 ± 0.0696, train stats ebd_grad: 0.0996, clf_grad: 0.1708
21/12/06 15:09:09, ep 16, val acc: 0.3871 ± 0.0660, train stats ebd_grad: 0.0936, clf_grad: 0.1548
21/12/06 15:09:56, ep 17, val acc: 0.3785 ± 0.0699, train stats ebd_grad: 0.1122, clf_grad: 0.1609
21/12/06 15:10:40, ep 18, val acc: 0.3890 ± 0.0723, train stats ebd_grad: 0.1035, clf_grad: 0.1297
21/12/06 15:11:21, ep 19, val acc: 0.3794 ± 0.0626, train stats ebd_grad: 0.1239, clf_grad: 0.1440
21/12/06 15:11:58, ep 20, train acc: 0.3702 ± 0.0713
21/12/06 15:12:16, ep 20, val acc: 0.3966 ± 0.0712, train stats ebd_grad: 0.1064, clf_grad: 0.1563
21/12/06 15:12:16, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/20
21/12/06 15:12:56, ep 21, val acc: 0.3905 ± 0.0776, train stats ebd_grad: 0.1189, clf_grad: 0.1409
21/12/06 15:13:35, ep 22, val acc: 0.3920 ± 0.0750, train stats ebd_grad: 0.1326, clf_grad: 0.1669
21/12/06 15:14:12, ep 23, val acc: 0.3753 ± 0.0579, train stats ebd_grad: 0.1180, clf_grad: 0.1364
21/12/06 15:14:49, ep 24, val acc: 0.3894 ± 0.0674, train stats ebd_grad: 0.1167, clf_grad: 0.1416
21/12/06 15:15:27, ep 25, val acc: 0.3922 ± 0.0657, train stats ebd_grad: 0.1340, clf_grad: 0.1449
21/12/06 15:16:09, ep 26, val acc: 0.3902 ± 0.0684, train stats ebd_grad: 0.1159, clf_grad: 0.1463
21/12/06 15:16:54, ep 27, val acc: 0.3910 ± 0.0656, train stats ebd_grad: 0.1312, clf_grad: 0.1441
21/12/06 15:17:38, ep 28, val acc: 0.4038 ± 0.0662, train stats ebd_grad: 0.1245, clf_grad: 0.1548
21/12/06 15:17:38, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/28
21/12/06 15:18:25, ep 29, val acc: 0.3814 ± 0.0651, train stats ebd_grad: 0.1462, clf_grad: 0.1355
21/12/06 15:19:11, ep 30, train acc: 0.3717 ± 0.0698
21/12/06 15:19:32, ep 30, val acc: 0.3909 ± 0.0702, train stats ebd_grad: 0.1387, clf_grad: 0.1573
21/12/06 15:20:17, ep 31, val acc: 0.3894 ± 0.0743, train stats ebd_grad: 0.1211, clf_grad: 0.1370
21/12/06 15:21:02, ep 32, val acc: 0.3869 ± 0.0660, train stats ebd_grad: 0.1346, clf_grad: 0.1571
21/12/06 15:21:47, ep 33, val acc: 0.3946 ± 0.0669, train stats ebd_grad: 0.1273, clf_grad: 0.1450
21/12/06 15:22:24, ep 34, val acc: 0.4006 ± 0.0780, train stats ebd_grad: 0.1420, clf_grad: 0.1509
21/12/06 15:23:06, ep 35, val acc: 0.3990 ± 0.0664, train stats ebd_grad: 0.1579, clf_grad: 0.1534
21/12/06 15:23:49, ep 36, val acc: 0.3943 ± 0.0688, train stats ebd_grad: 0.1749, clf_grad: 0.1552
21/12/06 15:24:31, ep 37, val acc: 0.3918 ± 0.0709, train stats ebd_grad: 0.1825, clf_grad: 0.1556
21/12/06 15:25:14, ep 38, val acc: 0.3995 ± 0.0720, train stats ebd_grad: 0.1879, clf_grad: 0.1406
21/12/06 15:25:58, ep 39, val acc: 0.4014 ± 0.0723, train stats ebd_grad: 0.1808, clf_grad: 0.1652
21/12/06 15:26:40, ep 40, train acc: 0.3672 ± 0.0711
21/12/06 15:27:02, ep 40, val acc: 0.3967 ± 0.0693, train stats ebd_grad: 0.1665, clf_grad: 0.1443
21/12/06 15:27:50, ep 41, val acc: 0.4033 ± 0.0722, train stats ebd_grad: 0.1685, clf_grad: 0.1637
21/12/06 15:28:38, ep 42, val acc: 0.4010 ± 0.0745, train stats ebd_grad: 0.1529, clf_grad: 0.1562
21/12/06 15:29:23, ep 43, val acc: 0.4131 ± 0.0660, train stats ebd_grad: 0.1764, clf_grad: 0.1668
21/12/06 15:29:23, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/43
21/12/06 15:30:18, ep 44, val acc: 0.3986 ± 0.0675, train stats ebd_grad: 0.1770, clf_grad: 0.1645
21/12/06 15:31:04, ep 45, val acc: 0.3989 ± 0.0664, train stats ebd_grad: 0.1586, clf_grad: 0.1817
21/12/06 15:31:45, ep 46, val acc: 0.3942 ± 0.0632, train stats ebd_grad: 0.1796, clf_grad: 0.1733
21/12/06 15:32:27, ep 47, val acc: 0.3974 ± 0.0763, train stats ebd_grad: 0.1576, clf_grad: 0.1586
21/12/06 15:33:04, ep 48, val acc: 0.4040 ± 0.0753, train stats ebd_grad: 0.2087, clf_grad: 0.1733
21/12/06 15:33:40, ep 49, val acc: 0.4090 ± 0.0699, train stats ebd_grad: 0.1966, clf_grad: 0.1630
21/12/06 15:34:16, ep 50, train acc: 0.3844 ± 0.0695
21/12/06 15:34:36, ep 50, val acc: 0.4122 ± 0.0694, train stats ebd_grad: 0.1784, clf_grad: 0.1618
21/12/06 15:35:22, ep 51, val acc: 0.4117 ± 0.0699, train stats ebd_grad: 0.1993, clf_grad: 0.1581
21/12/06 15:36:09, ep 52, val acc: 0.4019 ± 0.0732, train stats ebd_grad: 0.2204, clf_grad: 0.1500
21/12/06 15:36:55, ep 53, val acc: 0.4110 ± 0.0613, train stats ebd_grad: 0.2330, clf_grad: 0.1697
21/12/06 15:37:37, ep 54, val acc: 0.3985 ± 0.0745, train stats ebd_grad: 0.1669, clf_grad: 0.1831
21/12/06 15:38:24, ep 55, val acc: 0.4121 ± 0.0632, train stats ebd_grad: 0.2189, clf_grad: 0.1468
21/12/06 15:39:11, ep 56, val acc: 0.3960 ± 0.0736, train stats ebd_grad: 0.1914, clf_grad: 0.1578
21/12/06 15:39:55, ep 57, val acc: 0.4097 ± 0.0655, train stats ebd_grad: 0.2203, clf_grad: 0.1586
21/12/06 15:40:43, ep 58, val acc: 0.3933 ± 0.0676, train stats ebd_grad: 0.2062, clf_grad: 0.1673
21/12/06 15:41:28, ep 59, val acc: 0.4187 ± 0.0730, train stats ebd_grad: 0.2370, clf_grad: 0.1709
21/12/06 15:41:28, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/59
21/12/06 15:42:18, ep 60, train acc: 0.3824 ± 0.0674
21/12/06 15:42:39, ep 60, val acc: 0.4126 ± 0.0697, train stats ebd_grad: 0.2011, clf_grad: 0.1745
21/12/06 15:43:17, ep 61, val acc: 0.4232 ± 0.0726, train stats ebd_grad: 0.2257, clf_grad: 0.1619
21/12/06 15:43:17, Save cur best model to /data/rsg/nlp/yujia/projects/archive/Distributional-Signatures/tmp-runs/16388205412605564/61
21/12/06 15:43:58, ep 62, val acc: 0.4006 ± 0.0796, train stats ebd_grad: 0.2037, clf_grad: 0.1836
21/12/06 15:44:35, ep 63, val acc: 0.4193 ± 0.0713, train stats ebd_grad: 0.1966, clf_grad: 0.1663
21/12/06 15:45:10, ep 64, val acc: 0.4126 ± 0.0625, train stats ebd_grad: 0.2124, clf_grad: 0.1740
21/12/06 15:45:45, ep 65, val acc: 0.4139 ± 0.0615, train stats ebd_grad: 0.2151, clf_grad: 0.1699
21/12/06 15:46:20, ep 66, val acc: 0.4063 ± 0.0718, train stats ebd_grad: 0.2230, clf_grad: 0.1452
21/12/06 15:46:56, ep 67, val acc: 0.3982 ± 0.0674, train stats ebd_grad: 0.3468, clf_grad: 0.1641
21/12/06 15:47:31, ep 68, val acc: 0.3854 ± 0.0745, train stats ebd_grad: 0.2237, clf_grad: 0.1766
21/12/06 15:48:05, ep 69, val acc: 0.4020 ± 0.0741, train stats ebd_grad: 0.2251, clf_grad: 0.1631
21/12/06 15:48:40, ep 70, train acc: 0.3848 ± 0.0730
21/12/06 15:48:56, ep 70, val acc: 0.4230 ± 0.0716, train stats ebd_grad: 0.2938, clf_grad: 0.2011
21/12/06 15:49:31, ep 71, val acc: 0.4070 ± 0.0673, train stats ebd_grad: 0.2547, clf_grad: 0.1610
21/12/06 15:50:07, ep 72, val acc: 0.3917 ± 0.0759, train stats ebd_grad: 0.2113, clf_grad: 0.1771
21/12/06 15:50:42, ep 73, val acc: 0.4164 ± 0.0712, train stats ebd_grad: 0.2110, clf_grad: 0.1754
21/12/06 15:51:24, ep 74, val acc: 0.4017 ± 0.0714, train stats ebd_grad: 0.2340, clf_grad: 0.1472
21/12/06 15:52:10, ep 75, val acc: 0.4163 ± 0.0656, train stats ebd_grad: 0.2432, clf_grad: 0.1800
21/12/06 15:52:58, ep 76, val acc: 0.4064 ± 0.0713, train stats ebd_grad: 0.2991, clf_grad: 0.1894
21/12/06 15:53:39, ep 77, val acc: 0.4104 ± 0.0608, train stats ebd_grad: 0.2814, clf_grad: 0.1593
21/12/06 15:54:24, ep 78, val acc: 0.4149 ± 0.0689, train stats ebd_grad: 0.2390, clf_grad: 0.1909
21/12/06 15:55:08, ep 79, val acc: 0.3998 ± 0.0786, train stats ebd_grad: 0.3145, clf_grad: 0.1773
21/12/06 15:55:51, ep 80, train acc: 0.3872 ± 0.0793
21/12/06 15:56:10, ep 80, val acc: 0.4154 ± 0.0672, train stats ebd_grad: 0.2835, clf_grad: 0.1800
21/12/06 15:56:53, ep 81, val acc: 0.3992 ± 0.0779, train stats ebd_grad: 0.2684, clf_grad: 0.1770
21/12/06 15:56:53, End of training. Restore the best weights
21/12/06 16:00:32, acc mean 0.4166, std 0.0835
(signature) ➜ Distributional-Signatures git:(master) ✗
Hi, thanks a lot for your help.
My PyTorch version is 1.2.0, and my Transformers version is 4.12.0. I list the full environment below:
```
_libgcc_mutex 0.1 main
_openmp_mutex 4.5 1_gnu
blas 1.0 openblas
blessings 1.7 pypi_0 pypi
boto3 1.19.6 pypi_0 pypi
botocore 1.22.6 pypi_0 pypi
brotlipy 0.7.0 py37h27cfd23_1003
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py37h89c1867_1 conda-forge
cffi 1.14.6 py37h400218f_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.0.3 pypi_0 pypi
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 35.0.0 py37hd23ed53_0
cudatoolkit 10.0.130 0
filelock 3.3.1 pypi_0 pypi
freetype 2.10.4 h5ab3b9f_0
giflib 5.2.1 h7b6447c_0
gpustat 0.6.0 pypi_0 pypi
huggingface-hub 0.0.19 pypi_0 pypi
idna 3.2 pyhd3eb1b0_0
importlib-metadata 4.8.1 pypi_0 pypi
intel-openmp 2021.3.0 h06a4308_3350
jmespath 0.10.0 pypi_0 pypi
joblib 1.1.0 pyhd8ed1ab_0 conda-forge
jpeg 9d h7f8727e_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libblas 3.9.0 1_h6e990d7_netlib conda-forge
libcblas 3.9.0 3_h893e4fe_netlib conda-forge
libffi 3.3 he6710b0_2
libgcc-ng 9.3.0 h5101ec6_17
libgfortran-ng 7.5.0 ha8ba4b0_17
libgfortran4 7.5.0 ha8ba4b0_17
libgomp 9.3.0 h5101ec6_17
liblapack 3.9.0 3_h893e4fe_netlib conda-forge
libopenblas 0.3.13 h4367d64_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.3.0 hd4cf53a_17
libtiff 4.2.0 h85742a9_0
libwebp 1.2.0 h89dd481_0
libwebp-base 1.2.0 h27cfd23_0
lz4-c 1.9.3 h295c915_1
mkl 2021.3.0 h06a4308_520
mkl-service 2.4.0 py37h7f8727e_0
ncurses 6.2 he6710b0_1
ninja 1.10.2 hff7bd54_1
numpy 1.21.3 pypi_0 pypi
nvidia-ml-py3 7.352.0 pypi_0 pypi
olefile 0.46 py37_0
openssl 1.1.1l h7f8727e_0
packaging 21.0 pypi_0 pypi
pillow 8.4.0 py37h5aabda8_0
pip 21.2.2 py37h06a4308_0
psutil 5.8.0 pypi_0 pypi
pycparser 2.20 py_2
pyopenssl 21.0.0 pyhd3eb1b0_1
pyparsing 3.0.3 pypi_0 pypi
pysocks 1.7.1 py37_1
python 3.7.11 h12debd9_0
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.7 2_cp37m conda-forge
pytorch 1.2.0 py3.7_cuda10.0.130_cudnn7.6.2_0 pytorch
pytorch-transformers 1.2.0 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1 h27cfd23_0
regex 2021.10.23 pypi_0 pypi
requests 2.26.0 pyhd3eb1b0_0
s3transfer 0.5.0 pypi_0 pypi
sacremoses 0.0.46 pypi_0 pypi
scikit-learn 0.24.2 py37h18a542f_0 conda-forge
scipy 1.5.3 py37h8911b10_0 conda-forge
sentencepiece 0.1.96 pypi_0 pypi
setuptools 58.0.4 py37h06a4308_0
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
termcolor 1.1.0 py37h06a4308_1
threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge
tk 8.6.11 h1ccaba5_0
tokenizers 0.10.3 pypi_0 pypi
torchtext 0.4.0 pyhb384e40_1 pytorch
torchvision 0.4.0 py37_cu100 pytorch
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
transformers 4.12.0 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
urllib3 1.26.7 pyhd3eb1b0_0
wheel 0.37.0 pyhd3eb1b0_1
xz 5.2.5 h7b6447c_0
zipp 3.6.0 pypi_0 pypi
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0
```
Thanks a lot for your help again.
Below is what I ran in my own environment, using a clean copy of the code that I re-cloned from your GitHub just a moment ago.
And may I know your environment?
It seems that the issue is related to the environment.
(fewdoc2) jfhe@desktop:~/Documents/MountHe/jfhe/projects/Distributional-Signatures/src$ python main.py --bert --pretrained_bert bert-base-uncased --cuda 2 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target --seed=330
Parameters:
AUXILIARY=[]
BERT=True
BERT_CACHE_DIR=None
CLASSIFIER=r2d2
CLIP_GRAD=None
CUDA=2
DATA_PATH=data/huffpost_bert_uncase.json
DATASET=huffpost
DROPOUT=0.1
EMBEDDING=meta
FINETUNE_EBD=False
FINETUNE_EPISODES=10
FINETUNE_LOSS_TYPE=softmax
FINETUNE_MAXEPOCHS=5000
FINETUNE_SPLIT=0.8
INDUCT_ATT_DIM=64
INDUCT_HIDDEN_DIM=100
INDUCT_ITER=3
INDUCT_RNN_DIM=128
LR=0.001
LRD2_NUM_ITERS=5
MAML=False
META_EBD=False
META_IDF=False
META_IWF=True
META_TARGET_ENTROPY=False
META_W_TARGET=True
META_W_TARGET_LAM=1
MODE=train
N_TEST_CLASS=16
N_TRAIN_CLASS=20
N_VAL_CLASS=5
N_WORKERS=10
NOTQDM=False
PATIENCE=20
PRETRAINED_BERT=bert-base-uncased
QUERY=25
RESULT_PATH=
SAVE=False
SEED=330
SHOT=1
SNAPSHOT=
TEST_EPISODES=1000
TRAIN_EPISODES=100
TRAIN_EPOCHS=1000
VAL_EPISODES=100
WAY=5
WORD_VECTOR=wiki.en.vec
WV_PATH=./
[ASCII art banner omitted (Credit: Maija Haavisto)]
21/12/06 16:21:04: Loading data
21/12/06 16:21:04: Class balance:
{19: 900, 4: 900, 5: 900, 8: 900, 1: 900, 13: 900, 31: 900, 16: 900, 36: 900, 39: 900, 14: 900, 11: 900, 23: 900, 17: 900, 7: 900, 21: 900, 26: 900, 12: 900, 18: 900, 37: 900, 6: 900, 22: 900, 40: 900, 15: 900, 29: 900, 10: 900, 35: 900, 38: 900, 9: 900, 25: 900, 30: 900, 20: 900, 3: 900, 27: 900, 24: 900, 34: 900, 33: 900, 32: 900, 0: 900, 2: 900, 28: 900}
21/12/06 16:21:04: Avg len: 13.077235772357724
21/12/06 16:21:04: Loading word vectors
21/12/06 16:21:19: Total num. of words: 9376, word vector dimension: 300
21/12/06 16:21:19: Num. of out-of-vocabulary words(they are initialized to zeros): 1586
21/12/06 16:21:19: #train 18000, #val 4500, #test 14400
21/12/06 16:21:21, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias']
For your FewRel experiment with BERT, you need to add --pos_max_len 300 to the command. This argument sets the size of the positional embeddings initialized in src/embedding/auxiliary/pos.py.
The previous failure happened because your positional embedding table did not cover enough positions for the input.
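Roughly, the failure mode can be pictured with the minimal sketch below (illustrative only, not the repository's actual pos.py code; the table size and dimensions are assumptions): a positional embedding table of size pos_max_len is indexed by token positions, so any sentence longer than the table produces an out-of-range index, which on GPU surfaces as the device-side assert above.

```python
import torch
import torch.nn as nn

# Illustrative sketch (not the repository's actual code): the auxiliary "pos"
# feature looks up one vector per token position, so the embedding table must
# cover every position that can appear in the input.
pos_max_len = 100                        # assumed default table size
pos_ebd = nn.Embedding(pos_max_len, 5)   # small positional embedding table

text_len = 123                           # a FewRel sentence longer than the table
positions = torch.arange(text_len)       # position indices 0..122

# On CPU this lookup raises an IndexError; on CUDA the same out-of-range index
# surfaces as "RuntimeError: CUDA error: device-side assert triggered".
# Passing --pos_max_len 300 enlarges the table so every index stays in range.
out = pos_ebd(positions)
```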
Hi, thanks a lot for your help. After adding --pos_max_len 300 as you suggested, I have run FewRel without any error reports.
However, the FewRel results with BERT are also very different from your Table 2.
I believe the issue must be something other than the code and the commands, such as the Conda configuration?
If you could provide a file with your Conda configuration via conda env export > signature.yaml, it would be greatly helpful to those of us interested in your work.
I also attach the results from FewRel with BERT at the end.
main.py --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset fewrel --data_path data/fewrel_bert_uncase.json --n_train_class 65 --n_val_class 5 --n_test_class 10 --meta_iwf --meta_w_target --auxiliary pos --bert --pretrained_bert bert-base-uncased --pos_max_len 100
Parameters:
AUXILIARY=['pos']
BERT=True
BERT_CACHE_DIR=None
CLASSIFIER=r2d2
CLIP_GRAD=None
CUDA=0
DATA_PATH=data/fewrel_bert_uncase.json
DATASET=fewrel
DONE_QUEUE_LIMIT=100
DROPOUT=0.1
EMBEDDING=meta
FINETUNE_EBD=False
FINETUNE_EPISODES=10
FINETUNE_LOSS_TYPE=softmax
FINETUNE_MAXEPOCHS=5000
FINETUNE_SPLIT=0.8
INDUCT_ATT_DIM=64
INDUCT_HIDDEN_DIM=100
INDUCT_ITER=3
INDUCT_RNN_DIM=128
LR=0.001
LRD2_NUM_ITERS=5
MAML=False
META_EBD=False
META_IDF=False
META_IWF=True
META_TARGET_ENTROPY=False
META_W_TARGET=True
META_W_TARGET_LAM=1
MODE=train
N_TEST_CLASS=10
N_TRAIN_CLASS=65
N_VAL_CLASS=5
N_WORKERS=10
NOTQDM=False
PATIENCE=20
POS_EBD_DIM=5
POS_MAX_LEN=100
PRETRAINED_BERT=bert-base-uncased
QUERY=25
RESULT_PATH=
SAVE=False
SEED=330
SHOT=1
SLEEP_TIME=1
SNAPSHOT=
TEST_EPISODES=1000
TRAIN_EPISODES=100
TRAIN_EPOCHS=1000
USE_DYNAMIC_CLASSIFIER=False
VAL_EPISODES=100
WAY=5
WORD_VECTOR=wiki.en.vec
WV_PATH=./
[ASCII art banner omitted (Credit: Maija Haavisto)]
21/12/06 17:17:13: Loading data
21/12/06 17:17:13: Class balance:
{41: 700, 42: 700, 43: 700, 25: 700, 26: 700, 27: 700, 76: 700, 77: 700, 44: 700, 45: 700, 46: 700, 50: 700, 51: 700, 56: 700, 57: 700, 61: 700, 62: 700, 63: 700, 10: 700, 11: 700, 30: 700, 31: 700, 14: 700, 15: 700, 67: 700, 68: 700, 69: 700, 20: 700, 21: 700, 36: 700, 37: 700, 4: 700, 5: 700, 54: 700, 55: 700, 60: 700, 8: 700, 9: 700, 28: 700, 29: 700, 16: 700, 17: 700, 66: 700, 22: 700, 23: 700, 34: 700, 35: 700, 2: 700, 3: 700, 40: 700, 74: 700, 75: 700, 48: 700, 49: 700, 58: 700, 59: 700, 6: 700, 7: 700, 32: 700, 33: 700, 18: 700, 19: 700, 64: 700, 24: 700, 38: 700, 0: 700, 1: 700, 72: 700, 73: 700, 47: 700, 70: 700, 78: 700, 79: 700, 12: 700, 52: 700, 53: 700, 71: 700, 13: 700, 65: 700, 39: 700}
21/12/06 17:17:13: Avg len: 28.964017857142856
21/12/06 17:17:13: Loading word vectors
21/12/06 17:17:15: Total num. of words: 17835, word vector dimension: 300
21/12/06 17:17:15: Num. of out-of-vocabulary words(they are initialized to zeros): 4423
21/12/06 17:17:15: #train 45500, #val 3500, #test 7000
21/12/06 17:17:18: Convert everything into np array for fast data loading finished
21/12/06 17:17:19: precompute_stats finished
21/12/06 17:17:19: start meta_w_target
21/12/06 17:17:19, Loading pretrained bert
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.weight']
Yes, this result definitely does not make sense (since it is even lower than the non-BERT performance).
We used IBM Cloud to run all our experiments three years ago. We could not export the environment information at that time (it was not available to us), but we do know that it was running PyTorch 1.0 and pytorch-transformers 1.1.0. It is a bit difficult to reinstall the exact same environment right now (we simply cannot run the old PyTorch version on our current machines). We will consider refactoring the code base for the new APIs in the future.
Thanks a lot for your quick reply, and I understand the situation. I will try your PyTorch and pytorch-transformers versions to see whether I can get normal results for BERT. Thanks a lot for your help.
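For reference, here is a quick way to confirm at runtime which versions are actually loaded before re-running the experiments (a minimal sketch; it assumes either pytorch-transformers or transformers is installed):

```python
# Print the library versions actually imported in the current environment.
import torch
print("torch:", torch.__version__)  # the original setup reportedly ran PyTorch 1.0

try:
    import pytorch_transformers  # package used by the original code base
    print("pytorch-transformers:", pytorch_transformers.__version__)
except ImportError:
    import transformers  # newer HuggingFace package
    print("transformers:", transformers.__version__)
```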
> I have run your command on HuffPost and the results seem normal to me (I have attached the 5-shot exp below). What are your PyTorch and Transformers versions?
Hi, I've also encountered the same issue as @he159ok on the HuffPost dataset, with the best 1-shot performance only around 0.30. Could you please share your PyTorch and Transformers versions? Thanks a lot!
he159ok and I had an email discussion, and we think the reason is that the tokenization of the current BERT release from HuggingFace is different. Thus the previous tokenizations in "huffpost_bert_uncase.json" are no longer applicable to the current BERT model.
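If that is the cause, one possible workaround is to regenerate the cached tokenization with the tokenizer shipped in the installed transformers version. A minimal sketch, assuming the data file is one JSON object per line with an original-text field; the field names "raw" and "text" are hypothetical and the real schema of huffpost_bert_uncase.json may differ:

```python
import json
from transformers import BertTokenizer

# Re-tokenize the original headlines with the currently installed tokenizer so
# the cached word pieces match what today's bert-base-uncased checkpoint expects.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

with open("data/huffpost_bert_uncase.json") as fin, \
        open("data/huffpost_bert_uncase.retok.json", "w") as fout:
    for line in fin:
        ex = json.loads(line)
        # "raw" (original text) and "text" (cached word pieces) are assumed field names.
        ex["text"] = tokenizer.tokenize(ex["raw"])
        fout.write(json.dumps(ex) + "\n")
```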
Could you please give an example of a BERT run command? I get abnormal results when I run the command below,
python main.py --bert --pretrained_bert bert-base-uncased --cuda 0 --way 5 --shot 1 --query 25 --mode train --embedding meta --classifier r2d2 --dataset huffpost --data_path data/huffpost_bert_uncase.json --n_train_class 20 --n_val_class 5 --n_test_class 16 --meta_iwf --meta_w_target