Closed johncs999 closed 4 years ago
Hi Wang, thanks for your code! I want to know how to run CatSeq and CatSeqD with your code?
Hi, thank you for your interest. You can download the sh files of CatSeq and CatSeqD via the following link: https://drive.google.com/file/d/1nDscn10W8Dajwvse0XHwkp6HI583OQ1C/view?usp=sharing. The official implementation of these two methods is also public at https://github.com/memray/OpenNMT-kpg-release.
Thanks for your detailed reply. BTW, what's the difference between the given `vocab.pt` file and the `vocab.pt` file generated by `preprocess.py`?
The given `vocab.pt` is the vocab before "RmKeysAllUnk". We perform "RmKeysAllUnk" based on the given vocab and use the given vocab as the final vocab. If we instead regenerated the vocab after "RmKeysAllUnk", the "RmKeysAllUnk" step could become meaningless, since some keyphrases might again be all unks under the newly generated vocab.
Thanks for your reply. It seems that the preprocessing method here is not the same as that of OpenNMT-kpg-release. I am curious why the unk keyphrases should be removed, since there is a copy mechanism, and if some keyphrases are removed from the test set, the results do not seem comparable with those in OpenNMT-kpg-release. Can you share the preprocessing file? Thanks.
Yes, the preprocessing is different from OpenNMT-kpg-release. "RmKeysAllUnk" is not performed on the test set; it is only performed on the absent keyphrases of the training and validation sets. The purpose is to remove invalid absent training keyphrases, since they encourage the model to produce unk tokens.
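For illustration, the filtering step described above could look roughly like the following minimal sketch. The function name and data layout (keyphrases as token lists, vocab as a set) are my own assumptions, not the repository's actual interface:

```python
from typing import List, Set

def rm_keys_all_unk(keyphrases: List[List[str]], vocab: Set[str]) -> List[List[str]]:
    """Drop any target keyphrase whose tokens are ALL out-of-vocabulary,
    since such a target only teaches the model to emit <unk> tokens."""
    return [kp for kp in keyphrases if any(tok in vocab for tok in kp)]

# Hypothetical example: the all-OOV phrase is removed.
vocab = {"neural", "keyphrase", "generation", "decoding"}
absent_kps = [["neural", "decoding"], ["xyzzy", "quux"]]
print(rm_keys_all_unk(absent_kps, vocab))  # → [['neural', 'decoding']]
```

A keyphrase with at least one in-vocab token is kept, because the copy mechanism can still recover the remaining tokens from the source text.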
It may take a long time to get the preprocessing files, since they are stored on a PC in the lab and unfortunately the PC is broken. Besides, I am not in Hong Kong because of the serious COVID-19 situation. However, I will try to get them and then share them with you.
Thanks for your kind reply. I found that the numbers of keyphrases do not match those in OpenNMT-kpg-release on the SemEval test set:

| | total | absent | present |
|---|---|---|---|
| OpenNMT-kpg-release | 1507 | 836 | 671 |
| ExHiRD | 1440 | 812 | 628 |
- Is that because you removed some duplicated keyphrases after stemming?
- I found that you seem to use stemmed keyphrases but unstemmed context in training. Do stemmed and unstemmed words share the same embedding? If not, would there be a semantic gap?
Thanks a lot !
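The stemming-based deduplication the first question refers to is usually done by comparing stemmed token sequences and keeping the first occurrence. A minimal sketch, where `naive_stem` is just a stand-in for the Porter stemmer that keyphrase pipelines typically use:

```python
from typing import Callable, List

def dedup_by_stem(keyphrases: List[List[str]], stem: Callable[[str], str]) -> List[List[str]]:
    """Keep the first occurrence of each keyphrase; two phrases count as
    duplicates if their stemmed token sequences are identical."""
    seen, kept = set(), []
    for kp in keyphrases:
        key = tuple(stem(tok) for tok in kp)
        if key not in seen:
            seen.add(key)
            kept.append(kp)
    return kept

# Crude stand-in stemmer (strips a trailing "s"), for illustration only.
naive_stem = lambda w: w[:-1] if w.endswith("s") else w
print(dedup_by_stem([["neural", "networks"], ["neural", "network"]], naive_stem))
# → [['neural', 'networks']]
```

If the two repositories apply this dedup differently (or only one applies it), the keyphrase counts in the table above would diverge exactly as observed.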
Hi @Chen-Wang-CUHK @johncs999,
Can you share the preprocessing file used to remove unk tokens from absent keyphrases, i.e. the "RmKeysAllUnk" version? Is it the one shared in the `sh/preprocess` folder?