Closed: light8lee closed this issue 4 years ago
Hi and thanks! You are right, apologies for this. Indeed it is worse, but if you freeze the embedding layer the performance will be ok (actually better) with pre-trained embeddings.
The reported results are indeed with random embeddings because of this, but there is also an issue with the vocabulary size depending on whether pre-trained word embeddings are given as additional input to the model, hence the difference you see in performance.
I will update the code to include this (pre-trained + freeze) in a new branch and you can re-run it if you want. I will report there my new performance as well. If possible, just give me a few days to update the code and re-run and I will get back to you via this issue :)
Thanks again for catching that!
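The fix described above (loading pre-trained embeddings and freezing the layer) can be sketched in PyTorch roughly as follows; the tensor shapes and names here are illustrative, not the repository's actual variables.

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained vectors: vocabulary of 5 words, dimension 4.
pretrained = torch.randn(5, 4)

# freeze=True sets requires_grad=False on the weight, so an optimizer
# will never update the pre-trained vectors during training.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)

assert not embedding.weight.requires_grad
assert torch.equal(embedding.weight, pretrained)
```

With freeze=False instead, the pre-trained vectors would be fine-tuned along with the rest of the model, which is what made the results worse in this case.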
Ok, I'm looking forward to your update, thank you for your reply.
Hey there,
I am very sorry for the late reply. I have updated the master with the corrected version and created a tag for the version that reproduces the paper results.
The difference now is that the embedding layer is frozen and L=16 works best (from my experiments, I got around 67% performance on the CDR test set). This is reflected in the config file as well. Please feel free to try the new version and let me know :)
This is now fixed so I am closing this issue.
In modules.py at line 56, you call from_pretrained on self.embedding, but the from_pretrained method is a class method: it returns a new embedding layer and does not change self.embedding, so only the randomly initialized embedding is actually used. I think the returned embedding should be assigned back to self.embedding. However, when I fixed this bug, the PubMed result (about 0.61) was not as good as the random initialization (about 0.63, which is your reported PubMed result) on the CDR dataset.
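For context, the pitfall described in this issue can be reproduced with a minimal sketch (the variable names are illustrative): because nn.Embedding.from_pretrained is a classmethod, calling it through an existing instance builds and returns a new module rather than modifying the instance, so the result has to be assigned back.

```python
import torch
import torch.nn as nn

pretrained = torch.randn(5, 4)  # hypothetical pre-trained vectors

layer = nn.Embedding(5, 4)      # randomly initialized weights

# Buggy pattern: from_pretrained is a classmethod, so this call
# constructs and returns a NEW Embedding and leaves `layer` untouched.
layer.from_pretrained(pretrained)
assert not torch.equal(layer.weight, pretrained)  # still random

# Fixed pattern: assign the returned module back.
layer = nn.Embedding.from_pretrained(pretrained, freeze=True)
assert torch.equal(layer.weight, pretrained)
```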