Closed. liyongqi67 closed this issue 5 years ago.
You may want to read the README.md carefully; the description there is already quite detailed.
The difference in the baseline AR's performance between the paper and the competition comes from the different problem forms they adopt (you can refer to Competition for an example of the ChID dataset adapted for the competition), not from whether pre-trained word embeddings were used (in fact, both used them).
In the competition, a list of passages (not an isolated one) is provided, and the answers need to be selected from a given set of candidate idioms of fixed length. In contrast, as described in the paper, the original ChID dataset provides a list of candidate idioms for each blank (not for the whole passage or a set of passages) and thus does not establish connections between blanks.
Because of the large scale of the original ChID dataset, models can be trained sufficiently and thus achieve high scores at test time. We found the AR model tended to converge after about 5 epochs (without many training tricks) when trained on the Train split of the original ChID dataset.
However, the large corpus size also means that patterns in how the options were constructed may be discovered and exploited to answer queries. This amounts to a kind of cheating, because the model does not truly understand idioms in that case. To avoid such regularities in the option construction, we redesigned the question form in the competition as you see, which also results in a more challenging problem.
Thanks for your reply! Now I understand the difference between the competition and the paper.
I noticed that the AR model obtained a score of 72.7 in the paper but only 65.4 in the competition. Is it because the AR model in the competition did not use the pre-trained word embeddings? I noticed the code `if os.path.exists("newWordvector.txt"):`, and there is no newWordvector.txt, so I think the AR model obtains 72.7 with pre-trained word embeddings and 65.4 without them. How many epochs does the AR model need to reach its best performance, and how does it perform on the train dataset? Thanks!
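For context, the pattern behind that `os.path.exists` check is a common one: initialize the embedding matrix randomly, then overwrite rows with pre-trained vectors only if the file is present. A minimal sketch of that pattern, assuming a hypothetical `word v1 v2 ...` one-entry-per-line file format (the file name comes from the repo's code; the helper and format are my assumptions, not the actual implementation):

```python
import os
import random

def build_embedding_matrix(vocab, dim=200, path="newWordvector.txt"):
    """Randomly initialize embeddings; overwrite rows with pre-trained
    vectors if the file exists, otherwise keep the random init."""
    rng = random.Random(0)
    matrix = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in vocab]
    if os.path.exists(path):  # silently falls back to random init if absent
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *vec = line.split()
                if word in vocab and len(vec) == dim:
                    matrix[vocab[word]] = [float(x) for x in vec]
    return matrix

# With no newWordvector.txt on disk, the matrix stays randomly initialized.
vocab = {"高枕无忧": 0, "一言九鼎": 1}
emb = build_embedding_matrix(vocab, dim=4)
print(len(emb), len(emb[0]))
```

So the model still trains either way; the check only decides whether the embedding layer starts from pre-trained values or from random ones.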