Why are EM and F1 lower than the official results? #1

galsang / BiDAF-pytorch

Re-implementation of BiDAF (Bidirectional Attention Flow for Machine Comprehension, Minjoon Seo et al., ICLR 2017) in PyTorch.

Closed: sharejing closed this issue 6 years ago

sharejing commented 6 years ago

Hi, galsang. Nice work! I'd like to know why EM and F1 are lower than the official results. Have you analyzed this? Thanks!

galsang commented 6 years ago

I don't know the exact reason, but there are several possible candidates. One is a difference in weight initialization, which can have a big impact in practice.
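
As a rough illustration of why initialization is a plausible candidate, the sketch below compares the sampling bounds of two common uniform schemes for the same layer shape. The two formulas are the standard Glorot (Xavier) uniform bound and the 1/sqrt(fan_in) bound. The layer shape is hypothetical, and this thread does not say which scheme either implementation actually uses, so treat it as an example of how far defaults can differ rather than a diagnosis.

```python
import math


def glorot_uniform_bound(fan_in: int, fan_out: int) -> float:
    """Glorot/Xavier uniform: weights ~ U(-b, b) with b = sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))


def fan_in_uniform_bound(fan_in: int) -> float:
    """Fan-in scaled uniform: weights ~ U(-b, b) with b = 1 / sqrt(fan_in)."""
    return 1.0 / math.sqrt(fan_in)


# Hypothetical layer shape, chosen only for illustration.
fan_in, fan_out = 800, 100

b_glorot = glorot_uniform_bound(fan_in, fan_out)
b_fan_in = fan_in_uniform_bound(fan_in)

print(f"Glorot bound:  {b_glorot:.4f}")
print(f"Fan-in bound:  {b_fan_in:.4f}")
print(f"Ratio:         {b_glorot / b_fan_in:.2f}")
```

For this shape the two bounds differ by more than a factor of two, so the same architecture can start training from noticeably different weight scales depending on the scheme.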

sharejing commented 6 years ago

Hi, galsang. I think so too.
I prefer PyTorch, and I have been working on reading comprehension. Thank you for your reply!

zhiqihuang commented 4 years ago

I suspect that the paper results are based on GloVe 300d. And the lower results are from GloVe 100d. Just a guess.
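
One way to check a guess like this is to read the vector size directly from the embedding file in use. Here is a minimal sketch, assuming the usual GloVe text layout (a token followed by space-separated floats) and a token that contains no spaces. The helper name `embedding_dim` and the sample line are made up for illustration.

```python
def embedding_dim(glove_line: str) -> int:
    """Infer the vector size from one line of a GloVe-style text file.

    Assumes the layout "<token> <float> <float> ..." and that the token
    itself contains no spaces.
    """
    parts = glove_line.rstrip().split(" ")
    return len(parts) - 1  # everything after the token is a vector component


# Toy line with a 4-dimensional vector, not real GloVe data.
sample = "the 0.1 -0.2 0.3 0.4"
print(embedding_dim(sample))
```

Running this on the first line of the actual embedding file would show whether the model is using 100-dimensional or 300-dimensional vectors.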