google-research / tapas

End-to-end neural table-text understanding models.

The performance of TAPAS in the HuggingFace model hub is not consistent with that of the original models #108

Closed FeiWang96 closed 3 years ago

FeiWang96 commented 3 years ago

Hi,

As shown in this notebook, the accuracy of tapas-base-finetuned-tabfact from the HuggingFace model hub on the test set is 77.1, while it is reported as 78.5 in the paper. What accounts for the performance drop? Is it due to some unknown bug in the PyTorch implementation?

Thank you!

eisenjulian commented 3 years ago

Hi @FeiWang96, thanks for your question and sorry for the delay. I don't have an exact answer, since it was @NielsRogge who worked on the HuggingFace implementation. Looking at the notebook, one guess is that a slight difference in how the examples are preprocessed before being fed into the model could come into play, especially for tables that don't fit into 512 tokens. It would be nice to pinpoint where the difference comes from: whether the models give different outputs for the same input data, or whether the generated input data itself differs.
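A minimal sketch of that kind of check on the HuggingFace side, assuming the `google/tapas-base-finetuned-tabfact` checkpoint (the table and statement below are toy stand-ins, not real TabFact data):

```python
import pandas as pd
from transformers import TapasTokenizer

tokenizer = TapasTokenizer.from_pretrained("google/tapas-base-finetuned-tabfact")

# Toy table; TapasTokenizer expects every cell to be a string.
table = pd.DataFrame({"Player": ["Ann", "Bob"], "Goals": ["3", "1"]})
statement = "Ann scored more goals than Bob."

encoding = tokenizer(
    table=table,
    queries=[statement],
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

# Diff these against the features written by the original TF preprocessing
# (input ids plus segment/column/row ids) to see whether the two pipelines
# already disagree before the model is ever run.
print(encoding["input_ids"][0][:40])
print(encoding["token_type_ids"][0][:10])
```

If the encoded inputs match, the next step would be comparing the model logits for the same encoded batch.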

NielsRogge commented 3 years ago

So the reason was found! It turns out I was using a lemmatized version of TabFact, instead of the original statements. See https://github.com/NielsRogge/Transformers-Tutorials/issues/2#issuecomment-812770056
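For anyone re-running the comparison, here is a hedged sketch of the corrected setup, scoring an original (non-lemmatized) statement with the fine-tuned checkpoint; the table and statement are illustrative placeholders rather than actual TabFact test examples:

```python
import pandas as pd
import torch
from transformers import TapasForSequenceClassification, TapasTokenizer

model_name = "google/tapas-base-finetuned-tabfact"
tokenizer = TapasTokenizer.from_pretrained(model_name)
model = TapasForSequenceClassification.from_pretrained(model_name)
model.eval()

table = pd.DataFrame({"Team": ["Reds", "Blues"], "Wins": ["10", "7"]})
# Use the statement as written in the original TabFact release,
# not its lemmatized counterpart.
statement = "The Reds won more games than the Blues."

inputs = tokenizer(table=table, queries=[statement], padding="max_length",
                   truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# For this checkpoint, class 1 corresponds to "entailed", class 0 to "refuted".
print("entailed" if logits.argmax(-1).item() == 1 else "refuted")
```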

FeiWang96 commented 3 years ago

Thank you, @NielsRogge, that's really helpful!