allenai / peS2o

Pretraining Efficiently on S2ORC!
Apache License 2.0

README states S2ORC contains 11.3M papers but the original S2ORC paper claims 80M #4

Closed jtalmi closed 1 year ago

jtalmi commented 1 year ago

The original paper says there are 80M papers available: https://aclanthology.org/2020.acl-main.447.pdf

Curious why there's a difference here?

soldni commented 1 year ago

Hi @jtalmi,

There are two factors at play here:

Hope that helps!

Best, Luca

jtalmi commented 1 year ago

thanks!