allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

Update of papers in S2ORC #31

Closed (viswavi closed 3 years ago)

viswavi commented 3 years ago

Hi,

Thanks again for creating S2ORC - it's been such a great tool for research.

I'm wondering if there are any plans to update the set of papers included in S2ORC. The last update included papers through 4/14/20, which was over a year ago. I'm working on a tool for the AI/ML community, and it would be far more useful if we could include papers published in the last year (of which there are surprisingly many).

Thank you!

lucylw commented 3 years ago

Hi there, we are currently working on a new version of S2ORC with many data and extraction quality improvements, though this new release is likely to be several months or more away. In the meantime, we will not be releasing a new version of S2ORC that uses the old models. If you have specific papers from the last year you would like to process, you can look into https://github.com/allenai/s2orc-doc2json, which allows you to produce S2ORC parses from paper PDFs. Hope this helps!
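
For anyone following the s2orc-doc2json pointer above with a batch of recent PDFs, here is a minimal sketch of how the per-file invocations could be assembled. It only builds the command lines and does not run them. The script path `doc2json/grobid2json/process_pdf.py` and the `-i`, `-t`, `-o` flags are assumptions recalled from that repository's README and should be checked against it, as should its setup steps (it expects a running Grobid service). The helper name `build_doc2json_commands` is hypothetical.

```python
import sys

# Assumed location of the PDF processing script inside a checkout of
# allenai/s2orc-doc2json; verify against the repository README.
ASSUMED_SCRIPT = "doc2json/grobid2json/process_pdf.py"


def build_doc2json_commands(pdf_paths, temp_dir, output_dir, script=ASSUMED_SCRIPT):
    """Return one command (as an argument list) per PDF path.

    The -i/-t/-o flags (input PDF, temp dir, output dir) are assumptions
    taken from memory of the README, not a confirmed interface.
    """
    commands = []
    for pdf in pdf_paths:
        commands.append([
            sys.executable, script,
            "-i", str(pdf),
            "-t", str(temp_dir),
            "-o", str(output_dir),
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for cmd in build_doc2json_commands(["paper_a.pdf", "paper_b.pdf"], "temp_dir", "output_dir"):
        print(" ".join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the root of a doc2json checkout once the flags have been confirmed.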