allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

Make data open by archiving on Zenodo or Figshare #35

Open cthoyt opened 2 years ago

cthoyt commented 2 years ago

Currently the README says that a potential consumer of S2ORC should fill out a Google Form to get a download link. For a resource that has the word "Open" in the title, this is quite a closed choice. I would strongly suggest posting the data to an archive service like Zenodo or Figshare, both of which keep track of different versions of the same record along with fine-grained metadata such as the license information.

Further, having this process be manual means that it depends on the maintainers of the repository checking and handling the requests, so if they ever lose motivation or leave your group, this resource would effectively become dead.

The other parts of the form just seem to collect user information and have the user confirm the license, both of which seem like things that could be skipped.