allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus
GORC download dataset #2

Closed: Mayar2009 closed this issue 4 years ago

Mayar2009 commented 4 years ago

Hi! Excuse me if I'm missing the obvious, but this line is ambiguous to me: "The use of this data is subject to the Semantic Scholar Dataset License." Following the Semantic Scholar Dataset License link, accepting the license agreement, and then running this instruction in the AWS CLI:

aws s3 cp --no-sign-request --recursive s3://ai2-s2-research-public/open-corpus/2020-02-01/ destinationPath

will download a corpus, but I really don't understand whether this is the GORC dataset or something else.

kyleclo commented 4 years ago

Hey @Mayar2009, sorry for any confusion. The download instructions for this specific dataset are in the README of this GitHub repo, and the dataset is located at s3://ai2-s2-gorc-release/. Usage of this dataset is subject to the same license agreement shared by all Semantic Scholar datasets, which is linked to from

The specific instructions you're looking at now are for a different dataset called Open Corpus, which does not contain much of the information that we provide in GORC.
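
To make the difference between the two locations concrete, here is a minimal sketch that only builds the AWS CLI command line and does not run it. It reuses the `aws s3 cp --no-sign-request --recursive` form quoted in the question and the bucket named in the reply. The helper name `build_download_command` is hypothetical, and whether the GORC bucket accepts unsigned requests is an assumption; the README remains the authoritative source for the download steps.

```python
import shlex

# Locations quoted in this thread.
OPEN_CORPUS = "s3://ai2-s2-research-public/open-corpus/2020-02-01/"
GORC_BUCKET = "s3://ai2-s2-gorc-release/"


def build_download_command(source, destination, unsigned=True):
    """Return the argv list for a recursive `aws s3 cp` (hypothetical helper).

    `unsigned=True` adds --no-sign-request, as in the Open Corpus command
    quoted in the question. Whether the GORC bucket allows unsigned access
    is an assumption; set unsigned=False to use configured credentials.
    """
    cmd = ["aws", "s3", "cp"]
    if unsigned:
        cmd.append("--no-sign-request")
    cmd += ["--recursive", source, destination]
    return cmd


if __name__ == "__main__":
    # Print both commands so the differing source locations are visible.
    for source in (OPEN_CORPUS, GORC_BUCKET):
        print(shlex.join(build_download_command(source, "destinationPath")))
```

The printed lines can be pasted into a shell once the license has been accepted; only the source location differs between the two datasets.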