allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

fail to download metadata.tsv.gz #4

Closed: ForeverGoGOING closed this issue 4 years ago

ForeverGoGOING commented 4 years ago

Hi, I am very interested in this dataset! I just tried to download metadata.tsv.gz with the following code:

    # ------ download metadata ------
    s3_metadata_file = '20190928/metadata.tsv.gz'
    local_meta_file = os.path.join(LOCAL_GORC_DIR, 'metadata.tsv.gz')
    download_from_s3(bucket, s3_metadata_file, local_meta_file, aws_attribs)

I got a 404 error. Is there anything wrong with my code?

kyleclo commented 4 years ago

Just to check, can you try this from your shell command line?

    aws s3 ls s3://ai2-s2-gorc-release/20190928/

Do you see a list of objects?

Does the following work? (Replace <LOCAL_GORC_DIR> with the actual path.)

    aws s3 cp s3://ai2-s2-gorc-release/20190928/metadata.tar.gz <LOCAL_GORC_DIR>
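For anyone who would rather run the same check from Python, here is a minimal sketch of the listing step. It assumes boto3 is installed and that your AWS credentials can read the bucket. The helper names (`list_keys`, `similar_keys`, `check_key`, `main`) are made up for illustration and are not part of the s2orc repo. Only the bucket name and key come from this thread. The idea is to list what is actually under the prefix before downloading, since a 404 usually means the exact key does not exist.

```python
import posixpath


def list_keys(client, bucket, prefix):
    """Return every object key under `prefix` (roughly what `aws s3 ls` shows)."""
    keys = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page that holds no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def similar_keys(keys, wanted):
    """Return keys whose file name matches `wanted` up to the first dot."""
    stem = posixpath.basename(wanted).split(".", 1)[0]
    return [k for k in keys if posixpath.basename(k).split(".", 1)[0] == stem]


def check_key(client, bucket, key):
    """Return (exists, near_matches) for `key` in `bucket`."""
    directory = posixpath.dirname(key)
    prefix = directory + "/" if directory else ""
    keys = list_keys(client, bucket, prefix)
    if key in keys:
        return True, []
    return False, similar_keys(keys, key)


def main():
    # Assumption: boto3 is installed and credentials are configured.
    import boto3

    s3 = boto3.client("s3")
    found, near = check_key(s3, "ai2-s2-gorc-release", "20190928/metadata.tsv.gz")
    if found:
        print("key exists")
    else:
        print("key not found; similar keys under the prefix:", near)
```

Calling `main()` prints either a confirmation or the keys that share the same base name, which makes a mismatched file extension easy to spot.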

ForeverGoGOING commented 4 years ago

Thanks for your help!