allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

Citing section mostly missing #10

Closed (malteos closed this issue 4 years ago)

malteos commented 4 years ago

I quickly checked a sample of 35,414 papers from the dataset, and none of them has the section field set in paper.grobid_parse.body_text. Is this just bad luck, or is the section information missing for most papers?
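A check like the one described above could be sketched as follows. This is a minimal sketch, not the reporter's actual script. It assumes each shard is gzipped JSON Lines (one paper per line) and that each paper looks like {"grobid_parse": {"body_text": [{"text": ..., "section": ...}, ...]}}. The function names and the shard path are hypothetical.

```python
import gzip
import json
from typing import Iterable, Tuple


def count_papers_with_sections(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (papers_with_any_section, total_papers).

    Assumed (hypothetical) schema per line:
    {"grobid_parse": {"body_text": [{"text": ..., "section": ...}, ...]}}
    """
    with_section = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        paper = json.loads(line)
        total += 1
        # grobid_parse may be missing or null for papers without a PDF parse.
        parse = paper.get("grobid_parse") or {}
        paragraphs = parse.get("body_text") or []
        # A paper counts if at least one paragraph has a non-empty section string.
        if any(p.get("section") for p in paragraphs):
            with_section += 1
    return with_section, total


def count_in_shard(path: str) -> Tuple[int, int]:
    # Assumes a gzipped JSON Lines shard; the path is a placeholder.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_papers_with_sections(f)


if __name__ == "__main__":
    sample = [
        '{"grobid_parse": {"body_text": [{"text": "a", "section": ""}]}}',
        '{"grobid_parse": {"body_text": [{"text": "b", "section": "Introduction"}]}}',
        '{"grobid_parse": null}',
    ]
    print(count_papers_with_sections(sample))  # (1, 3)
```

Treating an empty string the same as a missing key is a deliberate choice here: a field that is present but blank still carries no section information.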

kyleclo commented 4 years ago

I'll look into this ASAP. Thanks for bringing this to our attention.

kyleclo commented 4 years ago

@malteos Thanks for drawing attention to this. It's been fixed in the most recent release, 20200705v1. Here's a screenshot of the output of grep '"section":'

[Screenshot: output of grep '"section":']
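Note that grep '"section":' matches any line containing the key, including lines where the value is an empty string. A stricter, grep-like count of papers with at least one non-empty section value could be sketched as follows. This is a minimal sketch, assuming gzipped JSON Lines shards with one paper per line. The regex, the function names, and the shard path are hypothetical.

```python
import gzip
import re
from typing import Iterable

# Matches a "section" key whose string value starts with a non-quote character,
# i.e. is non-empty. "section": "" and "section": null do not match.
NON_EMPTY_SECTION = re.compile(r'"section":\s*"[^"]')


def count_lines_with_nonempty_section(lines: Iterable[str]) -> int:
    """Count lines (one paper per line) containing at least one non-empty section value."""
    return sum(1 for line in lines if NON_EMPTY_SECTION.search(line))


def count_in_gzip_shard(path: str) -> int:
    # Assumes a gzipped JSON Lines shard; the path is a placeholder.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_lines_with_nonempty_section(f)
```

Scanning raw text like grep avoids parsing every JSON object, which is faster on large shards. The trade-off is that the regex depends on the serialized layout. The parsed-JSON approach is the more robust way to confirm the result.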

malteos commented 4 years ago

Awesome! Thanks for your work.