allenai / s2-folks

Public space for the user community of Semantic Scholar APIs to share scripts, report issues, and make suggestions.

Q: Script for translating S2ORC Annotations to S2ORC-doc2json format? #126

Closed maybay21 closed 9 months ago

maybay21 commented 11 months ago

Hi All,

First, thank you so much for all of your hard work on the Semantic Scholar dataset. It's been invaluable to my research.

I have a quick question wrt the S2ORC dataset. I completed a bulk dataset download through the API, but the files from the API are not in the same format as the output of S2ORC-doc2json. I imagine this is intentional. Is there a helper script to translate the annotations into the S2ORC-doc2json format? I received files in the format below:

received

My goal is to translate to the original S2ORC format:

expected

If there's not something on-hand, I'll hack a script together. Thanks!

cfiorelli commented 10 months ago

@maybay21 Given my delayed reply, it sounds like you may have already resolved this by hacking a script together? Sorry for the delay here.

cfiorelli commented 9 months ago

Closing for lack of a reply from the requester.

IceSuger commented 7 months ago

Hi @maybay21, would you like to share your hack script? Thanks!
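
No script was posted in this thread. For anyone landing here with the same question, below is a minimal sketch of what such a conversion might look like. It is not the original poster's script and not an official tool. It assumes each bulk record is a JSON object with `corpusid`, `content.text`, and `content.annotations`, where every annotation layer (`title`, `abstract`, `paragraph`, `sectionheader`, `bibref`) is either null or a JSON-encoded string holding a list of `{"start", "end", "attributes"}` character spans into `content.text`. The output covers only a subset of the doc2json fields (`paper_id`, `title`, `abstract`, `body_text`); the function name `to_doc2json_like` and the helper `load_spans` are hypothetical.

```python
import json


def load_spans(annotations, key):
    """Decode one annotation layer into span dicts sorted by start offset.

    Assumption: a layer is None or a JSON string (or already a list) of
    {"start", "end", optional "attributes"} items; offsets may be strings.
    """
    raw = annotations.get(key)
    if not raw:
        return []
    spans = json.loads(raw) if isinstance(raw, str) else raw
    decoded = [
        {
            "start": int(s["start"]),
            "end": int(s["end"]),
            "attributes": s.get("attributes") or {},
        }
        for s in spans
    ]
    return sorted(decoded, key=lambda s: s["start"])


def to_doc2json_like(record):
    """Build a doc2json-style dict (partial field set) from one bulk record."""
    content = record["content"]
    text = content["text"]
    ann = content.get("annotations") or {}

    titles = load_spans(ann, "title")
    headers = load_spans(ann, "sectionheader")
    abstracts = load_spans(ann, "abstract")
    bibrefs = load_spans(ann, "bibref")

    def make_block(span, section):
        # Citation offsets are rebased so they index into the block's own text.
        cites = [
            {
                "start": c["start"] - span["start"],
                "end": c["end"] - span["start"],
                "text": text[c["start"]:c["end"]],
                "ref_id": c["attributes"].get("ref_id"),
            }
            for c in bibrefs
            if span["start"] <= c["start"] and c["end"] <= span["end"]
        ]
        return {
            "text": text[span["start"]:span["end"]],
            "cite_spans": cites,
            "ref_spans": [],  # figure/table references are not handled here
            "section": section,
        }

    body = []
    for p in load_spans(ann, "paragraph"):
        # Skip paragraphs that fall inside an abstract span.
        if any(a["start"] <= p["start"] and p["end"] <= a["end"] for a in abstracts):
            continue
        # Use the closest section header that ends before this paragraph.
        preceding = [h for h in headers if h["end"] <= p["start"]]
        section = text[preceding[-1]["start"]:preceding[-1]["end"]] if preceding else ""
        body.append(make_block(p, section))

    return {
        "paper_id": str(record.get("corpusid", "")),
        "title": text[titles[0]["start"]:titles[0]["end"]] if titles else "",
        "abstract": [make_block(a, "Abstract") for a in abstracts],
        "body_text": body,
    }
```

If the downloaded files are JSON Lines, each line can be passed through `json.loads` and then `to_doc2json_like`. Bibliography entries, figure and table references, and author metadata would need additional handling along the same lines, and the field names should be checked against the actual records before relying on this.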