allenai / s2orc

S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/

How to extract citation sentences? #33

Closed Soumyajain29 closed 1 year ago

Soumyajain29 commented 3 years ago

Hi, I am working with the dataset. For my research, I want to extract citation sentences.
For example, suppose paper A cites paper B. Then there must be a sentence in paper A describing paper B, and I want to extract this sentence. I know citation spans are available, which will help in locating those sentences. But is there a preferred library (NLTK, spaCy) or approach for sentence splitting? I tried using spaCy, but I am unable to extract these sentences properly.

It would be very helpful if an example code snippet could be shared.

Thanks

kyleclo commented 3 years ago

Hi @Soumyajain29, I would recommend checking out https://allenai.github.io/scispacy/ for sentence splitting.
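
For anyone landing here later, the general idea can be sketched as follows: split each paragraph into sentence character ranges, then keep the sentence whose range contains the start offset of each citation span. This is only a minimal sketch, not an official recipe. It assumes each paragraph is a dict with a `text` string and a `cite_spans` list whose entries carry `start`, `end`, and `ref_id` character-offset fields, so please check the field names against your copy of the data. The helper names `naive_sentence_spans` and `citation_sentences` are made up for illustration, and the regex splitter is only a stand-in for a proper sentence splitter.

```python
import re
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Span = Tuple[int, int]  # (start_char, end_char) of one sentence


def naive_sentence_spans(text: str) -> List[Span]:
    """Stand-in splitter: break after '.', '!' or '?' followed by whitespace.

    It will wrongly split on abbreviations such as "et al.", which is why a
    scientific-text splitter is preferable in practice.
    """
    spans, start = [], 0
    for match in re.finditer(r"(?<=[.!?])\s+", text):
        spans.append((start, match.start()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def citation_sentences(
    paragraph: Dict,
    split: Callable[[str], List[Span]] = naive_sentence_spans,
) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (ref_id, sentence) for every citation span in one paragraph.

    Assumes the paragraph layout described above (hypothetical until checked
    against the actual data files).
    """
    text = paragraph["text"]
    sentences = split(text)
    for cite in paragraph.get("cite_spans", []):
        for sent_start, sent_end in sentences:
            # A citation belongs to the sentence containing its first character.
            if sent_start <= cite["start"] < sent_end:
                yield cite.get("ref_id"), text[sent_start:sent_end]
                break


# Toy paragraph built by hand, not taken from the corpus.
text = "Transformers are widely used. We build on BERT [1] for encoding. Results are in Table 2."
offset = text.index("[1]")
paragraph = {
    "text": text,
    "cite_spans": [{"start": offset, "end": offset + 3, "text": "[1]", "ref_id": "BIBREF0"}],
}
print(list(citation_sentences(paragraph)))
# [('BIBREF0', 'We build on BERT [1] for encoding.')]
```

To use a spaCy or scispacy pipeline instead of the regex, one option is to pass a `split` function that returns `[(s.start_char, s.end_char) for s in nlp(text).sents]`, where `nlp` is a pipeline you have already loaded. Working with character offsets rather than token indices keeps the sentence ranges directly comparable with the citation span offsets.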

Soumyajain29 commented 3 years ago

Thanks for the suggestion, Kyle.