Closed emanuelevivoli closed 3 years ago
Emanuele, we don't extract Keywords/Keyphrases explicitly. First of all, not all papers have them, and when they are there, the format can be variable. In S2ORC, they sometimes wind up in the Abstract or first paragraphs of the body text. You can try to use regexes to pull them out. That's my best recommendation for now. Good luck!
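A rough sketch of what that regex approach could look like in Python. It assumes each paper record is a dict whose abstract and body_text fields are lists of paragraph dicts with a "text" key. The label variants ("Keywords", "Key words", "Index Terms", ...), the separators, and the helper names `split_keywords` and `extract_keywords` are illustrative assumptions, not part of S2ORC. Expect to tune the pattern against real records.

```python
import re

# Assumed label variants; real papers may use others.
# The label must start a line or follow a sentence end, and must be followed
# by a colon or dash, to avoid matching "keywords" used in running text.
KEYWORD_LABEL = re.compile(
    r"(?:^|(?<=\.\s))"
    r"(?:key[\s-]?words?|key[\s-]?phrases?|index\s+terms)"
    r"\s*[:\-\u2013\u2014]\s*"
    r"(?P<list>[^\n]+)",
    re.IGNORECASE | re.MULTILINE,
)


def split_keywords(raw):
    """Split a raw keyword string on commas, semicolons, or middle dots."""
    parts = re.split(r"\s*[;,\u00b7]\s*", raw.strip().rstrip("."))
    return [p.strip() for p in parts if p.strip()]


def extract_keywords(paper, max_body_paragraphs=3):
    """Return keywords from the abstract or the first body paragraphs, else []."""
    # Assumed record layout: lists of {"text": ...} paragraph dicts.
    abstract = paper.get("abstract") or []
    body = (paper.get("body_text") or [])[:max_body_paragraphs]
    for para in abstract + body:
        match = KEYWORD_LABEL.search(para.get("text", ""))
        if match:
            return split_keywords(match.group("list"))
    return []


# Hypothetical record, made up for illustration.
paper = {
    "abstract": [
        {"text": "We study X. Keywords: deep learning, keyphrase extraction; S2ORC."}
    ],
    "body_text": [{"text": "1 Introduction ..."}],
}
print(extract_keywords(paper))
# ['deep learning', 'keyphrase extraction', 'S2ORC']
```

One known limitation: the pattern captures everything up to the end of the line, so when the keyword list is fused into a longer paragraph, the last "keyword" may carry trailing text and need extra filtering.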
Hello, thanks for the awesome work you did! Usually in scientific papers there is a section containing the paper's KeyPhrases or KeyWords. I didn't see any section or property named KeyPhrases, KeyWords, or anything similar, so my question (supposing there must be some remaining information in the body_text property) is: Do you have any method (or any advice) for extracting this data from the body_text? I'd like to build a "Keyphrase dataset" from the S2ORC dataset.
Thanks for your help, Emanuele.