WormBase / ACKnowledge

Author Curation to Knowledgebases
MIT License

WBPaper00054648 - didn't extract C. elegans as species #104

Closed: vanaukenk closed this issue 4 years ago

vanaukenk commented 5 years ago

I was just looking over one of the author submissions, WBPaper00054648, and saw that the pipeline didn't extract C. elegans as a species for this paper, although some version of 'elegans' is mentioned 58 times.

The author did add it.

valearna commented 5 years ago

In some cases species are mentioned without whitespace, e.g., C.elegans. We can modify the regex to catch those matches.
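As a rough illustration of the kind of change meant here, the sketch below makes the whitespace between the genus abbreviation and the species name optional. The pattern and the `count_mentions` helper are hypothetical and are not taken from the pipeline's code. Whether such a pattern helps in practice depends on how the PDF-to-text step renders the name.

```python
import re

# Hypothetical pattern, NOT the pipeline's actual regex.
# Matches "C. elegans", "C.elegans", "C elegans" and "Caenorhabditis elegans":
# the period and the whitespace after the abbreviated genus are both optional.
C_ELEGANS = re.compile(
    r"\b(?:C\.?\s*|Caenorhabditis\s+)elegans\b",
    re.IGNORECASE,
)


def count_mentions(text: str) -> int:
    """Count mentions of C. elegans in text, with or without whitespace."""
    return len(C_ELEGANS.findall(text))


if __name__ == "__main__":
    sample = "We used C.elegans strains; C. elegans N2 was the wild type."
    print(count_mentions(sample))
```

Because the whitespace is optional, the same pattern also accepts a fully fused form such as "Celegans", which may or may not be desirable.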

draciti commented 5 years ago

FYI, Paper 00054817 also had 32 mentions of 'elegans' and it was not extracted.

valearna commented 4 years ago

Modifying the regex to capture species names that are not surrounded by whitespace is not feasible. This is a PDF-to-text conversion issue.

draciti commented 4 years ago

This might be resolved once we assign C. elegans by default. @vanaukenk, I'll leave it up to you whether to close this ticket.