Sure thing @draciti. It looks like a text conversion issue. I can't find any mention of 'tropicalis' in the converted txt.
Thanks @valearna, that's what I suspected, but I wanted to make sure. I don't think we can do much for now. Closing. We will revisit this when we discuss taking text from Open Access. Thx!
@valearna I am reopening this issue. Can we look a little more closely at why tropicalis was not picked up? I'm not sure there's much we can do, but PubTator and EuropePMC both extract it. Thanks
@draciti I can re-process this paper with the new pdf2txt library that we are now using in production and see if tropicalis gets picked up. I'll run it on mangolassi.
Can you take a look at the species found by the same algorithm that is now in prod? Here's the link: https://tinyurl.com/yge9x9ke

Note that the author already submitted data for this paper. I have deleted the list of species submitted by the author on mangolassi so that the form shows the results of the extraction directly.
The extraction now picks up Caenorhabditis tropicalis but also Candida tropicalis.
I don't know if we've encountered a similar situation before, i.e., one abbreviated name such as 'C. tropicalis' matching two possible species.
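To illustrate why an abbreviation like 'C. tropicalis' can resolve to two species, here is a minimal sketch. It is not the production extraction algorithm: `KNOWN_SPECIES`, the regex, and both helper functions are hypothetical, and the real pipeline presumably draws its species list from a taxonomy source rather than a hard-coded list.

```python
import re

# Hypothetical toy dictionary of full binomial names the extractor "knows".
KNOWN_SPECIES = [
    "Caenorhabditis elegans",
    "Caenorhabditis tropicalis",
    "Candida tropicalis",
    "Cunninghamella elegans",
]

# Matches an abbreviated binomial such as "C. tropicalis".
ABBREV_RE = re.compile(r"\b([A-Z])\.\s*([a-z]+)\b")


def candidates_for_abbreviation(genus_initial: str, epithet: str) -> list[str]:
    """Return every known species whose genus starts with the initial
    and whose specific epithet matches exactly."""
    matches = []
    for name in KNOWN_SPECIES:
        genus, species = name.split(" ", 1)
        if genus.startswith(genus_initial) and species == epithet:
            matches.append(name)
    return matches


def find_ambiguous_mentions(text: str) -> dict[str, list[str]]:
    """Map each abbreviated mention to its candidate expansions, keeping
    only mentions that resolve to more than one known species."""
    result = {}
    for m in ABBREV_RE.finditer(text):
        cands = candidates_for_abbreviation(m.group(1), m.group(2))
        if len(cands) > 1:
            result[m.group(0)] = cands
    return result


text = "We isolated C. tropicalis strains and compared them with C. elegans N2."
print(find_ambiguous_mentions(text))
```

With this toy list, both 'C. tropicalis' and 'C. elegans' come back with two candidates each, which mirrors the Caenorhabditis/Candida and Caenorhabditis/Cunninghamella clashes discussed in this thread.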
Only Caenorhabditis tropicalis is correct.
We had a similar case with Caenorhabditis elegans and Cunninghamella elegans and we decided to blacklist Cunninghamella elegans. We could do the same with Candida tropicalis.
@valearna I think it would be okay to also blacklist Candida tropicalis. At least for now. @draciti what do you think?
Agreed, we can blacklist Candida tropicalis. Thanks, Valerio, for looking into this. @valearna, once you add Candida tropicalis to the blacklist, you can close this ticket.
Added taxon ID 5482 to exclusion list
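A minimal sketch of how an exclusion list keyed on taxon IDs could filter the extractor's candidates. Only taxon ID 5482 (Candida tropicalis) comes from this thread. `SpeciesMatch`, `apply_exclusion_list`, and the Caenorhabditis ID used below are hypothetical placeholders, not the pipeline's actual code or a verified NCBI Taxonomy ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesMatch:
    """Hypothetical record for one species found in a paper."""
    name: str
    taxon_id: int


# Exclusion list of taxon IDs; 5482 (Candida tropicalis) is the one added here.
EXCLUDED_TAXON_IDS = {5482}


def apply_exclusion_list(matches, excluded=EXCLUDED_TAXON_IDS):
    """Drop any extracted species whose taxon ID is on the exclusion list."""
    return [m for m in matches if m.taxon_id not in excluded]


# Hypothetical extractor output for "C. tropicalis".
# 900001 is a placeholder ID, NOT the real Caenorhabditis tropicalis taxon ID.
extracted = [
    SpeciesMatch("Caenorhabditis tropicalis", 900001),
    SpeciesMatch("Candida tropicalis", 5482),
]
kept = apply_exclusion_list(extracted)
print([m.name for m in kept])  # ['Caenorhabditis tropicalis']
```

Keying the list on taxon IDs rather than names keeps the filter unaffected by how the species name is spelled or abbreviated in the text.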
Thanks @draciti and @vanaukenk. I'll test the new pipeline on mangolassi before closing.
Tested; it works. Closing.
for WBPaper00054685. @valearna, when you get a chance, can you please take a look?