gnames / gnverifier

GNverifier verifies scientific names against more than 100 biodiversity databases
https://verifier.globalnames.org
MIT License

Virus names? #76

Closed dshorthouse closed 2 years ago

dshorthouse commented 2 years ago

Are virus names out of scope for gnverifier? They appear to be handled on https://resolver.globalnames.org/, but attempts to find, e.g., Tobacco mosaic virus (which is present in the Catalogue of Life) return no match.

dimus commented 2 years ago

Viruses are not out of scope, but they are not fully implemented yet. I will close this ticket when they work similarly to resolver.globalnames.org

dshorthouse commented 2 years ago

Thanks, @dimus!

dimus commented 2 years ago

Accidentally, I found an awesome Go standard library package for this task: https://eli.thegreenplace.net/2016/suffix-arrays-in-the-go-standard-library/

cgendreau commented 2 years ago

Thanks @dimus !

It seems to work on https://verifier.globalnames.org/api/v0/verifications/Tobacco%20rattle%20virus

Should we point to v0 instead of v1 ?

dshorthouse commented 2 years ago

Perhaps, if I may, clarification on expected use of /verifications vs /search. Are they interchangeable or do they have different expectations in input & output? For some of our use-cases, @cgendreau it would seem that /search is more appropriate because it can handle abbreviated scientific names whereas /verifications does not.

However, the example https://verifier.globalnames.org/api/v0/search/n:Tobacco+rattle+virus+ds:1 does not work, I assume because the content in n: is parsed.

cgendreau commented 2 years ago

yes, I'm planning to use /search but since viruses were not working I wasn't sure.

dshorthouse commented 2 years ago

Perhaps one way to handle this, though it might be cumbersome to implement and will incur an additional delay in the front-end, is for the client to take a first pass at /verifications and then fall back to /search when names[0][matchType] == "NoMatch" in the first response.

And so, it might look like this:

https://verifier.globalnames.org/api/v0/verifications/P.+moesta?data_sources=1 [NoMatch] ...then... https://verifier.globalnames.org/api/v0/search/n:P.+moesta+ds:1

versus

https://verifier.globalnames.org/api/v0/verifications/Tobacco+mosaic+virus?data_sources=1 [Stop]
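A minimal client-side sketch of that two-step flow, in Python for brevity. The URL shapes are copied from the examples above. The helper names (`verification_url`, `search_url`, `verify_with_fallback`, `fetch_json`) are made up for illustration, and the check assumes the response carries a top-level `names` list with a `matchType` field, as in the `names[0][matchType]` expression above.

```python
import json
import urllib.request
from typing import Callable
from urllib.parse import quote_plus

BASE = "https://verifier.globalnames.org/api/v0"


def verification_url(name: str, data_source: int) -> str:
    # Spaces become "+", as in the example URLs in this thread.
    return f"{BASE}/verifications/{quote_plus(name)}?data_sources={data_source}"


def search_url(name: str, data_source: int) -> str:
    # Search query of the form n:<name> ds:<id>, joined with "+".
    return f"{BASE}/search/n:{quote_plus(name)}+ds:{data_source}"


def fetch_json(url: str) -> dict:
    # Plain HTTP GET; swap in whatever HTTP client your project uses.
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def verify_with_fallback(
    name: str,
    data_source: int,
    fetch: Callable[[str], dict] = fetch_json,
) -> dict:
    """Try /verifications first; fall back to /search on NoMatch."""
    first = fetch(verification_url(name, data_source))
    names = first.get("names") or []
    if names and names[0].get("matchType") != "NoMatch":
        return first  # [Stop]: verification found something
    return fetch(search_url(name, data_source))
```

Passing `fetch` as a parameter keeps the fallback logic testable without network access. The second request only happens on `NoMatch`, so the extra delay is limited to names that verification cannot resolve.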

dimus commented 2 years ago

Viruses are different from other names: they do not have a parsing stage and are simply matched as a substring from the start of virus names in the database. For example

https://verifier.globalnames.org/api/v0/verifications/Influenza%20B%20virus?data_sources=4&all_matches=true

Shows that strings matching Influenza B virus might have additional strain info, or an annotation, and still match.
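To illustrate the idea only (this is not gnverifier's actual code), matching "from the start" amounts to a prefix test. The name list below is invented for the example, and the comparison is case-sensitive, which is an assumption.

```python
from typing import List


def prefix_matches(query: str, virus_names: List[str]) -> List[str]:
    """Return the names that begin with the query string."""
    q = query.strip()
    return [name for name in virus_names if name.startswith(q)]


# Hypothetical database entries, made up for this sketch.
names = [
    "Influenza B virus",
    "Influenza B virus (hypothetical strain annotation)",
    "Influenza A virus",
    "Tobacco mosaic virus",
]

matches = prefix_matches("Influenza B virus", names)
```

Here `matches` holds both "Influenza B virus" entries, with and without the trailing annotation. A query such as "mosaic virus" finds nothing, because it does not start any name.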

Search relies on knowing where a name-string has genus, species, authors, years, etc. For viruses such information is much harder to get at the moment, so they are not parsed and cannot be found by search.

@cgendreau api/v1 does not support any new stuff, so api/v0 is the right choice.

Output-wise verification and search return the same structure, so you can rely on consistency in result fields. The only difference is that in JSON output some empty fields are not returned to save bandwidth.

Here is the definition of a Name verification or search output

https://github.com/gnames/gnlib/blob/master/ent/verifier/name.go
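Because empty fields may be missing from the JSON, a client should not index into a record blindly. A small sketch, using only field names mentioned in this thread (`matchType`, `bestResult`, `results`); the helper `summarize` is hypothetical.

```python
from typing import Any, Dict


def summarize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Read a name record, tolerating omitted or null fields."""
    return {
        "matchType": record.get("matchType"),    # None if omitted
        "bestResult": record.get("bestResult"),  # None if omitted or null
        "results": record.get("results") or [],  # always a list
    }
```

With this, the same code path handles records from both endpoints, whether a field is absent, null, or populated.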

dimus commented 2 years ago

The idea for api/v0 is to add new things (like search) and, if needed, make changes in field names etc. I suspect that the main changes are done already, and from now on it will mostly gain additional fields (like vernaculars).

When it stabilizes, /api/v0 will become /api/v2

/api/v1 exists for people who already made scripts for it, or who did not upgrade their gnverifier. It will only receive bug fixes.

dimus commented 2 years ago

I would use API like this:

  1. Try to parse a name with gnparser to see if it is a "virus" or not.

  2. If I have a whole name and I believe it is "correct", or if I go through many names: /api/v0/verifications

  3. If a name is tricky, and I go one name at a time and eyeball the results: /api/v0/search

  4. If a name is a "virus": /api/v0/verifications

I would not use api/v1
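The routing above can be sketched as one small function. How the caller decides that a name is a virus (gnparser, or prior knowledge of the collection) is left outside the sketch; the function and parameter names are hypothetical.

```python
def choose_endpoint(is_virus: bool, many_names: bool, looks_tricky: bool) -> str:
    """Pick an api/v0 endpoint following the rules of thumb above."""
    if is_virus:
        # Virus names are not parsed, so search cannot find them.
        return "/api/v0/verifications"
    if looks_tricky and not many_names:
        # One name at a time, results inspected by eye.
        return "/api/v0/search"
    # Whole, presumably correct names, or bulk processing.
    return "/api/v0/verifications"
```

For example, `choose_endpoint(is_virus=False, many_names=False, looks_tricky=True)` selects search, while any virus name goes to verifications regardless of the other flags.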

dshorthouse commented 2 years ago

Thanks, @dimus. This is partly a client implementation issue. Your

  1. Try to parse a name with gnparser to see if it is a "virus" or not.

May not be possible in all instances, especially if the client's integration (as is ours) operates almost entirely in the front-end with user-supplied content. Now, if there was a NodeJS-based port of gnparser, that could be a solution to help shunt the calls to one API vs the other.

dshorthouse commented 2 years ago

Aha! @cgendreau See https://github.com/amazingplants/node-gnparser

cgendreau commented 2 years ago

I actually know in advance if we are looking for a virus (it's a virus collection).

dimus commented 2 years ago

@cgendreau, @dshorthouse, please look at https://github.com/gnames/gnames/releases/tag/v0.8.0

I am going to publish it today. This release contains several incompatible changes (mostly for verification). Docs are updated: https://apidoc.globalnames.org/gnames-beta

Also, I think this ticket explains the confusing change in verification: https://github.com/gnames/gnverifier/issues/90

cgendreau commented 2 years ago

thanks @dimus . How can I test it from the api? Is it already running behind api/v0 ?

dimus commented 2 years ago

@cgendreau yes, it is live

https://verifier.globalnames.org/api/v0/verifications/Pomatomus%20soltator%7CBubo%20bubo%7CIsoetes%20longissimum

returns bestResult for each name; the results field is empty

https://verifier.globalnames.org/api/v0/verifications/Pomatomus%20soltator|Bubo%20bubo|Isoetes%20longissimum?data_sources=11|12

Only queries results from data sources 11 and 12 (it is much faster than a query without the data_sources parameter)

https://verifier.globalnames.org/api/v0/verifications/Pomatomus%20soltator|Bubo%20bubo|Isoetes%20longissimum?all_matches=true

provides all found matches in results; the bestResult field is null

https://verifier.globalnames.org/api/v0/verifications/Pomatomus%20soltator|Bubo%20bubo|Isoetes%20longissimum?all_matches=true&data_sources=11|12

returns all matches found in 11 and 12 data-sources
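For anyone scripting against these, a short sketch that assembles the same URLs. The function name `batch_verification_url` is made up. It joins names and data-source IDs with a literal `|`, as in the last three examples; the first example shows the percent-encoded form `%7C`, which may be the safer choice for some HTTP clients.

```python
from typing import Iterable, Sequence
from urllib.parse import quote

BASE = "https://verifier.globalnames.org/api/v0"


def batch_verification_url(
    names: Sequence[str],
    data_sources: Iterable[int] = (),
    all_matches: bool = False,
) -> str:
    """Build a GET URL verifying several names in one request."""
    # Each name is percent-encoded (space -> %20), then joined with "|".
    path = "|".join(quote(name) for name in names)
    params = []
    if all_matches:
        params.append("all_matches=true")
    ids = "|".join(str(ds) for ds in data_sources)
    if ids:
        params.append(f"data_sources={ids}")
    url = f"{BASE}/verifications/{path}"
    if params:
        url += "?" + "&".join(params)
    return url


example = batch_verification_url(
    ["Pomatomus soltator", "Bubo bubo", "Isoetes longissimum"],
    data_sources=[11, 12],
    all_matches=True,
)
```

`example` reproduces the last URL above. Dropping `all_matches` yields the variant where only bestResult is filled in.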