AtlasOfLivingAustralia / ala-name-matching

Atlas name matching API and index generation

Virus stop pattern being inconsistently applied #177

Closed charvolant closed 1 year ago

charvolant commented 1 year ago

This one is ancient but has been exacerbated by attempts to make searches more precise.

The code for storing the name without any " virus" in it is at https://github.com/AtlasOfLivingAustralia/ala-name-matching/blob/ba5be3f18f751be85b2452ae7d5dfad91c6fc1f4/ala-name-matching-builder/src/main/java/au/org/ala/names/search/ALANameIndexer.java#L945 and stores a name such as "Arbovirus: Exotic West Nile virus" as "Arbovirus: Exotic West Nile " (note the trailing space). The search is at https://github.com/AtlasOfLivingAustralia/ala-name-matching/blob/ba5be3f18f751be85b2452ae7d5dfad91c6fc1f4/ala-name-matching-search/src/main/java/au/org/ala/names/search/ALANameSearcher.java#L988 and has a trim() at the end, so "Arbovirus: Exotic West Nile virus" becomes "Arbovirus: Exotic West Nile".

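The query used for scientific names is an exact match, so the space at the end matters: the indexed value "Arbovirus: Exotic West Nile " never equals the trimmed search term "Arbovirus: Exotic West Nile".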
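
A minimal sketch of the mismatch, for illustration only. The class, the method names and the `" virus"` pattern are hypothetical stand-ins, not the actual code in ALANameIndexer or ALANameSearcher; the only behaviour taken from the description above is that the index side does not trim and the search side does.

```java
import java.util.regex.Pattern;

public class VirusStopPatternDemo {
    // Hypothetical stand-in for the stop pattern; the real one is in the linked sources.
    static final Pattern VIRUS_STOP = Pattern.compile(" virus");

    // Index side as described in this issue: the pattern is replaced and nothing is trimmed.
    static String indexForm(String name) {
        return VIRUS_STOP.matcher(name).replaceAll(" ");
    }

    // Search side as described in this issue: same replacement, followed by trim().
    static String searchForm(String name) {
        return VIRUS_STOP.matcher(name).replaceAll(" ").trim();
    }

    // One possible remedy (an assumption, not necessarily the fix that was applied):
    // normalise the stored value the same way the search term is normalised.
    static String indexFormTrimmed(String name) {
        return indexForm(name).trim();
    }

    public static void main(String[] args) {
        String name = "Arbovirus: Exotic West Nile virus";
        String stored = indexForm(name);   // keeps the trailing space
        String queried = searchForm(name); // trailing space removed
        System.out.println("[" + stored + "]");
        System.out.println("[" + queried + "]");
        // An exact comparison fails because of the trailing space.
        System.out.println(stored.equals(queried));
        // With both sides trimmed, the exact comparison succeeds.
        System.out.println(indexFormTrimmed(name).equals(queried));
    }
}
```

Whichever side is changed, the point is that the index and the search need to apply the same normalisation, since an exact term query will not forgive a one-character difference.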