AtlasOfLivingAustralia / biocache-hubs

Biocache Hub UI grails plugin

Correct for leading '0' when searching by catalog number #559

Closed timhicks-ala closed 1 year ago

timhicks-ala commented 1 year ago

Suggestion from a user:

"There is a small issue with Catalogue Search (https://biocache.ala.org.au/search#tab_catalogUpload) that I think may be readily solvable.

If I search for the following:

PERTH 3070018
PERTH 9330291
PERTH 9330348

I get a good result. If however I search for:

PERTH 03070018
PERTH 09330291
PERTH 09330348

I get a null result.

The problem is that the PERTH numbers with leading zeros are the actual numbers as recorded on PERTH sheet barcodes. Many other institutions also use leading zeros (for a reason I can never quite fathom).

A solution could be to strip the leading zeros for all accession numbers before performing the search. This would work OK if ALA processing of records removed leading zeros in catalogue numbers elsewhere in your system (or if e.g. they are converted from strings to integers somewhere). I understand though that this could cause the inverse problem if some institutions provide their records with leading zeros and these are maintained.

With a bit of analysis I expect a solution can be found."
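A minimal sketch of the user's suggestion, and of the inverse problem they mention, may help frame the discussion. It is illustrative only: `strip_leading_zeros` is a hypothetical helper, not part of biocache-hubs, and the two sets stand in for whatever the index actually stores.

```python
import re

# Hypothetical pattern: leading zeros at the start of a numeric token,
# keeping at least one digit so that "000" becomes "0" rather than "".
_LEADING_ZEROS = re.compile(r"\b0+(?=\d)")


def strip_leading_zeros(catalog_number: str) -> str:
    """Remove leading zeros from each numeric token in a catalogue number."""
    return _LEADING_ZEROS.sub("", catalog_number)


# Made-up index contents: one provider loaded without zeros, one with them.
indexed_without_zeros = {"PERTH 3070018"}
indexed_with_zeros = {"PERTH 03070018"}

query = "PERTH 03070018"
normalised = strip_leading_zeros(query)  # "PERTH 3070018"

print(normalised in indexed_without_zeros)  # True: the case the user wants fixed
print(normalised in indexed_with_zeros)     # False: the inverse problem
```

Stripping zeros from the query only helps when the indexed values have no zeros either. To cover both cases the same normalisation would have to be applied at index time, which changes the stored catalogue numbers.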

Relevant helpdesk ticket: https://support.ehelp.edu.au/a/tickets/159353

peggynewman commented 1 year ago

Looks like this has been resolved by WA reloading data since this was raised, and they included the leading zero, so now the search works properly. I don't think we should knock ourselves out trying to find ways of including leading zeros before numbers in the midst of what really is an ordinary string.

See https://biocache.ala.org.au/occurrences/search?q=catalogue_number%3A%22PERTH+03070018%22

nielsklazenga commented 1 year ago

Absolutely right. Catalogue numbers are strings (or tokens rather), not numbers (hence the leading zeros). 'PERTH 3070018' and 'PERTH 03070018' are different catalogue numbers. There should not be fuzzy matching.
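To illustrate the point, here is a small sketch that treats catalogue numbers as opaque strings. The `index` dictionary and the record labels are made up for the example and do not reflect how biocache stores records.

```python
# Hypothetical index keyed by the exact catalogue number string.
index = {
    "PERTH 03070018": "record-a",
    "PERTH 3070018": "record-b",
}


def find_record(catalog_number: str):
    """Exact string lookup: no trimming, no numeric conversion."""
    return index.get(catalog_number)


# As strings, the two catalogue numbers stay distinct.
print(find_record("PERTH 03070018"))  # record-a
print(find_record("PERTH 3070018"))   # record-b

# Converting the numeric part to an integer would collapse them into one.
print(int("03070018") == int("3070018"))  # True
```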

catalogNumber is the row key column for PERTH records. If they have indeed changed a lot of their catalogue numbers because of this, record ids will have changed as well and we'll soon get complaints from you know who.

peggynewman commented 1 year ago

GBIF have already fired up on their behalf! I'll close this, no need to do anything.