AtlasOfLivingAustralia / biocache-store

Occurrence processing, indexing and batch processing

Make certain search (only) fields case insensitive #322

Open nickdos opened 5 years ago

nickdos commented 5 years ago

See also #76.

I think there is a case for at least creating a copyField version of the taxon_name field. The problem is that this field is currently case sensitive, so the user has to know the case of the indexed value in order to search for it. E.g. Acacia dealbata works but acacia dealbata returns nothing. Even worse is ANIMALIA: you have to search for it in all caps to find records for that name.

This is a problem in the batch taxon search form, where we allow users to search with multiple names. It works fine for raw_name (a case-insensitive version of raw_taxon_name, I think) but not for taxon_name (an enhancement I'm working on).

@djtfmartin suggested in #76 that we use the plain q=acacia dealbata query, but this fails for terms like acacia and animalia. A plain query is rewritten to text:foo, so it also searches other fields, such as the various remarks fields. That blows out the record count, because it brings back records that only mention those terms in those other fields.

There are probably a few other fields where we want to do this as well - will update if I think of them.
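The behaviour described in this comment can be sketched with a toy in-memory model. This is only an illustration, not biocache-store or Solr code: the records are invented, and taxon_name_ci is a hypothetical name for the lowercased search-only copy being proposed.

```python
# Toy model of the three search behaviours discussed above.
# The records and the taxon_name_ci field are made up for illustration.
records = [
    {"id": 1, "taxon_name": "Acacia dealbata", "remarks": "Found near creek"},
    {"id": 2, "taxon_name": "ANIMALIA", "remarks": ""},
    {"id": 3, "taxon_name": "Eucalyptus regnans", "remarks": "Growing beside acacia scrub"},
]


def add_case_insensitive_copy(record):
    """Mimic a copy of taxon_name that is lowercased at index time."""
    indexed = dict(record)
    indexed["taxon_name_ci"] = record["taxon_name"].lower()
    return indexed


index = [add_case_insensitive_copy(r) for r in records]


def search_exact(field, value):
    """String-type field: the query must match the indexed value exactly."""
    return [r["id"] for r in index if r[field] == value]


def search_ci(value):
    """Search-only copy: lowercase the query the same way as the index."""
    return [r["id"] for r in index if r["taxon_name_ci"] == value.lower()]


def search_all_text(term):
    """Catch-all text search: matches the term in any field, ignoring case."""
    term = term.lower()
    return [
        r["id"]
        for r in index
        if any(term in str(v).lower() for k, v in r.items() if k != "id")
    ]
```

In this model, search_exact("taxon_name", "acacia dealbata") finds nothing, search_ci("acacia dealbata") finds record 1 only, and search_all_text("acacia") also returns record 3, which only mentions the term in its remarks.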

nickdos commented 5 years ago

https://biocache-ws.ala.org.au/ws/occurrences/search?q=ANIMALIA - 59,213,274 results
https://biocache-ws.ala.org.au/ws/occurrences/search?q=taxon_name:Animalia - 0 results
https://biocache-ws.ala.org.au/ws/occurrences/search?q=taxon_name:ANIMALIA - 33,348 results
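For anyone wanting to repeat this comparison with other names, here is a minimal sketch that only builds the URLs for the case variants of a name. The endpoint and the q parameter come from the links above; the helper names are made up, and no request is sent. A multi-word name would probably also need phrase quoting, which this sketch does not attempt.

```python
from urllib.parse import urlencode

# Endpoint taken from the URLs in this comment.
BASE_URL = "https://biocache-ws.ala.org.au/ws/occurrences/search"


def search_url(query):
    """Build a search URL, leaving ':' unescaped so field:value stays readable."""
    return BASE_URL + "?" + urlencode({"q": query}, safe=":")


def case_variants(field, name):
    """Return search URLs for the upper, capitalised and lower case forms of a name."""
    variants = [name.upper(), name.capitalize(), name.lower()]
    return [search_url(f"{field}:{variant}") for variant in variants]


for url in case_variants("taxon_name", "animalia"):
    print(url)
```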

ansell commented 5 years ago

If taxon_name:ANIMALIA is only matching against raw_taxon_name, the result is expected, but still not what we ideally want to happen. Almost no one ever submits records that include ANIMALIA.

nickdos commented 5 years ago

@ansell taxon_name matches the processed/matched name. You're thinking of raw_name which is a textgen version of raw_taxon_name (String type). This issue is about creating a similar text type field for taxon_name that can be used for humans to search for accepted names.

nickdos commented 5 years ago

The alternative to creating a case-insensitive version of taxon_name is to have a pseudo-field that takes the input name/s and does a lookup against the name_matching_index, which then matches to a GUID and then searches the index with the GUID. I thought we had something like this already but I couldn't work out what it was. @adam-collins or @djtfmartin does this ring a bell?

Edit: biocache-hubs/ala-hub does this but I'm talking about on the biocache-service side only.

Edit: Adam advises that the taxa field does work on the service side as well. It's not listed in /index/fields, which is why I was not aware of it. I think this should solve the immediate issue at hand.
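The lookup-based alternative described in this comment (name to GUID, then search by GUID) could look roughly like the sketch below. Everything here is an assumption for illustration: NAME_INDEX stands in for the name matching index, the GUIDs are placeholders, and taxon_guid is a hypothetical field name. It is not how the taxa field is actually implemented.

```python
# Hypothetical stand-in for the name matching index: lowercased name -> GUID.
# The GUIDs are placeholders, not real ALA identifiers.
NAME_INDEX = {
    "acacia dealbata": "guid:example-acacia-dealbata",
    "animalia": "guid:example-animalia",
}


def resolve_guid(name):
    """Case-insensitive lookup; returns None when the name is unknown."""
    return NAME_INDEX.get(name.strip().lower())


def build_guid_query(names, guid_field="taxon_guid"):
    """Turn user-supplied names into an OR query on a GUID field.

    guid_field is an assumed field name used only for illustration.
    Returns a (query, unmatched_names) tuple.
    """
    guids, unmatched = [], []
    for name in names:
        guid = resolve_guid(name)
        if guid is None:
            unmatched.append(name)
        elif guid not in guids:
            guids.append(guid)  # skip duplicates that differ only in case
    query = " OR ".join(f'{guid_field}:"{guid}"' for guid in guids)
    return query, unmatched
```

Because the name is normalised before the lookup, the user's casing no longer matters, and names that fail to match can be reported back instead of silently returning zero records.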