AtlasOfLivingAustralia / ALA4R

Access data and resources hosted by the Atlas of Living Australia (ALA)
https://atlasoflivingaustralia.github.io/ALA4R/

search_names and hyphenation #4

Closed by johnbaums 9 years ago

johnbaums commented 9 years ago

Something weird is going on with hyphenation in search_names. Take for example the species Acaena novae-zelandiae:

search_names('Acaena novae zelandiae') returns

  searchTerm               name                     commonName                                       rank      guid                                            
1 "Acaena novae zelandiae" "Acaena novae-zelandiae" "Biddy Biddy, Biddy-widdy, Bidgee-widgee, Buzzy" "species" "urn:lsid:biodiversity.org.au:apni.taxon:376906"

but search_names('Acaena novae-zelandiae') returns an empty matrix.

Tasilee commented 9 years ago

I note that the following calls all return the same result:

search_fulltext("Acaena novae zelandiae")
search_fulltext("Acaena novae-zelandiae")
search_partial_name("Acaena novae zelandiae")
search_partial_name("Acaena novae-zelandiae")
search_partial_name("Acaena novae")

nickdos commented 9 years ago

@djtfmartin for the rewrite - you might want to check the SOLR query analyser for the search_names field. It looks like the indexing analyser is removing hyphens but the query one is not. They should be using the same analyser, from memory.
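
To illustrate the kind of index/query analyser mismatch suggested here, a toy sketch in base R follows. It is not Solr code: `index_analyser()` and `query_analyser_mismatched()` are hypothetical stand-ins, and the assumption that hyphens become spaces at index time is made only for the example.

```r
# Hypothetical stand-ins for Solr analysers (real ones are configured in the
# Solr schema; these only mimic the suspected behaviour).
index_analyser <- function(x) tolower(gsub("-", " ", x))  # drops hyphens when indexing
query_analyser_mismatched <- function(x) tolower(x)       # keeps hyphens when querying

indexed <- index_analyser("Acaena novae-zelandiae")

# Hyphenated query goes through the mismatched analyser: no match
hyphen_hit <- indexed == query_analyser_mismatched("Acaena novae-zelandiae")  # FALSE

# Unhyphenated query happens to agree with the indexed form: match
space_hit <- indexed == query_analyser_mismatched("Acaena novae zelandiae")   # TRUE

# Using the same analyser on both sides restores the match
same_hit <- indexed == index_analyser("Acaena novae-zelandiae")               # TRUE
```

Under these assumptions the sketch reproduces the reported symptom: the unhyphenated term matches and the hyphenated one does not.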

raymondben commented 9 years ago

Fixed in v1.11. The problem was slightly over-zealous cleaning of the search term before it was passed to the web service. Other than trimming whitespace, we now pass the search term unchanged.

search_names('Acaena novae-zelandiae')

  searchTerm               name                     commonName                                       rank      guid
1 "Acaena novae-zelandiae" "Acaena novae-zelandiae" "Biddy Biddy, Biddy-widdy, Bidgee-widgee, Buzzy" "species" "urn:lsid:biodiversity.org.au:apni.taxon:376906"
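
The difference between the old and new handling of the search term can be sketched in base R. This is an illustration only, not the ALA4R source: `clean_overzealous()` and `clean_minimal()` are hypothetical helpers, and stripping punctuation is an assumed example of "over-zealous" cleaning, since the thread does not say exactly which characters were altered.

```r
# Hypothetical over-zealous cleaning: trims whitespace but also deletes
# punctuation, so the hyphen in the epithet is lost (assumed behaviour).
clean_overzealous <- function(term) {
  gsub("[[:punct:]]", "", trimws(term))
}

# Minimal cleaning as described in the fix: trim surrounding whitespace and
# otherwise leave the term unchanged.
clean_minimal <- function(term) {
  trimws(term)
}

term <- "  Acaena novae-zelandiae "
old_term <- clean_overzealous(term)  # "Acaena novaezelandiae"
new_term <- clean_minimal(term)      # "Acaena novae-zelandiae"
```

With the minimal version, the hyphenated name reaches the web service exactly as the user typed it, apart from leading and trailing whitespace.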