AtlasOfLivingAustralia / biocache-service

Occurrence & mapping webservices
https://biocache-ws.ala.org.au/ws/

`taxonConceptID` searches no longer work when no field is specified #732

Closed: nickdos closed this issue 6 months ago

nickdos commented 2 years ago

As reported by Yong from BCCVL. This URL used to work but now returns 0 results:

https://biocache-ws.ala.org.au/ws/occurrences/search?q=urn:lsid:biodiversity.org.au:afd.taxon:c303a58c-ffb9-4bf6-a71b-5e22299c5ee2

Adding `lsid:` or `taxonConceptID:` to the `q` param value fixes it:

https://biocache-ws.ala.org.au/ws/occurrences/search?q=lsid:urn:lsid:biodiversity.org.au:afd.taxon:c303a58c-ffb9-4bf6-a71b-5e22299c5ee2

https://biocache-ws.ala.org.au/ws/occurrences/search?q=taxonConceptID:urn:lsid:biodiversity.org.au:afd.taxon:c303a58c-ffb9-4bf6-a71b-5e22299c5ee2
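A minimal Python sketch (standard library only) that builds the unfielded request and the two fielded requests above. It only constructs URLs and makes no request; `search_url` is a hypothetical helper name. `urlencode` percent-encodes the colons as `%3A`, which is equivalent to the raw colons in the links above.

```python
from urllib.parse import urlencode

BASE = "https://biocache-ws.ala.org.au/ws/occurrences/search"
LSID = "urn:lsid:biodiversity.org.au:afd.taxon:c303a58c-ffb9-4bf6-a71b-5e22299c5ee2"

def search_url(q: str) -> str:
    """Build an occurrences/search URL with the q parameter percent-encoded."""
    return f"{BASE}?{urlencode({'q': q})}"

# Unfielded: relies on the default text field (returned 0 results when reported).
unfielded = search_url(LSID)

# Fielded: name the field explicitly (reported to work).
by_lsid = search_url(f"lsid:{LSID}")
by_concept = search_url(f"taxonConceptID:{LSID}")
```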

I suspect a change to the default `text` field has broken backwards compatibility, possibly due to added stemming or other SOLR filters in the `text` field's schema definition.

qifeng-bai commented 2 years ago

My two cents:

Not sure if this is the reason, but I found that `q=lsid:xxxxx` is translated to `q=taxonConceptID:xxxxxxxx` in this method:

https://github.com/AtlasOfLivingAustralia/biocache-service/blob/develop/src/main/java/au/org/ala/biocache/util/solr/FieldMappingUtil.java#L79
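To illustrate the kind of rewrite described (this is not the actual `FieldMappingUtil` code), a hypothetical Python sketch that maps a field alias to the indexed field name. The alias table and the rule for recognising a field prefix are assumptions made for this example.

```python
import re

# Hypothetical alias table; the real mapping in FieldMappingUtil.java may differ.
FIELD_ALIASES = {"lsid": "taxonConceptID"}

# A field prefix is a name followed by ":" at the start of the query or after
# whitespace or "(", so in "lsid:urn:lsid:..." only the leading "lsid:" counts.
_FIELD = re.compile(r"(^|[\s(])([A-Za-z_][A-Za-z0-9_]*):")

def map_fields(q: str) -> str:
    """Replace aliased field names with the indexed field name."""
    def repl(m):
        prefix, name = m.group(1), m.group(2)
        # Names not in the alias table are left unchanged.
        return prefix + FIELD_ALIASES.get(name, name) + ":"
    return _FIELD.sub(repl, q)
```

The prefix rule matters because the LSID itself contains `lsid:` (`urn:lsid:...`). A bare LSID is still matched as a field named `urn`, which is why only names present in the alias table are rewritten.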

`taxonConceptID` is not copied into the `text` field, and `text` is the `defaultSearchField`. We do not need to copyTo `lsid` into the `text` field, for the reason above (`lsid` is translated to `taxonConceptID`).

nickdos commented 2 years ago

It might be related, in that the taxonConceptID field is not being added to the text field (via copyTo) in the SOLR schema file.

qifeng-bai commented 2 years ago

Schema update process for biocache-test:

Edit: https://github.com/gbif/pipelines/blob/dev/livingatlas/solr/conf/managed-schema

Add `<copyField source="taxonConceptID" dest="text"/>` to managed-schema, then check and run: https://github.com/gbif/pipelines/blob/dev/livingatlas/solr/scripts/update-solr-cluster-config.sh

The script uploads the new config to the Solr server (into the ZooKeeper folder).
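If this edit has to be repeated across environments, it could be scripted. A minimal Python sketch, assuming a local copy of managed-schema: it inserts the line as plain text before `</schema>` so existing comments and formatting survive, and skips the edit if the same copyField is already present. `add_copy_field` is a hypothetical name. Because copyField is applied at index time, existing documents only pick up the copied value after a reindex.

```python
import re

def add_copy_field(schema_xml: str, source: str, dest: str) -> str:
    """Return schema_xml with <copyField source=.. dest=../> inserted before
    </schema>, unless an identical copyField is already present."""
    line = f'<copyField source="{source}" dest="{dest}"/>'
    # Only recognises the attribute order source-then-dest.
    existing = re.compile(
        rf'<copyField\s+source="{re.escape(source)}"\s+dest="{re.escape(dest)}"\s*/>'
    )
    if existing.search(schema_xml):
        return schema_xml  # already present: keep the edit idempotent
    idx = schema_xml.rfind("</schema>")
    if idx == -1:
        raise ValueError("no closing </schema> tag found")
    return schema_xml[:idx] + "  " + line + "\n" + schema_xml[idx:]
```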

Process:

1. SSH tunnel to the Solr server, e.g. `ssh nci3-solr-3.ala -L 8983:localhost:8983`
2. Back up the old schema with a timestamp, then load the new one.
3. ZooKeeper will (hopefully) distribute the current schema to the Solr servers.
4. Rebuild the index via Jenkins on nci3-jenkins.ala.
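Step 2 (the timestamped backup) sketched in Python, assuming the current schema has been saved as a local file before the new one is uploaded. The `.YYYYMMDD-HHMMSS.bak` naming and the helper names are hypothetical. Steps 1, 3 and 4 depend on the ALA infrastructure (SSH host, ZooKeeper, Jenkins job) and are not covered here.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional

def backup_name(schema: Path, now: datetime) -> Path:
    """managed-schema -> managed-schema.YYYYMMDD-HHMMSS.bak in the same directory."""
    return schema.with_name(f"{schema.name}.{now:%Y%m%d-%H%M%S}.bak")

def backup_schema(schema: Path, now: Optional[datetime] = None) -> Path:
    """Copy the current schema to a timestamped file before replacing it."""
    target = backup_name(schema, now or datetime.now())
    if target.exists():
        raise FileExistsError(target)  # never overwrite an earlier backup
    shutil.copy2(schema, target)
    return target
```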

To be continued

adam-collins commented 11 months ago

Searches of the default `text` field need to be wrapped in double quotes, e.g. compare https://biocache-ws.ala.org.au/ws/occurrences/search?q=NZOR-6-73174 with https://biocache-ws.ala.org.au/ws/occurrences/search?q=%22NZOR-6-73174%22

However, this currently fails for https URLs: https://biocache.ala.org.au/ws/occurrences/search?q=%22https:%2F%2Fid.biodiversity.org.au%2Fnode%2Fapni%2F2896715%22. `https` needs to be escaped in the same way as `http`: https://github.com/AtlasOfLivingAustralia/biocache-service/pull/864
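One way to treat both schemes identically, sketched in Python: find every `http://` and `https://` URL in the query and backslash-escape the Lucene query-syntax characters in it, using the character set of SolrJ's `ClientUtils.escapeQueryChars`. This is an illustration under that assumption, not the change made in the PR above; whether escaping is also wanted inside a quoted phrase depends on how the service parses the query.

```python
import re

# Characters escaped by SolrJ's ClientUtils.escapeQueryChars (plus whitespace).
_SPECIAL = set('\\+-!():^[]"{}~*?|&;/')

# An http:// or https:// URL, stopping at whitespace or a double quote so a
# surrounding phrase quote is not treated as part of the URL.
_URL = re.compile(r'https?://[^\s"]+')

def escape_query_chars(s: str) -> str:
    """Backslash-escape Lucene query-syntax characters in s."""
    return "".join("\\" + c if c in _SPECIAL or c.isspace() else c for c in s)

def escape_urls(q: str) -> str:
    """Escape every http:// and https:// URL in q the same way."""
    return _URL.sub(lambda m: escape_query_chars(m.group(0)), q)
```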

I have also seen cases where a user expects the entire search to be wrapped, e.g. https://biocache-ws.ala.org.au/ws/occurrences/search?q=New%20South%20Wales

I do not recommend that biocache-service wrap individual words, or the entire search term, in double quotes. Instead, if there is confusion about how search works, I suggest we address it in the UI; if wrapping is required, do it in the UI.
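A minimal client-side sketch in Python of the suggested approach: the caller decides whether to send a phrase search, and the service receives the query unchanged. `quote_phrase` and `search_query` are hypothetical helper names; escaping embedded `\` and `"` follows Lucene phrase syntax.

```python
from urllib.parse import urlencode

def quote_phrase(term: str) -> str:
    """Wrap a term in double quotes for a phrase search, escaping any
    backslashes and double quotes it already contains."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def search_query(term: str, as_phrase: bool) -> str:
    """Build the q= query string, optionally as a phrase search."""
    q = quote_phrase(term) if as_phrase else term
    return urlencode({"q": q})
```

`urlencode` encodes spaces as `+`; pass `quote_via=quote` (from `urllib.parse`) if `%20` is preferred, as in the New South Wales link above.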

peggynewman commented 7 months ago

Ok to test with a current LSID:

Error in prod: https://biocache.ala.org.au/ws/occurrences/search?q="https://biodiversity.org.au/afd/taxa/808f7f5b-f6cd-4677-848a-49315e7babe3"

Fixed in test: https://biocache-ws-test.ala.org.au/ws/occurrences/search?q="https://biodiversity.org.au/afd/taxa/808f7f5b-f6cd-4677-848a-49315e7babe3"
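A small Python sketch for repeating this prod/test comparison: it builds both URLs for the same quoted term and reads the hit count from a response body. It assumes the search response is JSON with a top-level `totalRecords` field; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

HOSTS = {
    "prod": "https://biocache.ala.org.au/ws/occurrences/search",
    "test": "https://biocache-ws-test.ala.org.au/ws/occurrences/search",
}
TERM = '"https://biodiversity.org.au/afd/taxa/808f7f5b-f6cd-4677-848a-49315e7babe3"'

def url_for(env: str, q: str = TERM) -> str:
    """Search URL for the given environment, with q percent-encoded."""
    return f"{HOSTS[env]}?{urlencode({'q': q})}"

def total_records(body: str) -> int:
    """Hit count from a search response body (0 if the field is missing).
    Assumes a top-level "totalRecords" field in the JSON."""
    return int(json.loads(body).get("totalRecords", 0))
```

For example, `total_records(urllib.request.urlopen(url_for("test")).read().decode())` fetches and counts in one call.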