Closed javier-molina closed 9 months ago
@djtfmartin occurrences/search?taxa=m13929 and occurrences/search?taxa=M13929 are converted to q=text:"m13929" and q=text:"M13929" respectively.
In the Solr config, the text field
<field name="text" type="textgen" multiValued="true" indexed="true" stored="false" />
has a lower case filter applied, which is why it's case insensitive.
catalogNumber is just a normal String field, so it's case-sensitive.
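As a rough illustration of the difference between the two field types, here is a minimal Python sketch. It is not Solr code: the function names are hypothetical, and lowercasing is the only analysis step modelled (the full textgen analyzer chain is not shown in this thread and likely does more).

```python
# Simplified model of string vs. lowercase-filtered field matching.
# Assumption: lowercasing is the only analysis step; the real textgen
# chain is defined in the Solr schema and is not reproduced here.

def string_field_matches(indexed_value: str, query: str) -> bool:
    """A plain string field keeps the value verbatim, so matching is exact."""
    return indexed_value == query


def lowercased_field_matches(indexed_value: str, query: str) -> bool:
    """A lowercase filter normalises both the indexed value and the query."""
    return indexed_value.lower() == query.lower()


print(string_field_matches("M13929", "m13929"))      # False
print(string_field_matches("M13929", "M13929"))      # True
print(lowercased_field_matches("M13929", "m13929"))  # True
```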
The question is whether we need to facet on catalogNumber. If yes, it has to stay a string type field; if no, it can be converted to a text based field (we'd want to use a derivative that does not do stemming etc. but does apply a case insensitive filter).
Also, if faceting is required, catalogNumber could be copied into a separate text field with copyField for this use-case.
SOLR managed-schema additions required
<field name="text_catalogNumber" type="textgen" multiValued="true" indexed="true" stored="false" />
<copyField source="catalogNumber" dest="text_catalogNumber"/>
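To make the effect of the copyField approach concrete, here is a small in-memory Python sketch. It is only a model under stated assumptions: the helper names are hypothetical, lowercasing stands in for the textgen analysis, and the sample records are made-up test values that reuse catalog numbers mentioned in this issue.

```python
from collections import Counter

# Hypothetical model: the original catalogNumber stays untouched for
# faceting, while a lowercased copy is used for case insensitive search.

def index_record(record: dict) -> dict:
    indexed = dict(record)
    # Stands in for <copyField source="catalogNumber" dest="text_catalogNumber"/>,
    # with .lower() as a simplified substitute for the textgen analysis.
    indexed["text_catalogNumber"] = record["catalogNumber"].lower()
    return indexed


def search(index: list, value: str) -> list:
    """Case insensitive lookup against the copied field."""
    return [doc for doc in index if doc["text_catalogNumber"] == value.lower()]


def facet(index: list) -> Counter:
    """Facet counts use the original, case-preserving string field."""
    return Counter(doc["catalogNumber"] for doc in index)


# Made-up sample data for illustration only.
records = [
    {"catalogNumber": "M13929"},
    {"catalogNumber": "M13929"},
    {"catalogNumber": "PERTH 9639314"},
]
index = [index_record(r) for r in records]

print(len(search(index, "m13929")))  # 2
print(facet(index)["M13929"])        # 2
```

The point of the design is that search and faceting read from different fields, so making search case insensitive does not change the facet values users see.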
biocache-hubs search field change required: catalog_number to text_catalogNumber
pipelines pull request https://github.com/gbif/pipelines/pull/1001 in version 2.18.0-SNAPSHOT
@adam-collins could you please clarify what we should be testing here? Searching catalogNumber is still case sensitive, but we're happy with that. What did the change do? I'm not sure whether there is a requirement to facet on catalogNumber.
In prod:
The example in this issue:
catalogue_number:m13929 returns nothing
catalogue_number:M13929 returns 3 results ... all valid
perth example:
catalogue_number:perth 9639314 and catalogue_number:PERTH 9639314 both return 2 matches: one for 9639314 (a birdlife record) and one for "PERTH 9639314" from WA herbarium
mel example:
catalogue_number:MEL%202526538A search returns over 100m records
catalogue_number:%22MEL%202526538A%22 in quotes, returns the correct result
catalogue_number:%22mel%202526538A%22 without quotes, returns nothing
In test:
The example in this issue:
catalogue_number:m13929 returns nothing
catalogue_number:M13929 returns 3 results ... all valid
perth example:
catalogue_number:perth 9639314 and catalogue_number:PERTH 9639314 both return 2 matches: one for 9639314 (a birdlife record) and one for "PERTH 9639314" from WA herbarium
mel example:
catalogue_number:MEL%202526538A search returns over 100m records
catalogue_number:%22MEL%202526538A%22 in quotes, returns the correct result
catalogue_number:%22mel%202526538A%22 without quotes, returns nothing
Please clarify
catalogue_number is the pre-pipelines name. Use catalogNumber instead.
Use text_catalogNumber for case insensitive search. This is consistent with the other text_* fields, e.g. https://biocache-test.ala.org.au/fields?filter=text_
See the Catalogue number field here: https://biocache-test.ala.org.au/search#tab_advanceSearch
text_catalogNumber is new and is for case insensitive searching. If you want a specific match, use double quotes.
The example in this issue:
text_catalogNumber:"m13929" returns 3 records
catalogNumber:m13929 returns 0 results
catalogNumber:M13929 returns 3 results
perth example: the above example is missing from test.
mel example: behaves the same as the example in this issue, i.e. use double quotes; catalogNumber is case sensitive and text_catalogNumber is case insensitive.
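Since quoting matters for values that contain a space, here is a minimal Python sketch of building the encoded query string. The helper name and the default base URL are assumptions: the URL combines the biocache-test host with the occurrences/search?q= pattern quoted elsewhere in this issue, so adjust it for your environment.

```python
from urllib.parse import quote

# Assumed base URL (host and path pattern both appear in this issue,
# but this exact combination is not confirmed by it).
DEFAULT_BASE_URL = "https://biocache-test.ala.org.au/occurrences/search?q="


def build_search_url(field: str, value: str, exact: bool = True,
                     base_url: str = DEFAULT_BASE_URL) -> str:
    """Build a field query URL, wrapping the value in double quotes if exact."""
    term = f'"{value}"' if exact else value
    # safe="" percent-encodes ':' as %3A, '"' as %22 and ' ' as %20.
    return base_url + quote(f"{field}:{term}", safe="")


print(build_search_url("text_catalogNumber", "mel 2526538A"))
# https://biocache-test.ala.org.au/occurrences/search?q=text_catalogNumber%3A%22mel%202526538A%22
print(build_search_url("catalogNumber", "M13929", exact=False))
# https://biocache-test.ala.org.au/occurrences/search?q=catalogNumber%3AM13929
```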
Ok, this is fine, happy to go ahead
Background
Search for catalog number is case sensitive; however, catalog number is also added to text, which is case insensitive.
https://biocache.ala.org.au/occurrences/search?q=catalogue_number%3Am13929 - No results (case sensitive search)
https://biocache.ala.org.au/occurrences/search?q=catalogue_number%3AM13929 - Gets results after correct capitalisation is used
https://biocache.ala.org.au/occurrences/search?taxa=m13929 or https://biocache.ala.org.au/occurrences/search?taxa=M13929 - Both get the same results (not a case sensitive search)
Requirement
Enable case insensitive search for catalog number.
Related to https://support.ehelp.edu.au/a/tickets/112910