AtlasOfLivingAustralia / bie-index

Taxonomic search services
https://bie-ws.ala.org.au/ws

Genus searches not returning genus as first result #245

Closed nickdos closed 10 months ago

nickdos commented 5 years ago

E.g. searches for acacia and eucalyptus are returning a species, not the genus, as the first result (new version).

image

Autocomplete appears to be doing the right thing:

image

charvolant commented 5 years ago

For some reason, the Solr ranking is scoring Acacia lower than Acacia paradoxa:

For acacia

72.54088 = boost(text:acacia,double(searchWeight)), product of:
  6.045073 = weight(text:acacia in 4160) [SchemaSimilarity], result of:
    6.045073 = score(doc=4160,freq=2.0 = termFreq=2.0 ), product of:
      5.9527106 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        12908.0 = docFreq
        4967125.0 = docCount
      1.015516 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        2.0 = termFreq=2.0
        1.2 = parameter k1
        0.75 = parameter b
        7.0839205 = avgFieldLength
        16.0 = fieldLength
  12.0 = double(searchWeight)=12.0

For Acacia paradoxa

79.566345 = boost(text:acacia,double(searchWeight)), product of:
  7.3672543 = weight(text:acacia in 39124) [SchemaSimilarity], result of:
    7.3672543 = score(doc=39124,freq=3.0 = termFreq=3.0 ), product of:
      5.9527106 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:
        12908.0 = docFreq
        4967125.0 = docCount
      1.2376301 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:
        3.0 = termFreq=3.0
        1.2 = parameter k1
        0.75 = parameter b
        7.0839205 = avgFieldLength
        16.0 = fieldLength
  10.8 = double(searchWeight)=10.8

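Probably because termFreq is higher for Acacia paradoxa (3.0 vs 2.0). That lifts its tfNorm from 1.015516 to 1.2376301, which more than offsets the genus's higher searchWeight (12.0 vs 10.8).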
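
To sanity-check the explain output, here is a rough Python sketch that plugs the reported values into the idf and tfNorm formulas Solr prints. The function names are made up for illustration, and Solr works in single-precision floats, so the last digits may differ slightly.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # idf as printed in the explain output
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # tfNorm as printed in the explain output
    return (freq * (k1 + 1)) / (
        freq + k1 * (1 - b + b * field_length / avg_field_length)
    )

def boosted_score(freq, search_weight,
                  doc_freq=12908.0, doc_count=4967125.0,
                  field_length=16.0, avg_field_length=7.0839205):
    # boost(text:acacia, searchWeight) = idf * tfNorm * searchWeight
    idf = bm25_idf(doc_freq, doc_count)
    tf_norm = bm25_tf_norm(freq, field_length, avg_field_length)
    return idf * tf_norm * search_weight

# Values taken from the two explain outputs above.
genus_score = boosted_score(freq=2.0, search_weight=12.0)    # Acacia
species_score = boosted_score(freq=3.0, search_weight=10.8)  # Acacia paradoxa

print(f"Acacia:          {genus_score:.4f}")
print(f"Acacia paradoxa: {species_score:.4f}")
```

Both documents share the same idf and fieldLength, so the only inputs that differ are termFreq and searchWeight.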

Eucalyptus shows a different pattern, where the fieldLength is larger for the genus.

charvolant commented 5 years ago

The suggest will always return an exact match first.

nickdos commented 5 years ago

Can you see the text that is contributing to the termFreq? I've seen a similar thing where the common name contains "acacia" (which is the case with Acacia paradoxa - common name: "Kangaroo Acacia") and thus it gets a higher score. I'm wondering if we can ignore termFreq for this particular field?

Edit: omitTermFreqAndPositions seems to be the option.
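
A quick sketch of what that would do to the two Acacia scores, assuming the field then reports termFreq=1 for any matching document and that fieldLength stays at 16.0 for both (neither is confirmed here). The idf and the other constants are copied from the explain output; the helper name is made up. Note that omitting positions also stops phrase queries from working on that field.

```python
# Constants copied from the explain output for text:acacia.
IDF = 5.9527106
K1 = 1.2
B = 0.75
AVG_FIELD_LENGTH = 7.0839205
FIELD_LENGTH = 16.0  # same for both documents in the explain output

def tf_norm(freq):
    # tfNorm formula as printed by Solr
    return (freq * (K1 + 1)) / (
        freq + K1 * (1 - B + B * FIELD_LENGTH / AVG_FIELD_LENGTH)
    )

# Assumption: with term frequencies omitted, every match counts as freq=1.
flat_tf_norm = tf_norm(1.0)

genus_flat = IDF * flat_tf_norm * 12.0     # Acacia, searchWeight=12.0
species_flat = IDF * flat_tf_norm * 10.8   # Acacia paradoxa, searchWeight=10.8

print(f"Acacia:          {genus_flat:.4f}")
print(f"Acacia paradoxa: {species_flat:.4f}")
print("genus ranks first:", genus_flat > species_flat)
```

With termFreq out of the picture the two scores differ only by searchWeight, so the genus would come out ahead under these assumptions.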

charvolant commented 5 years ago

The other way to do it is to downgrade common names vs scientific names, so qf=scientificName^2.0+commonName^1.5+text or such-like. ATM the search is just on text, which covers everything.
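
For illustration, a sketch of how those parameters might be put together for an edismax query. The Solr URL and core name are placeholders, the field names and weights are just the ones floated above, and keeping searchWeight as a multiplicative boost is an assumption about how the existing boost would carry over.

```python
from urllib.parse import urlencode

# Placeholder URL: not the real bie-index Solr endpoint.
SOLR_SELECT_URL = "http://localhost:8983/solr/bie/select"

def build_search_params(query):
    """Build edismax parameters that weight scientific names over common names."""
    return {
        "q": query,
        "defType": "edismax",
        # Field weights as suggested in the comment; not tuned values.
        "qf": "scientificName^2.0 commonName^1.5 text",
        # Assumption: keep the existing searchWeight boost as a multiplier.
        "boost": "searchWeight",
    }

url = SOLR_SELECT_URL + "?" + urlencode(build_search_params("acacia"))
print(url)
```

urlencode turns the spaces in qf into + and escapes the ^ characters, which gives the scientificName^2.0+commonName^1.5+text form on the wire.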

adam-collins commented 10 months ago

working today https://bie.ala.org.au/ws/search?q=acacia