Open kaladay opened 1 year ago
All `df` does is prepend the specified field onto each word.
For example, with a search of "red apple" and a `df` of `title`, we get:
`q=title:red apple`
There are problems with this, and we might need to have `sow` enabled.
With `sow=true`, we instead get:
`q=title:red title:apple`
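As a rough illustration of the request this implies, here is a minimal Python sketch that sends the raw user input in `q` and passes `df` and `sow` as separate parameters instead of gluing the field name onto the query string. The helper name `build_select_params` is hypothetical, and the sketch only builds the parameter string; it does not contact Solr.

```python
from urllib.parse import urlencode


def build_select_params(user_query, default_field, split_on_whitespace=True):
    """Return a URL-encoded parameter string for a Solr select request.

    Hypothetical helper: the field name is passed through `df` rather
    than being prepended to the query text by hand.
    """
    params = {
        "q": user_query,  # raw user input, no field prefix
        "df": default_field,  # default field for unqualified terms
        "sow": "true" if split_on_whitespace else "false",
    }
    return urlencode(params)


if __name__ == "__main__":
    # Spaces are encoded as "+" by urlencode.
    print(build_select_params("red apple", "title"))
```

Keeping the field out of `q` means user input never has to be concatenated with a field prefix, which is where the `title:red apple` form came from.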
The wildcards also introduce a problem.
Wildcards are not expanded the way we expect.
The search of "red apple" actually searches for the following (when `sow` is false):
`MatchAllDocsQuery(*:*) q=title:red apple MatchAllDocsQuery(*:*)`
This looks to me like it pulls in other fields. Using `df` is a step forward, but `sow` needs to be used.
When not using `df`, the default appears to be `_text_`, which is where we copy everything for the `all_fields` matches.
There is also this important documentation note:
NOTE: If you want to be able to sort on a field whose contents you want to tokenize to facilitate searching, use a copyField directive in the Schema to clone the field. Then search on the field and sort on its clone.
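For reference, a copy field can also be declared through Solr's Schema API with an `add-copy-field` command. The sketch below only builds the JSON body for such a request. The field names `title` and `title_sort` are placeholders rather than names from our schema, and the destination field would already have to exist with a non-tokenized type for sorting or exact matching to behave as the note describes.

```python
import json


def copy_field_command(source, dest):
    """Build the JSON body for a Schema API 'add-copy-field' command.

    Hypothetical helper: both field names are supplied by the caller
    and are assumed to be defined in the schema already.
    """
    return json.dumps({"add-copy-field": {"source": source, "dest": dest}})


if __name__ == "__main__":
    # Placeholder field names, for illustration only.
    print(copy_field_command("title", "title_sort"))
```

The resulting body would be sent as a POST to the core's `/schema` endpoint. The same declaration can be written as a `copyField` element in the schema file instead.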
I strongly suspect that the rest of the problems are in how we structure the Solr core and use the properties.
See: https://solr.apache.org/guide/7_7/the-standard-query-parser.html
All the fields in the Metadata Application Profile (http://oaktrust.library.tamu.edu/handle/1969.1/175368) and the new ones that we have accumulated will need to have exact-match facets, tokenizations, and search fields - potentially achieved with copy-fields.
Describe the bug
The search logic seems confusing and wrong. Searching for a word by itself either doesn't work at all or works only depending on things like the operator `and` and the operator `or`. For example, searching for `apple` may often not work. Some of the workarounds would be to search for `* apple` or `* +apple`.
Not only that but, seemingly randomly, searches end up including results that are clearly not in the selected field.
It has been discovered that using `q` for searching and prepending the field, like `q=title:apple`, is likely part of the problem. The property `df` (default field) is likely the cause of the seemingly random unrelated results.
The search may be improved by using `df` and `q` as in this example: `q=apple&df=title`.
It may still be possible to use `*:apple` in `q`.
This needs to be investigated and a solution needs to be provided.
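One way to carry out that investigation is to ask Solr how it actually parsed a query by adding `debugQuery=true` and reading `debug.parsedquery` from the JSON response. The sketch below is only a starting point. The helper names are hypothetical, and the sample response is a hand-written stand-in, not real Solr output.

```python
from urllib.parse import urlencode


def debug_params(user_query, default_field):
    """Build select parameters that also request Solr's debug section."""
    return urlencode({"q": user_query, "df": default_field, "debugQuery": "true"})


def parsed_query(response):
    """Return debug.parsedquery from a decoded JSON response, or None."""
    return response.get("debug", {}).get("parsedquery")


if __name__ == "__main__":
    print(debug_params("apple", "title"))

    # Hand-written stand-in for a decoded response, for illustration only.
    sample_response = {"debug": {"parsedquery": "title:apple"}}
    print(parsed_query(sample_response))
```

Comparing the parsed query for `q=title:apple` against `q=apple&df=title` should show whether stray terms are falling through to a different default field.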
Solving this may solve #514 because that issue may be a symptom of the problem observed in this issue.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Searching should make sense. A search for `apple` should find matches for apple if they exist and should not find matches where `apple` does not exist.