blakearchive / archive

GNU General Public License v2.0

searching with exact string is returning results it shouldn't #486

Open ghost opened 7 years ago

ghost commented 7 years ago

@nathan-rice
search "the angel" with the quotes
click on second result of Transcription row
notice the first matching object should not be a result (it has 'angel' in it but not "the angel")

this is actually also an issue on production, so it's not a result of the refactoring

ghost commented 7 years ago

"white as an angel" seems to work fine, though. it returns only results with that full string. so it seems this has something to do with exact strings that begin with stopwords. i made it so that stopwords aren't searched unless they're in quotes (see removeStopWords()), but for some reason "the angel" returns results that contain "the angel" as well as results that contain only contain 'angel'

ghost commented 7 years ago

maybe removeStopWords() along with the solr stopword functionality is overkill and we can eliminate the solr stopword functionality
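
For reference, the client-side behavior described above (stopwords dropped from the query unless they sit inside quotes) could look roughly like the sketch below. This is not the archive's actual removeStopWords(); the function name `remove_stop_words`, the `STOP_WORDS` list, and the tokenizing regex are all assumptions made up for illustration.

```python
import re

# Hypothetical stopword list; the archive's real list is not shown in this thread.
STOP_WORDS = {"a", "an", "as", "the", "of", "and"}


def remove_stop_words(query: str) -> str:
    """Drop stopwords from bare terms; leave quoted phrases untouched."""
    # Split the query into quoted phrases ("...") and bare whitespace-separated words.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    # Keep every quoted phrase as-is, and every bare word that is not a stopword.
    kept = [t for t in tokens if t.startswith('"') or t.lower() not in STOP_WORDS]
    return " ".join(kept)
```

With this sketch, `remove_stop_words('the angel')` gives `angel`, while `remove_stop_words('"the angel"')` passes the quoted phrase through unchanged, so whatever happens to the leading stopword after that point would be happening on the solr side.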

ghost commented 7 years ago

tried removing the stopwordfilter, but that didn't help. actually, with it removed, "the angel" returned no results. seems like odd behavior because without the stopwordfilter, stopwords should get indexed

nathan-rice commented 7 years ago

You can't query stopwords (even in quoted queries) when the stopwords filter is enabled. Note that searching for "The angel" returns the exact same results as searching for angel.

Did you restart the solr service after changing the schema.xml? If not, your changes would not have been fully applied.

ghost commented 7 years ago

ok, but "white as an angel" returns correct results and that has two stopwords in it. the problem seems to arise only when the stopword comes first.

yes, i'm pretty sure the solr service was restarted. the deploy script takes care of that.
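
One possible way to picture why only a leading stopword misbehaves, assuming the stop filter drops stopwords but keeps the position gaps they leave behind. That assumption is not confirmed anywhere in this thread, and the code below is a toy model, not solr's implementation; `STOP_WORDS`, `analyze`, and `phrase_matches` are hypothetical names.

```python
# Hypothetical stopword list, for illustration only.
STOP_WORDS = {"a", "an", "as", "the"}


def analyze(text):
    """Return (position, token) pairs, dropping stopwords but keeping their position gaps."""
    return [(i, w) for i, w in enumerate(text.lower().split()) if w not in STOP_WORDS]


def phrase_matches(phrase, document):
    """True if the phrase's surviving tokens occur in the document at the same relative offsets."""
    q = analyze(phrase)
    if not q:
        return False
    base = q[0][0]
    # Offsets are relative to the first surviving token, so a gap at the very
    # start of the phrase constrains nothing, while gaps in the middle still do.
    offsets = [(pos - base, tok) for pos, tok in q]
    doc = dict(analyze(document))  # position -> token
    return any(all(doc.get(start + off) == tok for off, tok in offsets) for start in doc)
```

In this model `phrase_matches("the angel", "an angel came")` is true, because once "the" is dropped the phrase is just "angel". `phrase_matches("white as an angel", "white angel")` is false, because the two dropped words in the middle still require "angel" to sit three positions after "white".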

nathan-rice commented 7 years ago

No idea then; that is contrary to Solr's documentation. Probably worth a visit to Stack Overflow.