Currently, the words argument for BuildExcerpts is built from the words returned by the Query command. These words are already stemmed and come with statistics such as docs and hits. As a result, highlighting fails for some queries.
For example, the keyword recovery is not highlighted, because Query returns its stemmed form, recoveri. I think this comes from the English stemmer; soundex would produce a phonetic code instead.
Anyway, when you pass the query string itself as the words argument to BuildExcerpts, highlighting works fine. BuildExcerpts does its own stemming, and the statistics are not used for highlighting, so I don't see a problem with using the initial query string as words.
I think this can be fixed by changing line 569 of djangosphinx/models.py:
words = ' '.join([w['word'] for w in results['words']])
Replace it so that words is simply the original query string.
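A minimal sketch of the proposed change, assuming the Python sphinxapi client, whose BuildExcerpts(docs, index, words, opts) takes the words as a plain string. The names build_excerpts_for_query and RecordingClient, and the index name "myindex", are hypothetical and only illustrate the idea; this is not the actual djangosphinx code.

```python
def build_excerpts_for_query(client, docs, index, query, opts=None):
    """Hypothetical helper (not part of djangosphinx) sketching the fix.

    `client` is assumed to be a sphinxapi.SphinxClient, or anything with
    the same BuildExcerpts(docs, index, words, opts) method.
    """
    # Old approach (line 569): join the words reported by Query, e.g.
    #   words = ' '.join([w['word'] for w in results['words']])
    # Those words are already stemmed ('recovery' -> 'recoveri').
    # Proposed approach: pass the original query string and let
    # BuildExcerpts tokenize and stem it using the index settings.
    words = query
    return client.BuildExcerpts(docs, index, words, opts or {})


class RecordingClient:
    """Hypothetical stand-in for SphinxClient that records the words it gets."""

    def BuildExcerpts(self, docs, index, words, opts=None):
        self.last_words = words
        # A real client returns one highlighted excerpt per document;
        # this stub just echoes the documents back unchanged.
        return list(docs)


if __name__ == "__main__":
    client = RecordingClient()
    build_excerpts_for_query(client, ["Disk recovery tools"], "myindex", "recovery")
    print(client.last_words)  # recovery
```

Passing the raw query keeps a single place (BuildExcerpts) responsible for stemming, so the highlighter never receives pre-stemmed terms.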
Thoughts?