Open jaimeiniesta opened 1 year ago
Hey @jaimeiniesta,
Yeah, you can pass a custom list of transformer modules when adding a field: https://github.com/elixir-haystack/haystack/blob/55e8b1f7ae83b67998a6437ea4a25314849c9cf2/lib/haystack/index/field.ex#L46
So you could either pass your own implementation of stop words, or remove it completely. And you can do that on a per-field basis.
Again this needs to be added to the documentation 😅
Ah, that's cool then. I'll wait for that documentation. Thanks! 😎
Is it possible to customize the stop words used, so I can provide a different list other than the default one or disable stop words?
Context: I'm setting up Haystack for the search in https://rocketvalidator.com/html-validation - currently it just uses a simple search by substring but I want to use Haystack instead. So far it's going great!
During the integration, I found that the results were not as expected in many searches, and it looks like this was because most of the titles include characters like double quotes:
https://rocketvalidator.com/html-validation/a-link-element-must-not-appear-as-a-descendant-of-a-body-element-unless-the-link-element-has-an-itemprop-attribute-or-has-a-rel-attribute-whose-value-contains-dns-prefetch-modulepreload-pingback-preconnect-prefetch-preload-prerender-or-stylesheet
So when I searched for something containing double quotes, these guides would appear first as they scored higher because they have many double quotes.
I guess this could be solved by adding double quotes (and other characters like parentheses, brackets, `<` and `>`, etc.) to the stop words. My workaround was to clean up the strings, both during the load and the search.
After that, I found that a search for `must not appear`, like this one: https://rocketvalidator.com/html-validation?search=must+not+appear, returned no results using Haystack, and that's because these are all stop words.
Finally, for non-English content it would be great to be able to customize the stop words.
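As a rough illustration of the cleanup workaround mentioned above (stripping quotes and brackets from strings before both indexing and searching), here is a minimal Elixir sketch. `SearchSanitizer` and `clean/1` are hypothetical names, not part of Haystack, and the exact list of characters to strip is an assumption:

```elixir
defmodule SearchSanitizer do
  # Hypothetical helper, not part of Haystack.
  # Replaces double quotes, parentheses, brackets, braces and angle
  # brackets with spaces, then collapses the resulting whitespace.
  def clean(text) when is_binary(text) do
    text
    |> String.replace(~r/["()\[\]{}<>]/, " ")
    |> String.replace(~r/\s+/, " ")
    |> String.trim()
  end
end

# Apply the same cleanup to titles before indexing and to the user's query.
SearchSanitizer.clean(~s(The "link" element <must> not appear))
```

Running the same function on both sides keeps the indexed text and the query consistent, so punctuation-heavy titles no longer score higher just for containing many quotes.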