Smile-SA / elasticsuite

Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
https://elasticsuite.io
Open Software License 3.0

Introduce a stemmer override step in the analysers #1742

Closed (rbayet closed 4 years ago)

rbayet commented 4 years ago

Is your feature request related to a problem? Please describe.

Sometimes the automatically selected language-specific stemmer has edge-case issues: it either fails to recognize singular and plural forms as the same word (seen with Italian), or it assigns the same root to words that share an etymological ancestry but are now quite distinct (seen in French on an old Solr project).

Using the thesaurus to fix those issues is a bad idea because it distorts the thesaurus's intended use, which is not to fix analysis issues but to apply business-specific synonyms and expansions.

Describe the solution you'd like

Introduce basic support for a language/locale-dependent stemmer_override token filter in the standard analyser chain. For now, supporting the simpler "rules" parameter should be enough.
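A minimal sketch of what such a filter could look like in Elasticsearch index settings, written here as a Python dict for illustration. The filter and analyser names (`stemmer_override_fr`, `stemmer_fr`, `standard_fr`) and the rule itself are hypothetical and are not ElasticSuite's actual configuration. The `stemmer_override` filter type, its `rules` parameter, and the need to place it before the stemmer in the chain are standard Elasticsearch behaviour.

```python
import json

# Hypothetical index settings: all names and rules below are illustrative only.
settings = {
    "analysis": {
        "filter": {
            # Maps the listed tokens to a fixed stem and protects them
            # from any stemmer placed later in the chain.
            "stemmer_override_fr": {
                "type": "stemmer_override",
                "rules": ["collectivités, collectivité => collectivité"],
            },
            "stemmer_fr": {"type": "stemmer", "language": "light_french"},
        },
        "analyzer": {
            "standard_fr": {
                "type": "custom",
                "tokenizer": "standard",
                # The override has to come before the stemmer to take effect.
                "filter": ["lowercase", "stemmer_override_fr", "stemmer_fr"],
            }
        },
    }
}

print(json.dumps(settings, indent=2, ensure_ascii=False))
```

Each rule uses the `token1, token2 => stem` form, so a per-locale list of such strings would be enough to cover the "rules" parameter mentioned above.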

Additional context

Problem historically seen with old versions (on a Solr 3.x based project) of the french and light_french stemmers, which would reduce both collectivités (collectivities) and collections to collect. This was annoying because the user mainly sold books, almost all of which had a "collection" attribute in the format "Collection [XYZ]": searching for "droit des collectivités" ("Collectivities law") would return all books.

The problem was also seen recently with an Italian user selling, among other things, power tools: the Italian stemmer would reduce trapano and trapani (drill and drills) to different stems. As a result, searching for "drills" would not return the same products as searching for "drill".
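To illustrate how override rules would address both cases, here is a toy simulation in Python. It is only a sketch: `toy_stemmer` is a stand-in whose stems are invented to mimic the behaviour reported above, not the output of the real French or Italian stemmers, and the rules are hypothetical examples.

```python
def parse_rules(rules):
    """Parse 'a, b => c' rules into a {token: stem} mapping."""
    overrides = {}
    for rule in rules:
        sources, target = rule.split("=>")
        for token in sources.split(","):
            overrides[token.strip()] = target.strip()
    return overrides


def toy_stemmer(token):
    """Stand-in for a language stemmer; the stems are invented for illustration."""
    made_up_stems = {
        "collectivités": "collect",
        "collections": "collect",
        "trapano": "trapan",
        "trapani": "trapani",
    }
    return made_up_stems.get(token, token)


def analyze(tokens, overrides):
    """Apply overrides first; overridden tokens skip the stemmer."""
    return [overrides[t] if t in overrides else toy_stemmer(t) for t in tokens]


overrides = parse_rules([
    "collectivités => collectivité",
    "trapano, trapani => trapano",
])

print(analyze(["collectivités", "collections"], {}))         # both collapse to one stem
print(analyze(["collectivités", "collections"], overrides))  # stems are now distinct
print(analyze(["trapano", "trapani"], overrides))            # both now share one stem
```

The same mechanism covers both directions of the problem: one rule splits two words that the stemmer merges, and the other merges two forms that the stemmer keeps apart.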

walkwizus commented 4 years ago

Hi all,

Since this update, I get an error during the indexing process.

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"stemmer override filter requires either `rules` or `rules_path` to be configured"}],"type":"illegal_argument_exception","reason":"stemmer override filter requires either `rules` or `rules_path` to be configured"},"status":400}

I tried to add the following configuration but without success:

`

[]

`

When I remove all the lines of the associated commit, the indexer process works fine.

Any ideas?

Thanks!
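For anyone hitting the same error, the message indicates that a `stemmer_override` filter reached Elasticsearch without a `rules` or `rules_path` setting. A small pre-flight check along these lines could help locate the offending filter before indexing. This is only a sketch: the helper name and the two settings dicts are hypothetical, and it only tests whether one of the two keys is present.

```python
def find_unconfigured_overrides(settings):
    """Return names of stemmer_override filters lacking both `rules` and `rules_path`."""
    filters = settings.get("analysis", {}).get("filter", {})
    return [
        name
        for name, conf in filters.items()
        if conf.get("type") == "stemmer_override"
        and "rules" not in conf
        and "rules_path" not in conf
    ]


# Hypothetical settings: a filter declared without rules, and one with a rule.
broken = {"analysis": {"filter": {"stemmer_override": {"type": "stemmer_override"}}}}
fixed = {
    "analysis": {
        "filter": {
            "stemmer_override": {
                "type": "stemmer_override",
                "rules": ["trapani => trapano"],
            }
        }
    }
}

print(find_unconfigured_overrides(broken))  # lists the filter that lacks rules
print(find_unconfigured_overrides(fixed))   # nothing to report
```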