Closed Aram-GSM closed 5 years ago
Hi @Aram-GSM,
Analysis chain is configured using the store view configured locale. If you website is configured with en_US, the analysis will be the english (en) one. Then, we will load the english analyzer(s) from elasticsuite_analysis.xml file.
Now if you need persian (fa), you have to :
A good way to figure out how to configure the analysis would be to use this doc : https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-lang-analyzer.html#persian-analyzer. You should look at the arabic language which should be quite similar.
By default, ElasticSearch does not shipped any hebrew language analyzer but it is available as plugins.
Feel free to sublit a PR if you want the persian language to be added in the core ElasticSearch setup.
I am having a similar issue and followed the instructions from @afoucret to extend the elasticsuite_analysis.xml
so that it add the needed "stem_greek" but I cannot see the new filter being added to the index created (by reindexing). Any chance for more specific instructions on how to go about doing this? BTW, re-indexing does not seem to create new indexes so I suspect I will need to delete the old indices or something to get new ones (and hopefully with new settings?)
Also, do I need to upgrade or do something else for Magento and ElasticSuite to pick up the updated xml?
The stemmer I wish to introduce is https://github.com/skroutz/elasticsearch-skroutz-greekstemmer
Update: Even after deleting an index and trigger a new index to be created wiht php magento index:reindex catalogsearch_fulltext
the new index still does not have the new settings in the analyzer which confirms that ElasticSuite is not picking up my changes. BTW, the ElasticSearch engine does work with the stemmer (tested locally).
Update 2: Turns out the xml is loaded only once (the first time) so a magento setup:upgrade
is needed if you change things in here. After making this change the new index on the server has the needed stemmer!
Here is my elasticsuite_analysis.xml
:
<?xml version="1.0"?>
<!--
/**
* Smile_ElasticsuiteCore default analysis configuration.
*
* DISCLAIMER
*
* Do not edit or add to this file if you wish to upgrade Smile ElasticSuite to newer
* versions in the future.
*
*
* @category Smile
* @package Smile\ElasticsuiteCore
* @author Aurelien FOUCRET <aurelien.foucret@smile.fr>
* @copyright 2018 Smile
* @license Open Software License ("OSL") v. 3.0
*/
-->
<analysis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="urn:magento:module:Smile_ElasticsuiteCore:etc/elasticsuite_analysis.xsd">
<char_filters>
<char_filter name="html_strip" type="html_strip" language="default"/>
</char_filters>
<filters>
<filter name="trim" type="trim" language="default"/>
<filter name="lowercase" type="lowercase" language="default"/>
<filter name="word_delimiter" type="word_delimiter" language="default">
<generate_word_parts>true</generate_word_parts>
<catenate_words>true</catenate_words>
<catenate_numbers>true</catenate_numbers>
<catenate_all>true</catenate_all>
<split_on_case_change>true</split_on_case_change>
<split_on_numerics>true</split_on_numerics>
<preserve_original>true</preserve_original>
</filter>
<filter name="shingle" type="shingle" language="default">
<min_shingle_size>2</min_shingle_size>
<max_shingle_size>2</max_shingle_size>
<output_unigrams>true</output_unigrams>
</filter>
<filter name="reference_shingle" type="shingle" language="default">
<min_shingle_size>2</min_shingle_size>
<max_shingle_size>10</max_shingle_size>
<output_unigrams>true</output_unigrams>
<token_separator></token_separator>
</filter>
<filter name="reference_word_delimiter" type="word_delimiter" language="default">
<generate_word_parts>true</generate_word_parts>
<catenate_words>false</catenate_words>
<catenate_numbers>false</catenate_numbers>
<catenate_all>false</catenate_all>
<split_on_case_change>true</split_on_case_change>
<split_on_numerics>true</split_on_numerics>
<preserve_original>false</preserve_original>
</filter>
<filter name="ascii_folding" type="asciifolding" language="default">
<preserve_original>false</preserve_original>
</filter>
<filter name="standard" type="standard" language="default" />
<filter name="standard" type="stemmer" language="ar">
<language>arabic</language>
</filter>
<filter name="standard" type="stemmer" language="eu">
<language>basque</language>
</filter>
<filter name="standard" type="stemmer" language="bg">
<language>bulgarian</language>
</filter>
<filter name="standard" type="stemmer" language="ca">
<language>catalan</language>
</filter>
<filter name="standard" type="stemmer" language="cs">
<language>czech</language>
</filter>
<filter name="standard" type="stemmer" language="da">
<language>danish</language>
</filter>
<filter name="standard" type="stemmer" language="de">
<language>german2</language>
</filter>
<filter name="standard" type="stemmer" language="en">
<language>english</language>
</filter>
<filter name="standard" type="stemmer" language="es">
<language>spanish</language>
</filter>
<filter name="standard" type="stemmer" language="el">
<language>greek</language>
</filter>
<filter name="standard" type="stemmer" language="fi">
<language>finnish</language>
</filter>
<filter name="standard" type="stemmer" language="fr">
<language>french</language>
</filter>
<filter name="standard" type="stemmer" language="gl">
<language>galician</language>
</filter>
<filter name="standard" type="stemmer" language="hi">
<language>hindi</language>
</filter>
<filter name="standard" type="stemmer" language="hu">
<language>hungarian</language>
</filter>
<filter name="standard" type="stemmer" language="id">
<language>indonesian</language>
</filter>
<filter name="standard" type="stemmer" language="it">
<language>italian</language>
</filter>
<filter name="standard" type="stemmer" language="lv">
<language>latvian</language>
</filter>
<filter name="standard" type="stemmer" language="lt">
<language>lithuanian</language>
</filter>
<filter name="standard" type="stemmer" language="nb">
<language>norwegian</language>
</filter>
<filter name="standard" type="stemmer" language="nn">
<language>light_nynorsk</language>
</filter>
<filter name="standard" type="stemmer" language="nl">
<language>dutch</language>
</filter>
<filter name="standard" type="stemmer" language="pt">
<language>portuguese</language>
</filter>
<filter name="standard" type="stemmer" language="ro">
<language>romanian</language>
</filter>
<filter name="standard" type="stemmer" language="ru">
<language>russian</language>
</filter>
<filter name="standard" type="stemmer" language="sv">
<language>swedish</language>
</filter>
<filter name="standard" type="stemmer" language="tr">
<language>turkish</language>
</filter>
<filter name="elision" type="elision" language="ca">
<articles>["d", "l", "m", "n", "s", "t"]</articles>
</filter>
<filter name="elision" type="elision" language="fr">
<articles>["l", "m", "t", "qu", "n", "s","j", "d", "c"]</articles>
</filter>
<filter name="elision" type="elision" language="it">
<articles>["c", "l", "all", "dall", "dell","nell", "sull", "coll", "pell","gl", "agl", "dagl", "degl", "negl","sugl", "un", "m", "t", "s", "v", "d"]</articles>
</filter>
<filter name="phonetic" type="phonetic" language="default">
<encoder>metaphone</encoder>
</filter>
<filter name="phonetic" type="phonetic" language="fr">
<encoder>beider_morse</encoder>
<languageset>french</languageset>
</filter>
<filter name="lower_greek" type="lowercase" language="el">
<language>greek</language>
</filter>
<filter name="stem_greek" type="skroutz_stem_greek" language="el">
<language>greek</language>
</filter>
</filters>
<analyzers>
<analyzer name="standard" tokenizer="standard" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="word_delimiter" />
<filter ref="lowercase" />
<filter ref="elision" />
<filter ref="standard" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<analyzer name="whitespace" tokenizer="standard" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="word_delimiter" />
<filter ref="lowercase" />
<filter ref="elision" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<analyzer name="reference" tokenizer="standard" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="reference_word_delimiter" />
<filter ref="lowercase" />
<filter ref="elision" />
<filter ref="reference_shingle" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<analyzer name="shingle" tokenizer="whitespace" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="word_delimiter" />
<filter ref="lowercase" />
<filter ref="elision" />
<filter ref="shingle" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<analyzer name="sortable" tokenizer="keyword" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="lowercase" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<analyzer name="phonetic" tokenizer="standard" language="default">
<filters>
<filter ref="ascii_folding" />
<filter ref="trim" />
<filter ref="word_delimiter" />
<filter ref="lowercase" />
<filter ref="elision" />
<filter ref="phonetic" />
</filters>
<char_filters>
<char_filter ref="html_strip" />
</char_filters>
</analyzer>
<!-- Following was based on https://www.elastic.co/guide/en/elasticsearch/reference/2.4/analysis-lang-analyzer.html#greek-analyzer -->
<!--<analyzer name="greek" tokenizer="standard" language="el">-->
<!--<filters>-->
<!--<filter ref="greek_lowercase" />-->
<!--<filter ref="greek_stop" />-->
<!--<filter ref="greek_keywords" />-->
<!--<filter ref="greek_stemmer" />-->
<!--<filter ref="stem_greek" />-->
<!--</filters>-->
<!--</analyzer>-->
<analyzer name="stem_analyzer" tokenizer="standard" language="el">
<filters>
<filter ref="lower_greek" />
<filter ref="stem_greek" />
</filters>
</analyzer>
</analyzers>
</analysis>
@georgios-2317 so if I understand properly, you managed to get it working with greek ?
So ... this one is stalled for too long, and it's actually unclear if we have to append additional answers.
I close it,
Regards
Hi, If type characters not including English characters or numbers, nothing happened in search box of filters (facet search) (search action not triggered) That means we can't search in Persian, Hebrew or other non-English languages.
Magento Version : CE 2.2.4
ElasticSuite Version : 2.6.0
Environment : Developer
Expected result
Actual result