Smile-SA / elasticsuite

Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
https://elasticsuite.io
Open Software License 3.0
761 stars 341 forks

Fulltext search is not working #3308

Closed: haristariqmage4 closed this issue 3 months ago

haristariqmage4 commented 4 months ago

Preconditions

Magento Version : 2.4.0

ElasticSuite Version : 2.1.0

Environment : Developer

Third party modules : Amasty_Base Amasty_CronScheduleList Amasty_Customform Amasty_InvisibleCaptcha Amasty_RequestQuote Amasty_QuoteAttributesManagement Amasty_RequestAQuoteProSubscriptionPackage Amasty_QuoteAttributes Amazon_Core Amazon_Login Amazon_Payment Clarion_CustomerAttribute Codazon_AjaxCartPro Codazon_AjaxLayeredNav Codazon_AjaxLayeredNavPro Codazon_Core Codazon_GoogleAmpManager Codazon_ImproveBundle Codazon_Lookbookpro Codazon_MegaMenu Codazon_OneStepCheckout Codazon_ProductFilter Codazon_ThemeOptions Codazon_QuickShop Codazon_ShippingCostCalculator Codazon_Shopbybrandpro Codazon_Slideshow Codazon_ProductLabel Codazon_Utility Dotdigitalgroup_Email Dotdigitalgroup_Chat Harrigo_EverCrumbs Klarna_Core Klarna_Ordermanagement Klarna_Onsitemessaging Klarna_Kp Klaviyo_Reclaim MageMe_HidePrice MageWorx_SearchSuiteAutocomplete Magefan_Community Magefan_Blog Magefan_WysiwygAdvanced Magemonkeys_CategoryFilter Magemonkeys_CompanyName Magemonkeys_Customerinfo Magemonkeys_FeaturedProduct Magemonkeys_HideMyOrders Magemonkeys_Ordermail Magemonkeys_Product Magemonkeys_Quote Magemonkeys_RemoveQuoteCartPrice Magemonkeys_RepresentativeAttr Magemonkeys_RestrictCategory Magemonkeys_WelcomeEmailCc Mageplaza_Core Mageplaza_BannerSlider Mageplaza_BackendReindex Mageplaza_MassProductActions Mageplaza_Smtp Magestat_SplitOrder OlegKoval_RegenerateUrlRewrites PayPal_Braintree PayPal_BraintreeGraphQl RapideWeb_ProductListTable Smile_ElasticsuiteCore Smile_ElasticsuiteCatalog Smile_ElasticsuiteCatalogGraphQl Smile_ElasticsuiteCatalogRule Smile_ElasticsuiteCatalogOptimizer Smile_ElasticsuiteTracker Smile_ElasticsuiteThesaurus Smile_ElasticsuiteSwatches Smile_ElasticsuiteIndices Smile_ElasticsuiteAnalytics Smile_ElasticsuiteVirtualCategory Temando_ShippingRemover Ulmod_Ordernotes Vertex_Tax Vertex_AddressValidation WebShopApps_MatrixRate Yotpo_Yotpo Zero1_Patches

How do we make the results for "dextrose 5% water" match the results for "d5w", given that one is several words and the other is technically a single word? And how can we make sure that items like "Sharps Container 26 1/4 ° 20 w * 14 3/4 D Inch 19 BD Gallon" are not included in the search results for "D5W"?

Expected result

A narrower product search that only fetches exact terms. Searches like 'D5W' should not return results that are merely hits for 'D', '5' and 'W'.

Actual result

image

rbayet commented 4 months ago

Hello @haristariqmage4,

This is probably due to the "word_delimiter" token filter of the "standard" (text) analyzer, which transforms your product names before indexing them. This "word_delimiter" component does split words like "D5W" at each switch from a letter to a digit and vice versa, so you are correct in assuming that we search for "D", "5" and "W" when searching for "D5W". The issue is then that you have other product names containing those isolated letters (for example coming from a product name string like "3/5 H X 10 7/10 W X 6 D"). You can check what's happening on the analyzer side of things from the admin interface, in Elasticsuite > System > Analysis.
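To make the splitting concrete, here is a minimal Python sketch of the behaviour described above. It is only a rough stand-in for the real token filter: the helper name `split_like_word_delimiter` is hypothetical, and the sketch ignores the original and catenated tokens that the actual filter can also emit.

```python
import re


def split_like_word_delimiter(text):
    """Rough approximation: lowercase, break on non-alphanumerics,
    then split each word at every letter/digit boundary."""
    tokens = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # "d5w" -> ["d", "5", "w"]: runs of letters and runs of digits
        tokens.extend(re.findall(r"[a-z]+|[0-9]+", word))
    return tokens


query_tokens = split_like_word_delimiter("D5W")
name_tokens = split_like_word_delimiter("3/5 H X 10 7/10 W X 6 D")

# Tokens the query shares with the unrelated product name
shared = sorted(set(query_tokens) & set(name_tokens))

print(query_tokens)
print(shared)
```

Under these assumptions every fragment of "D5W" also appears in the dimension string, which is why such products can show up in the results.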

If you can't have "simpler" product names, I would recommend changing the configuration of the "word_delimiter" token filter by disabling "split_on_numerics" in elasticsuite_analysis.xml (through a composer patch or by redefining the XML in a custom module).

Then, for the original issue, a thesaurus entry associating "D5W" or "d5w" with "Dextrose 5% Water" should do the trick.

Regards,

haristariqmage4 commented 4 months ago

image There is only System > Indices. Could you explain in more detail?

rbayet commented 4 months ago

Hello @haristariqmage4,

Indeed, I've just seen:

Magento Version : 2.4.0

ElasticSuite Version : 2.1.0 (I guess it's 2.10.0)

That's ... old :) Indeed, you will not have that screen, which was introduced in 2.10.13, so in Magento 2.4.1 and above only. You can install cerebro locally and reproduce what that screen does from cerebro's "analysis" screen: image

Or call the _analyze endpoint of your Elasticsearch directly from the CLI:

/var/www/html $ curl -H "Content-Type: application/json" -XPOST "http://opensearch:9200/magento2_fr_fr_catalog_product/_analyze?pretty" -d '{"analyzer":"standard","text":"d5w"}'
{
  "tokens" : [
    {
      "token" : "d5w",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "d",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "d5w",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "5",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "w",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 2
    }
  ]
}

(Replace http://opensearch:9200/magento2_fr_fr_catalog_product/ with http://[your_elasticsearch_server_address_or_hostname]/[your_catalog_product_index_name])
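If you prefer a script to curl, the same call can be sketched in Python with only the standard library. The helper names `build_analyze_request` and `analyze` are hypothetical, and the host and index are the example values from the command above, so adjust them to your own setup.

```python
import json
import urllib.request


def build_analyze_request(base_url, index, text, analyzer="standard"):
    """Build the POST request for the index's _analyze endpoint."""
    url = f"{base_url.rstrip('/')}/{index}/_analyze"
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(base_url, index, text, analyzer="standard"):
    """Send the request and return only the token strings."""
    request = build_analyze_request(base_url, index, text, analyzer)
    with urllib.request.urlopen(request) as response:
        return [entry["token"] for entry in json.load(response)["tokens"]]


# Example call (needs a reachable server, so it is left commented out):
# print(analyze("http://opensearch:9200", "magento2_fr_fr_catalog_product", "d5w"))
```

The `analyze` helper returns just the `token` values, which is usually all you need to see how a search term is split.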

Regards,

haristariqmage4 commented 4 months ago

image Hi @rbayet, here is my analysis. What should I do now?

rbayet commented 4 months ago

Hello @haristariqmage4,

So now that your thesaurus is in place, you have two options (which can actually be combined):

  1. reducing the score penalty for products matching a synonym
  2. altering the "word_delimiter" token filter in the way I described

1. Reducing the score penalty for products matching a synonym

When searching for "d5w" you will now also search for "dextrose 5% water", but by default the products matching only "dextrose 5% water" are penalized and receive only a tenth of their expected score.

You can change that by reducing (down to 1, i.e. "no penalty") the setting available at Elasticsuite > Search Relevance > Thesaurus Configuration > Synonyms Configuration > Synonyms Weight Divider. image

The products individually matching "D", "5" and "W" will still be present in the search results list, but possibly ranked low enough for you to be satisfied.
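As a back-of-the-envelope illustration of that setting, here is a simplified model in Python. It assumes the divider is applied as a plain division of the score a synonym-only match would otherwise get; the function name `synonym_score` and the base score of 20.0 are made up for the example, and the real scoring involves more factors.

```python
def synonym_score(base_score, weight_divider=10):
    """Simplified model: a synonym-only match keeps base_score / weight_divider."""
    if weight_divider < 1:
        raise ValueError("weight_divider is expected to be at least 1")
    return base_score / weight_divider


# A divider of 10 keeps a tenth of the score, a divider of 1 means no penalty
for divider in (10, 5, 1):
    print(divider, synonym_score(20.0, divider))
```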

2. Altering the "word_delimiter" token filter in the way I described

If you're not satisfied, or as an alternative, you can redefine or fine-tune the word_delimiter token filter, which is responsible for splitting "D5W" into "D", "5" and "W".

You probably only need to change the "split_on_numerics" from "true" to "false".

You can do that either with a composer patch on the distributed elasticsuite_analysis.xml file or by creating a custom module in app/code with a local elasticsuite_analysis.xml containing just the redefined word_delimiter token filter. In both cases, this will require clearing the Magento cache and performing a full reindex.

Please be aware that this approach could have adverse side effects, for instance making it impossible to find products with "L48B" in their name or SKU by searching for "L 48 B".
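The trade-off can be illustrated with a small Python sketch. It is a toy model, not the actual analyzer: the helpers `tokenize` and `can_match` are hypothetical, each word is always kept whole (mimicking the original token being preserved), and a "match" simply means that the query and the product name share at least one token.

```python
import re


def tokenize(text, split_on_numerics=True):
    """Toy analyzer: keep each word, optionally add its letter/digit parts."""
    tokens = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        tokens.add(word)  # the whole word is always kept
        if split_on_numerics:
            tokens.update(re.findall(r"[a-z]+|[0-9]+", word))
    return tokens


def can_match(query, product_name, split_on_numerics):
    """True when query and product name share at least one token."""
    return bool(
        tokenize(query, split_on_numerics)
        & tokenize(product_name, split_on_numerics)
    )


for flag in (True, False):
    print("split_on_numerics =", flag)
    print("  D5W vs dimension string:", can_match("D5W", "3/5 H X 10 7/10 W X 6 D", flag))
    print("  L 48 B vs L48B:", can_match("L 48 B", "L48B", flag))
```

In this model, turning the flag off removes the unwanted "D5W" hit but also the wanted "L 48 B" hit, which is the side effect mentioned above.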

Regards,

haristariqmage4 commented 4 months ago

@rbayet
Should I also set generate_word_parts to false in elasticsuite_analysis.xml?

haristariqmage4 commented 4 months ago

Also, what would be the solution to avoid this side effect:

Please be aware that this approach could have adverse side effects, for instance making it impossible to find products with "L48B" in their name or SKU by searching for "L 48 B".

haristariqmage4 commented 3 months ago

@rbayet ?