Smile-SA / elasticsuite

Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
https://elasticsuite.io
Open Software License 3.0

Unable to index categories, ES 6.5.3, ElasticSuite 2.7.1 #1236

Closed: siimm closed this issue 5 years ago

siimm commented 5 years ago

Preconditions

Magento Version: 2.3

ElasticSuite Version: 2.7.1

ElasticSearch Version: 6.5.3

Environment: Developer/Production, does not matter

Third party modules:

Steps to reproduce

  1. Reindex elasticsuite_categories_fulltext: php bin/magento i:reindex elasticsuite_categories_fulltext

Expected result

  1. Categories indexed in Elasticsearch

Actual result

  1. No categories indexed, ElasticSearch error message: Failed to parse value [0] as only [true] or [false] are allowed

Similar issue from the past: https://github.com/Smile-SA/elasticsuite/issues/807

There, the resolution was that ES 6.x is not supported. At the moment I can see two conflicting pieces of information. The releases page claims that 6.x is supported as of ElasticSuite 2.6.0: https://github.com/Smile-SA/elasticsuite/releases

The module install page claims that 6.x is not yet supported: https://github.com/Smile-SA/elasticsuite/wiki/ModuleInstall

Is it actually supported?

For anybody ending up here with a similar problem, here is a patch that maps boolean types as integers; with it applied, indexing works: https://gist.github.com/siimm/654193cc410f176cd1d56625ed3da2ce

A more reasonable approach would probably be to cast the values of every attribute whose backend type claims to be boolean to actual booleans.
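The casting idea above can be sketched roughly as follows. This is a Python illustration only, not ElasticSuite code: the helper names `to_strict_bool` and `cast_boolean_fields` are hypothetical, and the set of boolean fields is assumed to come from the index mapping. It shows flag values such as "1" or "0" being turned into real booleans before a document is sent to an index that only accepts true or false.

```python
from typing import Any

# Hypothetical helpers for illustration; not part of ElasticSuite or Magento.
_TRUE_STRINGS = {"1", "true"}
_FALSE_STRINGS = {"0", "false"}


def to_strict_bool(value: Any) -> bool:
    """Coerce a flag such as "1", "0", 1 or 0 to a real boolean."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
    # Refuse to guess for anything else instead of indexing a wrong value.
    raise ValueError(f"cannot interpret {value!r} as a boolean")


def cast_boolean_fields(doc: dict, boolean_fields: set) -> dict:
    """Return a copy of doc with the listed fields cast to booleans.

    Multi-valued fields (lists) are cast element by element.
    """
    out = dict(doc)
    for field in boolean_fields:
        if field not in out:
            continue
        value = out[field]
        if isinstance(value, list):
            out[field] = [to_strict_bool(v) for v in value]
        else:
            out[field] = to_strict_bool(value)
    return out


doc = {"entity_id": "851", "is_active": "1", "is_anchor": [True]}
clean = cast_boolean_fields(doc, {"is_active", "is_anchor", "include_in_menu"})
```

Here `clean["is_active"]` becomes a real boolean while `entity_id` is left untouched. Raising on unknown values is a design choice of this sketch; a real indexer might prefer to log and skip the field.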

rbayet commented 5 years ago

Hi @siimm,

Yes, https://github.com/Smile-SA/elasticsuite/wiki/ModuleInstall should be updated, as the server install wiki page (https://github.com/Smile-SA/elasticsuite/wiki/ServerConfig-6.x) points to a 6.x installation.

On an Open Source/CE 2.3.0 + Luma with ElasticSuite 2.7.1, I could not reproduce your issue with a 6.5.3 ElasticSearch server (Debian).

What system are you on? Do you have custom boolean category attributes? Can you send us the output of curl http://your.server:9200, as in the example below?

{
  "name" : "5-TOoqX",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "PRAbgW1QQTawEKlN1mC_RQ",
  "version" : {
    "number" : "6.5.3",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "159a78a",
    "build_date" : "2018-12-06T20:11:28.826501Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Regards,

siimm commented 5 years ago

Hi,

We are running Magento Commerce 2.3 on Ubuntu. The problem appeared with the default category attribute 'is_active'. Others would probably follow if only this one were fixed.

It may be that this is something specific to our setup. I haven't yet had a chance to try this on a fresh installation of Magento 2.3, but I will try to do that.

Anyways, output from ES server is as follows:

{
  "name" : "nodename_censored",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "e_TBDNdSRxGx1TkFkZbBiA",
  "version" : {
    "number" : "6.5.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "9434bed",
    "build_date" : "2018-11-29T23:58:20.891072Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

Kind Regards Siim

rbayet commented 5 years ago

Hello @siimm,

Thanks for the extra information. Did you migrate your project to Magento 2.3 ? If so, from which version of Magento 2 ?

Can you also provide us with the result of composer info | grep elastic ?

Regards,

siimm commented 5 years ago

Hi, we came from version 2.2.5. The output of composer info | grep elastic is as follows:

elasticsearch/elasticsearch                                 v5.3.2                                                  PHP Client for Elasticsearch
magento/module-elasticsearch                                100.3.0                                                 N/A
smile/elasticsuite                                          2.7.1                                                   Magento 2 merchandising and search engine built on ElasticSearch

I will try to test with a clean installation today, maybe I am barking up the wrong tree here :)

Regards, Siim

Swahjak commented 5 years ago

We are having the same issue. Doing some further investigation.

Swahjak commented 5 years ago

Could this be due to the fact that is_active is not converted / retrieved as a boolean value?

The error:

[2019-01-03 14:32:52] main.ERROR: Bulk index operation failed 1000 times in index magento2_toppynl_catalog_category_20190103_143250 for type category. Error (mapper_parsing_exception) : failed to parse [is_active]. Failed doc ids sample : 3, 47, 50, 67, 69, 70, 75, 76, 77, 89. [] []

The input:

[851] =>
  array(23) {
    'entity_id' =>
    string(3) "851"
    'attribute_set_id' =>
    string(2) "12"
    'parent_id' =>
    string(3) "850"
    'created_at' =>
    string(19) "2012-01-28 09:09:55"
    'updated_at' =>
    string(19) "2019-01-01 18:53:58"
    'path' =>
    string(16) "1/2224/3/850/851"
    'position' =>
    string(1) "2"
    'level' =>
    string(1) "4"
    'children_count' =>
    string(1) "3"
    'is_active' =>
    string(1) "1"
    'description' =>
    array(1) {
      [0] =>
      string(6236) "<a name="verschillende-filtermedia"></a>
<h4>Filterzand, filterglas, filterparels of Aqualoon?</h4>
<p>Bij het gebruik van een zandfilter is het filtermedium de doorslaggevende factor in hoe schoon en helder je zwembadwater wordt en blijft. Lees hieronder de voor en nadelen van <strong>filterzand</strong>, <strong>filterglas</strong>, <strong>filterparels</strong> en <strong>Aqualoon</strong>, wanneer je het best voor welk medium kunt kiezen en hoeveel je nodig hebt.</p>

<hr>

<h3><strong>Filterzand<"...
    }
    'meta_keywords' =>
    array(1) {
      [0] =>
      string(46) "filterzand, filterglas, filterparels, aqualoon"
    }
    'display_mode' =>
    array(1) {
      [0] =>
      string(8) "PRODUCTS"
    }
    'option_text_display_mode' =>
    array(1) {
      [0] =>
      string(13) "Products only"
    }
    'name' =>
    array(1) {
      [0] =>
      string(28) "Filtermedia voor zandfilters"
    }
    'url_key' =>
    array(1) {
      [0] =>
      string(42) "filterzand-en-andere-media-voor-zandfilter"
    }
    'url_path' =>
    array(1) {
      [0] =>
      string(77) "category/851/zwembad/zwembadfilter/filterzand-en-andere-media-voor-zandfilter"
    }
    'meta_title' =>
    array(1) {
      [0] =>
      string(47) "Filterzand, -glas en meer voor in je zandfilter"
    }
    'include_in_menu' =>
    array(1) {
      [0] =>
      bool(true)
    }
    'option_text_include_in_menu' =>
    array(1) {
      [0] =>
      string(26) "Include in Navigation Menu"
    }
    'option_text_is_active' =>
    array(1) {
      [0] =>
      string(9) "Is Active"
    }
    'is_anchor' =>
    array(1) {
      [0] =>
      bool(true)
    }
    'option_text_is_anchor' =>
    array(1) {
      [0] =>
      string(9) "Is Anchor"
    }
  }

The mapping:

{  
   "aliases":{  
      "magento2_toppynl_catalog_category":{  

      }
   },
   "mappings":{  
      "category":{  
         "_all":{  
            "enabled":false
         },
         "properties":{  
            "all_children":{  
               "type":"keyword"
            },
            "attribute_set_id":{  
               "type":"integer"
            },
            "autocomplete":{  
               "type":"text",
               "fields":{  
                  "shingle":{  
                     "type":"text",
                     "analyzer":"shingle"
                  },
                  "whitespace":{  
                     "type":"text",
                     "analyzer":"whitespace"
                  }
               },
               "analyzer":"standard"
            },
            "children":{  
               "type":"keyword"
            },
            "children_count":{  
               "type":"integer"
            },
            "created_at":{  
               "type":"date",
               "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            },
            "custom_design":{  
               "type":"keyword"
            },
            "custom_design_from":{  
               "type":"date",
               "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            },
            "custom_design_to":{  
               "type":"date",
               "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            },
            "description":{  
               "type":"keyword"
            },
            "display_mode":{  
               "type":"keyword"
            },
            "entity_id":{  
               "type":"integer"
            },
            "include_in_menu":{  
               "type":"boolean"
            },
            "is_active":{  
               "type":"boolean"
            },
            "is_anchor":{  
               "type":"boolean"
            },
            "landing_page":{  
               "type":"integer"
            },
            "level":{  
               "type":"integer"
            },
            "meta_description":{  
               "type":"keyword"
            },
            "meta_keywords":{  
               "type":"keyword"
            },
            "meta_title":{  
               "type":"keyword"
            },
            "name":{  
               "type":"keyword"
            },
            "option_text_custom_design":{  
               "type":"keyword"
            },
            "option_text_display_mode":{  
               "type":"keyword"
            },
            "option_text_include_in_menu":{  
               "type":"keyword"
            },
            "option_text_is_active":{  
               "type":"keyword"
            },
            "option_text_is_anchor":{  
               "type":"keyword"
            },
            "option_text_landing_page":{  
               "type":"keyword"
            },
            "option_text_page_layout":{  
               "type":"keyword"
            },
            "page_layout":{  
               "type":"keyword"
            },
            "parent_id":{  
               "type":"integer"
            },
            "path":{  
               "type":"keyword"
            },
            "path_in_store":{  
               "type":"keyword"
            },
            "position":{  
               "type":"integer"
            },
            "search":{  
               "type":"text",
               "fields":{  
                  "shingle":{  
                     "type":"text",
                     "analyzer":"shingle"
                  },
                  "whitespace":{  
                     "type":"text",
                     "analyzer":"whitespace"
                  }
               },
               "analyzer":"standard"
            },
            "spelling":{  
               "type":"text",
               "fields":{  
                  "phonetic":{  
                     "type":"text",
                     "analyzer":"phonetic"
                  },
                  "shingle":{  
                     "type":"text",
                     "analyzer":"shingle"
                  },
                  "whitespace":{  
                     "type":"text",
                     "analyzer":"whitespace"
                  }
               },
               "analyzer":"standard"
            },
            "updated_at":{  
               "type":"date",
               "format":"yyyy-MM-dd HH:mm:ss||yyyy-MM-dd"
            },
            "url_key":{  
               "type":"keyword"
            },
            "url_path":{  
               "type":"keyword"
            }
         }
      }
   },
   "settings":{  
      "index":{  
         "mapping":{  
            "total_fields":{  
               "limit":"20000"
            }
         },
         "refresh_interval":"1s",
         "translog":{  
            "durability":"request"
         },
         "provided_name":"magento2_toppynl_catalog_category_20190103_143250",
         "max_result_window":"100000",
         "creation_date":"1546525970305",
         "requests":{  
            "cache":{  
               "enable":"true"
            }
         },
         "analysis":{  
            "filter":{  
               "standard":{  
                  "type":"stemmer",
                  "language":"dutch"
               },
               "phonetic":{  
                  "type":"phonetic",
                  "encoder":"metaphone"
               },
               "lowercase":{  
                  "type":"lowercase"
               },
               "trim":{  
                  "type":"trim"
               },
               "reference_word_delimiter":{  
                  "split_on_numerics":"true",
                  "generate_word_parts":"true",
                  "preserve_original":"false",
                  "catenate_words":"false",
                  "catenate_all":"false",
                  "split_on_case_change":"true",
                  "type":"word_delimiter",
                  "catenate_numbers":"false"
               },
               "ascii_folding":{  
                  "type":"asciifolding",
                  "preserve_original":"false"
               },
               "shingle":{  
                  "max_shingle_size":"2",
                  "min_shingle_size":"2",
                  "output_unigrams":"true",
                  "type":"shingle"
               },
               "reference_shingle":{  
                  "max_shingle_size":"10",
                  "min_shingle_size":"2",
                  "token_separator":"",
                  "output_unigrams":"true",
                  "type":"shingle"
               },
               "word_delimiter":{  
                  "split_on_numerics":"true",
                  "generate_word_parts":"true",
                  "preserve_original":"true",
                  "catenate_words":"true",
                  "catenate_all":"true",
                  "split_on_case_change":"true",
                  "type":"word_delimiter",
                  "catenate_numbers":"true"
               }
            },
            "analyzer":{  
               "reference":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "reference_word_delimiter",
                     "lowercase",
                     "reference_shingle"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "standard":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "word_delimiter",
                     "lowercase",
                     "standard"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "shingle":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "word_delimiter",
                     "lowercase",
                     "shingle"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"whitespace"
               },
               "phonetic":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "word_delimiter",
                     "lowercase",
                     "phonetic"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               },
               "sortable":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "lowercase"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"keyword"
               },
               "whitespace":{  
                  "filter":[  
                     "ascii_folding",
                     "trim",
                     "word_delimiter",
                     "lowercase"
                  ],
                  "char_filter":[  
                     "html_strip"
                  ],
                  "type":"custom",
                  "tokenizer":"standard"
               }
            },
            "char_filter":{  
               "html_strip":{  
                  "type":"html_strip"
               }
            }
         },
         "number_of_replicas":"0",
         "uuid":"4-Nlok77S3KYZaKVrqSuiA",
         "version":{  
            "created":"6040299"
         },
         "codec":"best_compression",
         "number_of_shards":"1",
         "merge":{  
            "scheduler":{  
               "max_thread_count":"1"
            }
         }
      }
   }
}

Swahjak commented 5 years ago

Ok, so basically this seems to be rooted in the fact that the initial category collection is loaded with is_active but it's not mapped. Then it's loaded again with the attribute data source, but that does not overwrite the original content.

The loading of is_active in the initial collection is triggered right here: https://github.com/Smile-SA/elasticsuite/blob/2.7.x/src/module-elasticsuite-catalog/Model/ResourceModel/Category/Indexer/Fulltext/Action/Full.php#L70. The $categoryCollection->addIsActiveFilter(); call causes the is_active column to be loaded.

The not overwriting part is right here https://github.com/Smile-SA/elasticsuite/blob/2.7.x/src/module-elasticsuite-catalog/Model/Category/Indexer/Fulltext/Datasource/AttributeData.php#L50. This could be easily fixed by changing the line to $indexData[$productId] = array_replace($indexData[$productId], $indexValues);.
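The difference between the two merge behaviours can be sketched like this. It is a Python analogy, not the module's PHP code. The thread does not quote the original line, so the key-preserving merge is an assumption about how it behaves, and the function names and sample values are hypothetical. `merge_overwrite` mirrors what PHP's array_replace() does for string keys: later values win.

```python
def merge_keep_existing(existing: dict, incoming: dict) -> dict:
    """Keys already present in `existing` win; only missing keys are added.

    Assumed to model the behaviour described in the thread, where the value
    loaded with the initial collection is never overwritten.
    """
    return {**incoming, **existing}


def merge_overwrite(existing: dict, incoming: dict) -> dict:
    """Keys from `incoming` win, analogous to PHP's array_replace()."""
    return {**existing, **incoming}


# Row as loaded by the initial collection: is_active is a raw string.
row = {"entity_id": "851", "is_active": "1"}
# Values as an attribute datasource might return them (illustrative shape).
attrs = {"is_active": [True], "name": ["Filtermedia voor zandfilters"]}

kept = merge_keep_existing(row, attrs)
replaced = merge_overwrite(row, attrs)

print(kept["is_active"])      # 1  (the raw string survives)
print(replaced["is_active"])  # [True]
```

With the key-preserving merge, the string "1" reaches the index, which matches the mapper_parsing_exception on is_active reported earlier in the thread. With the overwriting merge, the typed value replaces it.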

@rbayet could you tell what the preferred fix would be in this case? Wouldn't mind creating a pull, but this could be either elasticsuite_indices.xml or a change as suggested above. (Although adding is_active to elasticsuite_indices.xml does not seem to fix it, but I could be doing something wrong).

romainruaud commented 5 years ago

Well, it took me a long time to understand this one.

It seems you set many category attributes to be searchable, didn't you?

I'm able to reproduce it if I manually set the is_active attribute to is_searchable=1 in the database (which is not the case on a legacy install).

I agree the attached fix in PR should fix this.

romainruaud commented 5 years ago

The associated PR has been merged and will be part of the next 2.7.x minor release.

Regards