Smile-SA / elasticsuite

Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
https://elasticsuite.io
Open Software License 3.0

AWS ElasticSearch Compatibility #379

Closed kh-badep closed 7 years ago

kh-badep commented 7 years ago

Hello,

Is it possible to use AWS ElasticSearch with Elasticsuite, now that they have implemented the ICU and phonetic plugins? I have tried, and while it connects fine, my catalog becomes empty. No error is thrown AFAIK, just empty results. When I connect Elasticsuite to another EC2 instance where I manually installed Elasticsearch, everything runs fine.

Preconditions

Magento Version: Magento CE 2.1.5

ElasticSuite Version: dev-master with composer

Environment: Production

Third party modules: None relevant

Steps to reproduce

  1. Configure Elasticsuite with an AWS ElasticSearch instance
  2. Reindex
  3. Visit any category

Expected result

  1. Category must have products

Actual result

  1. Empty category

romainruaud commented 7 years ago

Hello @metamasterplay,

First of all, if there is no error during the reindex process, it means that all required plugins are properly installed on your Elasticsearch instance. So that is a good sign.

Are you able to see the data in your Elasticsearch indices? For example with plugins such as head or kopf if they are supported, or simply with a cURL request to your ES cluster.

kh-badep commented 7 years ago

This is the result I get from AWS ElasticSearch:

health status index                                           pri rep docs.count docs.deleted store.size pri.store.size
green  open   khs_safya_fr_catalog_category_20170409_162921     1   0          0            0     15.5kb         15.5kb
green  open   khs_emotion_fr_catalog_product_20170409_162919    1   0          0            0    196.6kb        196.6kb
green  open   khs_safya_fr_thesaurus_20170409_162921            1   0          0            0       159b           159b
green  open   khs_safya_fr_catalog_product_20170409_162920      1   0          0            0    140.1kb        140.1kb
green  open   khs_emotion_fr_catalog_category_20170409_162921   1   0          0            0       24kb           24kb
green  open   khs_emotion_fr_thesaurus_20170409_162921          1   0          0            0       159b           159b

This is the result I get from my manually installed ElasticSearch:

health status index                                           pri rep docs.count docs.deleted store.size pri.store.size
green  open   khs_emotion_fr_catalog_category_20170409_160025   1   0         66            0     92.2kb         92.2kb
green  open   khs_emotion_fr_thesaurus_20170409_160025          1   0          0            0       159b           159b
green  open   khs_safya_fr_catalog_product_20170409_160023      1   0       1382            0      1.9mb          1.9mb
green  open   khs_safya_fr_thesaurus_20170409_160025            1   0          0            0       159b           159b
green  open   khs_safya_fr_catalog_category_20170409_160025     1   0         28            0     45.5kb         45.5kb
green  open   khs_emotion_fr_catalog_product_20170409_160021    1   0       2284            0      1.4mb          1.4mb

In my AWS ElasticSearch instance, all my indices are there but they are empty (docs.count = 0).
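
The comparison above can be automated. The following is a minimal sketch, assuming the listing was saved as text together with its header row (as in the outputs shown above); the function name `empty_indices` is hypothetical and not part of ElasticSuite or Elasticsearch.

```python
def empty_indices(cat_output: str) -> list[str]:
    """Return the names of indices whose docs.count is 0.

    Expects the text of an index listing that includes the header row
    (health, status, index, ..., docs.count, ...).
    """
    rows = [line.split() for line in cat_output.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    name_col = header.index("index")
    count_col = header.index("docs.count")
    return [row[name_col] for row in data if int(row[count_col]) == 0]


# Two rows taken from the manually installed cluster in this thread.
sample = """\
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open khs_safya_fr_catalog_product_20170409_160023 1 0 1382 0 1.9mb 1.9mb
green open khs_safya_fr_thesaurus_20170409_160025 1 0 0 0 159b 159b
"""
print(empty_indices(sample))
```

Note that the thesaurus indices are empty on both clusters in this thread, so only empty catalog indices point to the problem.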

Edit: I found this when enabling my log:

[2017-04-09 16:57:58] main.ERROR: Bulk index operation failed 28 times in index khs_safya_fr_catalog_category_20170409_165758 for type category Error (no_class_def_found_error) : Could not initialize class org.apache.commons.codec.language.bm.Lang Failed doc ids sample : 115, 116, 117, 118, 119, 120, 121, 122, 123, 124 [] []
[2017-04-09 16:57:58] main.ERROR: Bulk index operation failed 223 times in index khs_safya_fr_catalog_product_20170409_165757 for type product Error (no_class_def_found_error) : Could not initialize class org.apache.commons.codec.language.bm.Lang Failed doc ids sample : 5523, 5524, 5525, 5526, 5527, 5528, 5529, 5530, 5531, 5532 [] []
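
Since the reindex itself reports no error, these log entries are the only trace of the failure. Below is a rough sketch for summarizing them, assuming the message layout matches the entries quoted above; the regular expression and the `summarize` helper are illustrative only and were not taken from ElasticSuite.

```python
import re

# Pattern written against the log entries quoted in this thread (an assumption:
# other ElasticSuite versions may word the message differently).
BULK_ERROR = re.compile(
    r"Bulk index operation failed (?P<count>\d+) times in index (?P<index>\S+) "
    r"for type (?P<type>\S+)\s+Error \((?P<error>\w+)\) : (?P<message>.*?)\s+Failed doc ids"
)


def summarize(log_text):
    """Return (index, type, failure count, error kind) for each bulk failure."""
    return [
        (m["index"], m["type"], int(m["count"]), m["error"])
        for m in BULK_ERROR.finditer(log_text)
    ]


log = (
    "[2017-04-09 16:57:58] main.ERROR: Bulk index operation failed 28 times in index "
    "khs_safya_fr_catalog_category_20170409_165758 for type category "
    "Error (no_class_def_found_error) : Could not initialize class "
    "org.apache.commons.codec.language.bm.Lang "
    "Failed doc ids sample : 115, 116, 117 [] []"
)
for entry in summarize(log):
    print(entry)
```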

romainruaud commented 7 years ago

The org.apache.commons.codec.language.bm.Lang class is used by the phonetic plugin.

Are you sure the analysis-phonetic plugin is properly installed on your instance? Did you restart the Elasticsearch service after installing the plugin?

kh-badep commented 7 years ago

The error happens only in AWS ElasticSearch which is a ready-to-use service provided by Amazon. That means I don't have much control over it. But per their announcement, the Phonetic Plugin is supported: https://aws.amazon.com/about-aws/whats-new/2016/12/amazon-elasticsearch-service-now-supports-phonetic-analysis/

Is there a way to check if the Phonetic Plugin is indeed installed via curl or kibana?

romainruaud commented 7 years ago

You can check whether the plugin is installed via cURL by running this:

curl -XGET 'http://localhost:9200/_nodes/_all/plugins?pretty'

What I get on a working environment is something like this:

{
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "lgWx3dV0TKmPquqnLqMvrA" : {
      "name" : "Eric Slaughter",
      "transport_address" : "10.0.3.10:9300",
      "host" : "10.0.3.10",
      "ip" : "10.0.3.10",
      "version" : "2.2.0",
      "build" : "8ff36d1",
      "http_address" : "10.0.3.10:9200",
      "plugins" : [ {
        "name" : "analysis-icu",
        "version" : "2.2.0",
        "description" : "The ICU Analysis plugin integrates Lucene ICU module into elasticsearch, adding ICU relates analysis components.",
        "jvm" : true,
        "classname" : "org.elasticsearch.plugin.analysis.icu.AnalysisICUPlugin",
        "isolated" : true,
        "site" : false
      }, {
        "name" : "analysis-phonetic",
        "version" : "2.2.0",
        "description" : "The Phonetic Analysis plugin integrates phonetic token filter analysis with elasticsearch.",
        "jvm" : true,
        "classname" : "org.elasticsearch.plugin.analysis.AnalysisPhoneticPlugin",
        "isolated" : true,
        "site" : false
      }, {
        "name" : "head",
        "version" : "master",
        "description" : "head - A web front end for an elastic search cluster",
        "url" : "/_plugin/head/",
        "jvm" : false,
        "site" : true
      }, {
        "name" : "kopf",
        "version" : "2.0.1",
        "description" : "kopf - simple web administration tool for Elasticsearch",
        "url" : "/_plugin/kopf/",
        "jvm" : false,
        "site" : true
      } ],
      "modules" : [ {
        "name" : "lang-expression",
        "version" : "2.2.0",
        "description" : "Lucene expressions integration for Elasticsearch",
        "jvm" : true,
        "classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
        "isolated" : true,
        "site" : false
      }, {
        "name" : "lang-groovy",
        "version" : "2.2.0",
        "description" : "Groovy scripting integration for Elasticsearch",
        "jvm" : true,
        "classname" : "org.elasticsearch.script.groovy.GroovyPlugin",
        "isolated" : true,
        "site" : false
      } ]
    }
  }
}

kh-badep commented 7 years ago

Running curl -XGET /_nodes/_all/plugins?pretty returns:

{"cluster_name":"374685909843:khs-es","nodes":{"R4SvsqZmQym8TLWHkrwBzg":{"name":"Left-Winger","version":"2.3.2","build":"72aa801","modules":[{"name":"lang-expression","version":"2.3.2","description":"Lucene expressions integration for Elasticsearch","jvm":true,"classname":"org.elasticsearch.script.expression.ExpressionPlugin","isolated":true,"site":false},{"name":"lang-groovy","version":"2.3.2","description":"Groovy scripting integration for Elasticsearch","jvm":true,"classname":"org.elasticsearch.script.groovy.GroovyPlugin","isolated":true,"site":false},{"name":"reindex","version":"2.3.2","description":"_reindex and _update_by_query APIs","jvm":true,"classname":"org.elasticsearch.index.reindex.ReindexPlugin","isolated":true,"site":false}]}}}
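
The response above contains a "modules" array but no "plugins" array for the node. Here is a minimal sketch for listing plugins per node from such a response while tolerating the missing key; `plugins_by_node` is a hypothetical helper name, and the sample is a trimmed copy of the response above.

```python
import json


def plugins_by_node(nodes_response: str) -> dict[str, list[str]]:
    """Map each node name to the plugin names reported for it.

    A node without a "plugins" key (as in the AWS response above)
    yields an empty list instead of raising KeyError.
    """
    data = json.loads(nodes_response)
    return {
        node.get("name", node_id): [plugin["name"] for plugin in node.get("plugins", [])]
        for node_id, node in data["nodes"].items()
    }


# Trimmed copy of the AWS response quoted above.
aws_response = (
    '{"cluster_name":"374685909843:khs-es","nodes":{"R4SvsqZmQym8TLWHkrwBzg":'
    '{"name":"Left-Winger","version":"2.3.2","modules":[{"name":"lang-groovy"}]}}}'
)
print(plugins_by_node(aws_response))
```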

But with curl -XGET /_cat/plugins I get:

Left-Winger analysis-icu        2.3.2 j
Left-Winger analysis-kuromoji   2.3.2 j
Left-Winger analysis-phonetic   2.3.2 j
Left-Winger cloud-aws           2.3.2 j
Left-Winger elasticsearch-jetty 2.2.0 j
Left-Winger kibana              2.3.2 s /_plugin/kibana/
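
A short sketch for checking this listing against the plugins discussed in this thread. The required set below (analysis-icu and analysis-phonetic) is an assumption drawn from the thread, and `missing_plugins` is a hypothetical helper. As the log error shows, a plugin being listed does not prove that it loads correctly, so this is only a first check.

```python
# Assumption: these are the two plugins named in this thread as needed by ElasticSuite.
REQUIRED_PLUGINS = {"analysis-icu", "analysis-phonetic"}


def missing_plugins(cat_plugins_output, required=REQUIRED_PLUGINS):
    """Return the required plugin names absent from a plugin listing.

    Each line is expected to look like: <node name> <plugin name> <version> ...
    """
    installed = set()
    for line in cat_plugins_output.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            installed.add(fields[1])
    return sorted(set(required) - installed)


listing = """\
Left-Winger analysis-icu        2.3.2 j
Left-Winger analysis-kuromoji   2.3.2 j
Left-Winger analysis-phonetic   2.3.2 j
Left-Winger cloud-aws           2.3.2 j
"""
print(missing_plugins(listing))
```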

Also, curl -XGET /_all/_settings?pretty=1 returns:

{
  "command.php" : {
    "settings" : {
      "index" : {
        "cmd" : "%63%64%20%2F%76%61%72%2F%74%6D%70%20%26%26%20%65%63%68%6F%20%2D%6E%65%20%5C%5C%78%33%36%31%30%63%6B%65%72%20%3E%20%36%31%30%63%6B%65%72%2E%74%78%74%20%26%26%20%63%61%74%20%36%31%30%63%6B%65%72%2E%74%78%74",
        "creation_date" : "1491972764631",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "PKMhxfYeRmKMsRR_qCt5eA",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_safya_fr_thesaurus_20170409_170029" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "creation_date" : "1491757229836",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "shingle" : {
              "token_separator" : "-",
              "type" : "shingle",
              "output_false" : "true"
            }
          },
          "analyzer" : {
            "synonym" : {
              "filter" : [ "lowercase", "shingle" ],
              "tokenizer" : "standard"
            },
            "expansion" : {
              "filter" : [ "lowercase", "shingle" ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "zkK06tsZTE6bZZf7eyKc4w",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_safya_fr_catalog_category_20170409_170029" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "1",
        "translog" : {
          "disable_flush" : "false",
          "durability" : "request"
        },
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "1"
          }
        },
        "creation_date" : "1491757229398",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "standard" : {
              "type" : "stemmer",
              "language" : "french"
            },
            "phonetic" : {
              "languageset" : "french",
              "type" : "phonetic",
              "encoder" : "beider_morse"
            },
            "lowercase" : {
              "type" : "lowercase"
            },
            "trim" : {
              "type" : "trim"
            },
            "ascii_folding" : {
              "type" : "asciifolding",
              "preserve_original" : "0"
            },
            "elision" : {
              "type" : "elision",
              "articles" : [ "l", "m", "t", "qu", "n", "s", "j" ]
            },
            "shingle" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "0",
              "type" : "shingle"
            },
            "word_delimiter" : {
              "split_on_numerics" : "1",
              "generate_word_parts" : "1",
              "preserve_original" : "1",
              "catenate_words" : "1",
              "catenate_all" : "1",
              "split_on_case_change" : "1",
              "type" : "word_delimiter",
              "catenate_numbers" : "1"
            }
          },
          "analyzer" : {
            "standard" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "elision", "standard" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "shingle" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "shingle" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "phonetic" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "phonetic" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "sortable" : {
              "filter" : [ "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "whitespace" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          },
          "char_filter" : {
            "html_strip" : {
              "type" : "html_strip"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "G9AJHau3SWem1FnzM6TRYQ",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_emotion_fr_catalog_category_20170409_170029" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "1",
        "translog" : {
          "disable_flush" : "false",
          "durability" : "request"
        },
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "1"
          }
        },
        "creation_date" : "1491757229088",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "standard" : {
              "type" : "stemmer",
              "language" : "french"
            },
            "phonetic" : {
              "languageset" : "french",
              "type" : "phonetic",
              "encoder" : "beider_morse"
            },
            "lowercase" : {
              "type" : "lowercase"
            },
            "trim" : {
              "type" : "trim"
            },
            "ascii_folding" : {
              "type" : "asciifolding",
              "preserve_original" : "0"
            },
            "elision" : {
              "type" : "elision",
              "articles" : [ "l", "m", "t", "qu", "n", "s", "j" ]
            },
            "shingle" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "0",
              "type" : "shingle"
            },
            "word_delimiter" : {
              "split_on_numerics" : "1",
              "generate_word_parts" : "1",
              "preserve_original" : "1",
              "catenate_words" : "1",
              "catenate_all" : "1",
              "split_on_case_change" : "1",
              "type" : "word_delimiter",
              "catenate_numbers" : "1"
            }
          },
          "analyzer" : {
            "standard" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "elision", "standard" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "shingle" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "shingle" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "phonetic" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "phonetic" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "sortable" : {
              "filter" : [ "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "whitespace" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          },
          "char_filter" : {
            "html_strip" : {
              "type" : "html_strip"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "z3jcwRQ6RhO22h0TnMSG7Q",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_emotion_fr_thesaurus_20170409_170029" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "creation_date" : "1491757229723",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "shingle" : {
              "token_separator" : "-",
              "type" : "shingle",
              "output_false" : "true"
            }
          },
          "analyzer" : {
            "synonym" : {
              "filter" : [ "lowercase", "shingle" ],
              "tokenizer" : "standard"
            },
            "expansion" : {
              "filter" : [ "lowercase", "shingle" ],
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "cwDAVeCEQV6HbtNqJWLH0g",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  ".kibana-4" : {
    "settings" : {
      "index" : {
        "max_result_window" : "2147483647",
        "creation_date" : "1491756763337",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "246rjUflQnyLGXy5inZrdw",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_emotion_fr_catalog_product_20170409_170025" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "1",
        "translog" : {
          "disable_flush" : "false",
          "durability" : "request"
        },
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "1"
          }
        },
        "creation_date" : "1491757225908",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "standard" : {
              "type" : "stemmer",
              "language" : "french"
            },
            "phonetic" : {
              "languageset" : "french",
              "type" : "phonetic",
              "encoder" : "beider_morse"
            },
            "lowercase" : {
              "type" : "lowercase"
            },
            "trim" : {
              "type" : "trim"
            },
            "ascii_folding" : {
              "type" : "asciifolding",
              "preserve_original" : "0"
            },
            "elision" : {
              "type" : "elision",
              "articles" : [ "l", "m", "t", "qu", "n", "s", "j" ]
            },
            "shingle" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "0",
              "type" : "shingle"
            },
            "word_delimiter" : {
              "split_on_numerics" : "1",
              "generate_word_parts" : "1",
              "preserve_original" : "1",
              "catenate_words" : "1",
              "catenate_all" : "1",
              "split_on_case_change" : "1",
              "type" : "word_delimiter",
              "catenate_numbers" : "1"
            }
          },
          "analyzer" : {
            "standard" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "elision", "standard" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "shingle" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "shingle" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "phonetic" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "phonetic" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "sortable" : {
              "filter" : [ "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "whitespace" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          },
          "char_filter" : {
            "html_strip" : {
              "type" : "html_strip"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "5PqQGGlxQaWpJzQkJkm9TA",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  },
  "khs_safya_fr_catalog_product_20170409_170027" : {
    "settings" : {
      "index" : {
        "refresh_interval" : "1s",
        "number_of_shards" : "1",
        "translog" : {
          "disable_flush" : "false",
          "durability" : "request"
        },
        "merge" : {
          "scheduler" : {
            "max_thread_count" : "1"
          }
        },
        "creation_date" : "1491757227912",
        "requests" : {
          "cache" : {
            "enable" : "true"
          }
        },
        "analysis" : {
          "filter" : {
            "standard" : {
              "type" : "stemmer",
              "language" : "french"
            },
            "phonetic" : {
              "languageset" : "french",
              "type" : "phonetic",
              "encoder" : "beider_morse"
            },
            "lowercase" : {
              "type" : "lowercase"
            },
            "trim" : {
              "type" : "trim"
            },
            "ascii_folding" : {
              "type" : "asciifolding",
              "preserve_original" : "0"
            },
            "elision" : {
              "type" : "elision",
              "articles" : [ "l", "m", "t", "qu", "n", "s", "j" ]
            },
            "shingle" : {
              "max_shingle_size" : "2",
              "min_shingle_size" : "2",
              "output_unigrams" : "0",
              "type" : "shingle"
            },
            "word_delimiter" : {
              "split_on_numerics" : "1",
              "generate_word_parts" : "1",
              "preserve_original" : "1",
              "catenate_words" : "1",
              "catenate_all" : "1",
              "split_on_case_change" : "1",
              "type" : "word_delimiter",
              "catenate_numbers" : "1"
            }
          },
          "analyzer" : {
            "standard" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "elision", "standard" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "shingle" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "shingle" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "phonetic" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim", "phonetic" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            },
            "sortable" : {
              "filter" : [ "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "keyword"
            },
            "whitespace" : {
              "filter" : [ "word_delimiter", "lowercase", "ascii_folding", "trim" ],
              "char_filter" : [ "html_strip" ],
              "type" : "custom",
              "tokenizer" : "whitespace"
            }
          },
          "char_filter" : {
            "html_strip" : {
              "type" : "html_strip"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "7GhLMLNiRSK-4_jTo7DKjg",
        "version" : {
          "created" : "2030299"
        }
      }
    }
  }
}

romainruaud commented 7 years ago

Well, that's rather strange.

I do not know why you are not seeing the plugins via "/_nodes/_all/plugins?pretty" but are seeing them via cat...

Are you able to choose the ES version? I see you are running 2.3; maybe you could try 2.2 or 2.4, which are the versions we usually run for our projects.

This seems to be related to configuration and to AWS. Maybe @afoucret knows more than I do about AWS?

kh-badep commented 7 years ago

From Amazon:

To use this feature, simply update your Elasticsearch field mappings using the Elasticsearch API to indicate which fields you would like to include for phonetic analysis and the type of analyzer to use. You can then run your queries as you normally do, and Amazon Elasticsearch Service will return results including those that match similar sounding terms, without the need for you to write any custom code.

Only versions 1.5, 2.3 and 5.1 are provided, and they all show the same symptoms.

afoucret commented 7 years ago

Hi @metamasterplay,

AWS Elasticsearch Service is not fully compatible with a standard ES server. Even if you were able to get the phonetic plugin working, you would run into other problems, since some ES operations are not supported: http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-supported-es-operations.html

By the way, AWS ES Service 2.3 has no support for the termvector operation, which ElasticSuite uses to detect spellchecked requests.

Instead, you should use your own ES deployment. Another alternative would be Elastic Cloud, which also relies on the AWS infrastructure and has quite compelling pricing (https://www.elastic.co/fr/cloud/as-a-service/pricing).

I am closing the issue, since we do not consider ourselves responsible for a non-standard ES implementation such as the AWS one.

BR