Smile-SA / elasticsuite

Smile ElasticSuite - Magento 2 merchandising and search engine built on ElasticSearch
https://elasticsuite.io
Open Software License 3.0
761 stars 340 forks source link

2.6 & ES 6.5.3 position sorting not working without drag/drop interface #1238

Closed justinharris1986 closed 5 years ago

justinharris1986 commented 5 years ago

Preconditions

Magento Version : 2.2.6 CE

ElasticSuite Version : Not sure where the version is, but assuming 2.6.4. Installed via command: composer require smile/elasticsuite ~2.6.0 and is currently updated:

$ composer upgrade
Loading composer repositories with package information
Updating dependencies (including require-dev)
Nothing to install or update
Generating autoload files

Environment : production - but happens in developer too Third party modules :

List of enabled modules:
Bss_FastOrder
MSP_ReCaptcha
MageVision_UpdateOrderEmailAddress
Bss_Paymentshipping
Fooman_EmailAttachments
Fooman_PdfCore
Fooman_PrintOrderPdf
Fooman_PdfCustomiser
Ess_M2ePro
Cadence_Fbpixel
MSP_TwoFactorAuth
Dotdigitalgroup_Email
Mageplaza_Core
Mageplaza_AbandonedCart
Mageplaza_DeliveryTime
Mageplaza_Osc
Mageplaza_Seo
RocketWeb_ShoppingFeeds
RocketWeb_ShoppingFeedsGoogle
Smile_ElasticsuiteCore
Smile_ElasticsuiteCatalog
Smile_ElasticsuiteThesaurus
Swissup_Core
Swissup_Lightboxpro
Swissup_SubscriptionChecker
Ves_All
Ves_Megamenu
Ves_Setup

List of disabled modules:
Amazon_Core
Amazon_Login
Bss_AjaxCart
Klarna_Core
Klarna_Ordermanagement
Amazon_Payment
Klarna_Kp
Mageplaza_GiftCard
Smile_ElasticsuiteCatalogRule
Smile_ElasticsuiteCatalogOptimizer
Smile_ElasticsuiteSwatches
Smile_ElasticsuiteTracker
Smile_ElasticsuiteVirtualCategory
Temando_Shipping
Vertex_Tax

ElasticSearch version:

{
  "name" : "f9SsUjH",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "e8Y8JCSVSdmaQmTDbG5scg",
  "version" : {
    "number" : "6.5.3",
    "build_flavor" : "default",
    "build_type" : "rpm",
    "build_hash" : "159a78a",
    "build_date" : "2018-12-06T20:11:28.826501Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

other misc stuff:

cat /etc/redhat-release 
CentOS Linux release 7.6.1810 (Core)
uname -r
3.10.0-957.1.3.el7.x86_64
php -v
PHP 7.1.25 (cli) (built: Dec  8 2018 13:52:58) ( NTS )
Copyright (c) 1997-2018 The PHP Group
Zend Engine v3.1.0, Copyright (c) 1998-2018 Zend Technologies
mysql --version 
mysql  Ver 15.1 Distrib 10.3.11-MariaDB, for Linux (x86_64) using readline 5.1

Steps to reproduce

1. 2. 3.

Expected result

  1. This is using the built in position sorting in magento, not the smile elasticsuite drag and drop sorting. We migrated several hundred categories with sorting over, and do not wish to re-sort everything with drag and drop. Therefore some Smile_ElasticSuite* modules are disabled (see above)
  2. In the backend, Navigate to Catalog -> Products. Enter at least 3 products.
  3. In the backend, navigate to Catalog -> Categories, Create a category under default, put all 3 products in both. manually sort the position in one direction
  4. In the backend, navigate to Catalog -> Categories, Create a second category under default, put all 3 products in both. manually sort the position in the opposite direction
  5. run all indexers, verify up to date
  6. go to the front end, each category should match the position you placed them in, in steps 2 and 3.

Actual result

  1. You will find that they are both sorted the same way, typically the way with the highest category ID number.

sorry for my long winded information below, but I did a lot of digging =) All the values below are from our dev environment:

Here is our catalog_category_product_index_store1 table, when looking at one product id:

MariaDB [pv_dev]> select * from catalog_category_product_index_store1 where product_id = 60845;
+-------------+------------+----------+-----------+----------+------------+
| category_id | product_id | position | is_parent | store_id | visibility |
+-------------+------------+----------+-----------+----------+------------+
|           2 |      60845 |    10000 |         0 |        1 |          4 |
|           3 |      60845 |    10000 |         0 |        1 |          4 |
|           4 |      60845 |     9977 |         0 |        1 |          4 |
|           7 |      60845 |        0 |         1 |        1 |          4 |
|          13 |      60845 |     9977 |         0 |        1 |          4 |
|         423 |      60845 |    10000 |         0 |        1 |          4 |
|         427 |      60845 |        0 |         1 |        1 |          4 |
|         658 |      60845 |     -299 |         1 |        1 |          4 |
|         667 |      60845 |      -23 |         1 |        1 |          4 |
|         900 |      60845 |     9908 |         0 |        1 |          4 |
|         902 |      60845 |      -92 |         1 |        1 |          4 |
|         987 |      60845 |     -199 |         1 |        1 |          4 | <<< Take note of this -199 position, notice it is the one with the highest category_id
+-------------+------------+----------+-----------+----------+------------+
12 rows in set (0.000 sec)

When we navigate to the category in the front end that pertains to category_id 667, we find that the product is not sorted properly. I see it as the first product in the front end, even though it is sorted in the back end as roughly the 110th:

select * from catalog_category_product_index_store1 where category_id = 667 ORDER BY position ASC LIMIT 0,115;
+-------------+------------+----------+-----------+----------+------------+
| category_id | product_id | position | is_parent | store_id | visibility |
+-------------+------------+----------+-----------+----------+------------+
|         667 |     163054 |     -129 |         1 |        1 |          4 |
|         667 |     324920 |     -128 |         1 |        1 |          4 |
|         667 |     324921 |     -127 |         1 |        1 |          4 |
|         667 |     163057 |     -126 |         1 |        1 |          4 |
|         667 |     324922 |     -125 |         1 |        1 |          4 |
===Output truncated for brevity===
|         667 |     324996 |      -29 |         1 |        1 |          4 |
|         667 |     324994 |      -28 |         1 |        1 |          4 |
|         667 |     324995 |      -27 |         1 |        1 |          4 |
|         667 |      60829 |      -26 |         1 |        1 |          4 |
|         667 |     311602 |      -25 |         1 |        1 |          4 |
|         667 |     292931 |      -24 |         1 |        1 |          4 |
|         667 |      60845 |      -23 |         1 |        1 |          4 | <<<Here is the one we are working with
|         667 |     311603 |      -22 |         1 |        1 |          4 |
|         667 |     311604 |      -21 |         1 |        1 |          4 |
|         667 |     311605 |      -20 |         1 |        1 |          4 |
|         667 |      60846 |      -19 |         1 |        1 |          4 |
+-------------+------------+----------+-----------+----------+------------+
115 rows in set (0.001 sec)

so now, if we turn our attention to elasticsearch, this is the query that was picked up by wireshark:

{
  "size": 144,
  "sort": [
    {
      "category.position": {
        "order": "asc",
        "missing": "_last",
        "unmapped_type": "keyword",
        "nested_path": "category",
        "mode": "min"
      }
    },
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "entity_id": {
        "order": "desc",
        "missing": "_first",
        "unmapped_type": "keyword"
      }
    }
  ],
===remainder of output omitted for brevity===

This is the result I get for my product (as picked up from wireshark). Again I chopped out most of this rather large response. I also had to redact some identifiable information we cannot have out there.

{
  "took": 247,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 196,
    "max_score": null,
    "hits": [
      {
        "_index": "magento2_default_catalog_product_20181216_231007",
        "_type": "product",
        "_id": "60845",
        "_score": 1,
        "_source": {
          "entity_id": "60845",
          "attribute_set_id": "4",
          "type_id": "simple",
          "sku": "REDACTED",
          "has_options": false,
          "required_options": false,
          "created_at": "2016-03-18 23:00:47",
          "updated_at": "2018-11-25 20:07:14",
          "visibility": "4",
          "price": [
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "0"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "1"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": false,
              "customer_group_id": "2"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": false,
              "customer_group_id": "3"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "4"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "6"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "7"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "8"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "9"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "10"
            },
            {
              "price": "REDACTED",
              "original_price": "REDACTED",
              "is_discount": true,
              "customer_group_id": "11"
            }
          ],
          "indexed_attributes": [
            "price",
            "name",
            "manufacturer",
            "status",
            "tax_class_id"
          ],
          "category": [
            {
              "category_id": 2,
              "position": 10000,
              "store_id": "1",
              "visibility": "4"
            },
            {
              "category_id": 3,
              "position": 10000,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 4,
              "position": 9977,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 7,
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 13,
              "position": 9977,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 423,
              "position": 10000,
              "store_id": "1",
              "visibility": "4"
            },
            {
              "category_id": 427,
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 658,
              "position": -299,
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 667,   <<<MY NOTE - this is the category I am viewing in the front end
              "position": -23,   <<<< with the proper position
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED
            },
            {
              "category_id": 900,
              "position": 9908,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 902,
              "position": -92,
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            },
            {
              "category_id": 987,<<< However it is this category that is getting the sorted position
              "position": -199, 
              "is_parent": true,
              "store_id": "1",
              "visibility": "4",
              "name": "REDACTED"
            }
          ],
          "name": [
            "REDACTED12"
          ],
          "manufacturer_part_number": [
            "REDACTED"
          ],
          "legacy_part_number_1": [
            "REDACTED"
          ],
          "legacy_part_number_2": [
            "REDACTED"
          ],
          "description": [
            "REDACTED"
          ],
          "short_description": [
            "REDACTED"
          ],
          "manufacturer": [
            2506
          ],
          "status": [
            1
          ],
          "option_text_status": [
            "Enabled"
          ],
          "tax_class_id": [
            2
          ],
          "option_text_tax_class_id": [
            "Taxable Goods"
          ],
          "stock": {
            "is_in_stock": true,
            "qty": 9999209
          }
        },
        "sort": [
          -299, <<<< and is thus landing here in the sort field
          1,
          60845
        ]
      },
===remainder of output omitted for brevity===

If i manually change the category position to the following, while keeping the remainder of the query the same (including the omitted portion) and I fire it with cURL, I get everything sorted in the proper direction:

     "category.position": {
        "order": "asc",
        "missing": "_last",
        "unmapped_type": "keyword",
        "nested": {
            "path": "category",
            "filter": {
                "term": { "category.category_id": "667" }
            },
        },
        "mode": "min"
      }

I wont muddy up the report with another json blob, but know that the sort field at the bottom turns to this:

        "sort": [
          -23,
          1,
          60845
        ]
      },

Try as I might, I couldn't find the code that would update this nested path/filter stuff properly. I did find a bug that came pretty close but dosen't feel right: #881

Thoughts? Ideas? Suggestions?

rbayet commented 5 years ago

Hello @justinharris1986,

Nice digging ! To know your exact smile/elasticsuite version, you can look it up in the composer.lock file or simply view it via composer info | grep -i elastic : you will see the versions of all elasticsearch related modules. Given your composer require command, I'll assume you're on a 2.6.4, please contradict me if need be.

Onto your issue : Yes, if you disable Smile_ElasticsuiteVirtualCategory, you end up keeping the native manual positioning of products in categories (while losing the functionality of virtual categories).

An alternative would have been to migrate (by SQL) your data into the specific table smile_virtualcategory_catalog_category_product_position as mentionned in issue #1198 (especially https://github.com/Smile-SA/elasticsuite/issues/1198#issuecomment-442498038). But if you have/do a lot of manual positioning, and negative ones, this might not be the preferred approach.

I'll try look into it, it may indeed be related to #881 or anyway an improper application of search/navigation context.

Regards,

justinharris1986 commented 5 years ago

nice! I didn't know about the composer info command - I have been manually rooting around in composer.json files

composer info | grep -i elastic
elasticsearch/elasticsearch                       v5.3.2    PHP Client for Elasticsearch
smile/elasticsuite                                2.6.4     Magento 2 merchandising and search engine built on ElasticSearch

Let me look in to the issue you mentioned about migrating the data over to the virtual category thing. they do a LOT of manual sorting on HUGE categories (product wise), so I am not 100% sure this is the best route, but I will give it a try. Let me know if you find anything on your end as well.

rbayet commented 5 years ago

Hello @justinharris1986,

I've tried to reproduce your issue on a 2.2.6 CE (with Luma, and disabling the ElasticSuite modules your disabled), to no avail : the generated ES queries bear the correct nester filter on the sort order :

  "size": 9,
  "sort": [
    {
      "category.position": {
        "order": "asc",
        "missing": "_last",
        "unmapped_type": "keyword",
        "nested_path": "category",
        "mode": "min",
        "nested_filter": {
          "terms": {
            "category.category_id": [
              "43"
            ],
            "boost": 1
          }
        }
      }
    },

It is applied to the collection by \Smile\ElasticsuiteCatalog\Plugin\LayerPlugin. In catalog navigation context, it's at line 93 :

        if (!$searchQuery->getQueryText() && $layer->getCurrentCategory()) {
            $categoryId = $layer->getCurrentCategory()->getId();
            $sortFilter = ['category.category_id' => $categoryId];
            $collection->addSortFilterParameters('position', 'category.position', 'category', $sortFilter);

When the query is built, the path is :

\Smile\ElasticsuiteCore\Search\Request\SortOrder\SortOrderBuilder::buildSordOrders is probably where your problem is located

  1. either the incorrect factory is selected (it should switch from the standardOrderFactory to the nestedOrderFactory, line 87)
  2. or the nester (query terms) filter is not applied (lines 91 to 96)
                if (isset($sortOrderParams['nestedPath'])) {
                    $factory = $this->nestedOrderFactory;
                }

                if (isset($sortOrderParams['nestedFilter'])) {
                    $nestedFilter = $this->queryBuilder->create(
                        $containerConfig,
                        $sortOrderParams['nestedFilter'],
                        $sortOrderParams['nestedPath']
                    );
                    $sortOrderParams['nestedFilter'] = $nestedFilter;
                }

Considering you have nested_path and mode in your sort order, and looking at \Smile\ElasticsuiteCore\Search\Adapter\Elasticsuite\Request\SortOrder\Builder::buildSortOrder then

I hope the different code locations will help you pinpoint where and why it occurs. If you find it, I'm interested in knowing about it (for instance to know of a third party module incompatibility).

Regards, Richard.

justinharris1986 commented 5 years ago

So - I have already disabled all the modules and dropped to the Luma theme, Still had the same issue. Taking a look at this code: \Smile\ElasticsuiteCatalog\Plugin\LayerPlugin line 93

  if (!$searchQuery->getQueryText() && $layer->getCurrentCategory()) {
            $categoryId = $layer->getCurrentCategory()->getId();
            $sortFilter = ['category.category_id' => $categoryId];
            $collection->addSortFilterParameters('position', 'category.position', 'category', $sortFilter);

$searchQuery->getQueryText() is returning a value: string(1) " " and the entire $searchQuery->debug():

array(12) {
  | ["query_id"]=>
  | string(5) "18911"
  | ["query_text"]=>
  | string(1) " "
  | ["num_results"]=>
  | string(6) "183273"
  | ["popularity"]=>
  | string(1) "1"
  | ["store_id"]=>
  | string(1) "1"
  | ["display_in_terms"]=>
  | string(1) "1"
  | ["is_active"]=>
  | string(1) "1"
  | ["is_processed"]=>
  | string(1) "0"
  | ["updated_at"]=>
  | string(19) "2018-12-13 17:43:49"
  | ["is_spellchecked"]=>
  | string(1) "0"
  | ["is_query_text_exceeded"]=>
  | bool(false)
  | ["is_query_text_short"]=>
  | bool(true)
  | }

Now for reasons I cannot explain, there is a search for a space just chillin' out in mysql,

MariaDB [pv_dev]> select * from search_query where query_id=18911;
+----------+------------+-------------+------------+----------+----------+------------------+-----------+--------------+---------------------+-----------------+
| query_id | query_text | num_results | popularity | redirect | store_id | display_in_terms | is_active | is_processed | updated_at          | is_spellchecked |
+----------+------------+-------------+------------+----------+----------+------------------+-----------+--------------+---------------------+-----------------+
|    18911 |            |      183273 |          1 | NULL     |        1 |                1 |         1 |            0 | 2018-12-13 17:43:49 |               0 |
+----------+------------+-------------+------------+----------+----------+------------------+-----------+--------------+---------------------+-----------------+

Based on the updated_at this query came in after our migration from magento 1.9 ->2.2.6, so this isn't a migration issue. (but knowing how bad the data was in 1.9 I wouldn't be surprised if it came over with a bad date too) I tried firing off a query at: https://dev.blahblahblah.com/catalogsearch/result/?q=%20 and it correctly rejected it. Im going to keep an eye on it, because I have a feeling it is going to creep back up, but somewhere in the magento code, it was incorrectly returning this search.

justinharris1986 commented 5 years ago

I went to go blow out the offending record in production, and I realized that the time is newer, so it definitely happened again. Server logs aren't much help. This particular customer got an autosuggest list, and then somehow wound up at https://www.blahblahblah.com/catalogsearch/result/?q= Navigating there manually didn't trigger the space.

@rbayet I would be curious to see if manually pushing this in your database triggers it on your end:

INSERT INTO search_query (query_text, num_results, popularity, store_id, display_in_terms, is_active, is_processed, updated_at, is_spellchecked)
VALUES (" ", 1, 999, 1, 1, 1, 0, "2018-12-13 17:43:49", 0);

(Then restart varnish fully, from the command line, if needed) Once I popped that in, restarted varnish the issue came up on my end.

I still need to figure out how these spaces are getting in there, as that is the root of the problem.

rbayet commented 5 years ago

Hello @justinharris1986,

OK, I'll try that tomorrow and see if that generates the issue, and if so, why it "leaks" there in a catalog navigation context.

Regards,

rbayet commented 5 years ago

Hello @justinharris1986,

I can confirm that by inserting that empty query, I can reproduce your issue. It's actually pretty simple : in \Magento\Search\Model\QueryFactory::get the Magento\Search module tries to load an pre-existing query via \Magento\Search\Model\ResourceModel\Query::loadByQueryText which uses the following query :

        $select = $this->getConnection()->select()->from(
            $this->getMainTable()
        )->where(
            'query_text = ?',
            $value
        )->where(
            'store_id = ?',
            $object->getStoreId()
        )->limit(
            1
        );

The problem is that query_text being a varchar, '""' and " " are considered equals because the empty string is first padded with spaces to make it the same length as " " (see explanations in following stackoverflow posts https://stackoverflow.com/a/34782656 and https://stackoverflow.com/a/14703615). Using LIKE on the other hand would avoid the problem.

mysql> select '' = ' ';
+----------+
| '' = ' ' |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select '' LIKE ' ';
+-------------+
| '' LIKE ' ' |
+-------------+
|           0 |
+-------------+
1 row in set (0.00 sec)

Another way around the issue would be, of course, not having that record in the database in the first place, by preventing saving to DB such a query. Have you found where it might be added from ?

PS: do we agree that your catalog/search/min_query_length is still set to 1 ?

Regards, Richard.

justinharris1986 commented 5 years ago

Good afternoon @rbayet ! Interesting. Good info to know! Yes my catalog/search/min_query_length is still set to 1.

the 2 times this query popped in I was not able to re-create it from the logs. the first user was actually me in testing - I was searching for something with no space in it, and the 2nd user had some weird autosuggest issue which appears to be unrelated to this.

I pulled that offending record out a little over 52 hours ago to start with a clean slate (and get this working again for my customer). I have been checking to see if it has popped in periodically, but no one appears to have tripped it. I really want to know what it is so I can stop it from happening, but at the same time, if its fixed...

I'll keep an eye on it for a week or two, If it pops in again ill try to piece it together otherwise, not sure what else can be done to get to the root of this.

Justin

justinharris1986 commented 5 years ago

So I have not seen it pop in for the last 3 weeks. Also I noticed that I was able to search for spaces in my old Magento 1 install, and that entry in the database did copy over. I am just going to call it "bad data migrated from magento 1 to magento 2".

For anyone reading this thread, TL;DR: DELETE * FROM search_query WHERE query_text = " ";