Restream / reindexer

Embeddable, in-memory, document-oriented database with a high-level Query builder interface.
https://reindexer.io
Apache License 2.0
763 stars 64 forks source link

"IN" (Set condition) for small namespaces after v2.9.2 #61

Closed oruchreis closed 4 years ago

oruchreis commented 4 years ago

Hi, After v2.9.2 we can't search small namespaces with IN condition but bigger namespaces works expected. This works on small namespaces: Id IN (1) But this doesn't work: Id IN (1,1,1,1) If items count in the namespace is bigger than 50 or something it works expected. I've figured out that if I set kMaxSelectivityPercentForIdset to 100 in indexunorderd.cc, it works like before v2.9.2. I think there is a bug in this condition in indexunorderd.cc at line 192: https://github.com/Restream/reindexer/blob/05757895ae35089133fd96a8ab373520bf4bc03d/cpp_src/core/index/indexunordered.cc#L192

Best Regards.

slowcheetahzzz commented 4 years ago

Hello oruchreis, thanks for reporting the issue. We'll take a look at it in the nearest future.

olegator77 commented 4 years ago

Hi, We can't reproduce this issue with simple data set. Plz provide more details about this issue: Indexes structure, data structure, exact query, query explain ouptut

oruchreis commented 4 years ago

Hi, Sorry I forgot to mention about built environment. I could reproduce this issue with windows build(MSVC) of standalone reindexer server. In linux and osx build I can't reproduce this issue neither. I simply cloned latest master branch to my local and built x64-Debug(I've tried Release also) with Visual Studio 2019. I've only enabled "LINK_RESOURCES" option for face and swagger. Here are the structures: Namespace: Test Indexes:

{
  "total_items": 1,
  "items": [
    {
      "name": "Id",
      "field_type": "int64",
      "index_type": "hash",
      "is_pk": true,
      "is_array": false,
      "is_dense": true,
      "is_sparse": false,
      "collate_mode": "none",
      "sort_order_letters": "",
      "expire_after": 0,
      "config": {},
      "json_paths": [
        "Id"
      ]
    }
  ]
}

Data:

{
  "items": [
    {
      "Id": 1
    },
    {
      "Id": 2
    }
  ],
  "namespaces": [
    "Test"
  ],
  "cache_enabled": true,
  "total_items": 2
}

The query that fails:

SELECT Id FROM Test WHERE Id IN (1,1,1,1)

If I add EXPLAIN to start of the query, the application crashes.

oruchreis commented 4 years ago

I've also tested x64-release build and the setup file from appveyor artifacts (which afaik it is VS 2017 and x64-release). In both release version the query failed like before, but I could get the explain result without crash. Here are explain outputs of the failed query:

{
  "items": [],
  "namespaces": [
    "Test"
  ],
  "cache_enabled": true,
  "explain": {
    "total_us": 56,
    "prepare_us": 25,
    "indexes_us": 21,
    "postprocess_us": 6,
    "loop_us": 3,
    "general_sort_us": 0,
    "sort_index": "-",
    "sort_by_uncommitted_index": false,
    "selectors": [
      {
        "items": 2,
        "matched": 2,
        "method": "scan",
        "type": "SingleRange"
      },
      {
        "field": "Id",
        "keys": 0,
        "comparators": 1,
        "cost": 3,
        "matched": 0,
        "method": "scan",
        "type": "OnlyComparator"
      }
    ]
  }
}
olegator77 commented 4 years ago

Actually problem looks like tsl::hopscotch_set or MSVC compiler bug. I wrote issue to tsl::hopscotch_set repo - https://github.com/Tessil/hopscotch-map/issues/52

In the next reindexer release we will introduce workaround: just replace emplace with insert here: https://github.com/Restream/reindexer/blob/master/cpp_src/core/comparatorimpl.h#L102

olegator77 commented 4 years ago

I've figured out that if I set kMaxSelectivityPercentForIdset to 100 in indexunorderd.cc, it works like before v2.9.2. I think there is a bug in this condition in indexunorderd.cc at line 192:

This change can seriously slow down queries execution with wide IN () condition to low selectivity indexes. Actually, kMaxSelectivityPercentForIdset mean, that 'choose scan method instead of index, if scan will be more efficient'

oruchreis commented 4 years ago

I confirm that it is fixed in v2.11.1. Thanks for your effort. Best Regards.