Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License

Sort exceeded memory when loading a case #4533

Closed by northwestwitch 7 months ago

northwestwitch commented 7 months ago

Here @Jakob37:

2024-03-27 08:27:19 3838f480efca scout.commands.load.case[202] ERROR Unhandled Exception: Traceback (most recent call last):
  File "/home/worker/app/scout/commands/load/case.py", line 124, in case
    adapter.load_case(config_data, update, keep_actions)
  File "/home/worker/app/scout/adapter/mongo/case.py", line 925, in load_case
    self.load_variants(
  File "/home/worker/app/scout/adapter/mongo/variant_loader.py", line 740, in load_variants
    self.update_variant_rank(case_obj, variant_type, category=category)
  File "/home/worker/app/scout/adapter/mongo/variant_loader.py", line 81, in update_variant_rank
    for index, var_obj in enumerate(variants):
  File "/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1264, in next
    if len(self.__data) or self._refresh():
  File "/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1181, in _refresh
    self.__send_message(q)
  File "/venv/lib/python3.8/site-packages/pymongo/cursor.py", line 1060, in __send_message
    response = client._run_operation(
  File "/venv/lib/python3.8/site-packages/pymongo/_csot.py", line 107, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1394, in _run_operation
    return self._retryable_read(
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1492, in _retryable_read
    return self._retry_internal(
  File "/venv/lib/python3.8/site-packages/pymongo/_csot.py", line 107, in csot_wrapper
    return func(self, *args, **kwargs)
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1453, in _retry_internal
    return _ClientConnectionRetryable(
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 2315, in run
    return self._read() if self._is_read else self._write()
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 2445, in _read
    return self._func(self._session, self._server, conn, read_pref)  # type: ignore
  File "/venv/lib/python3.8/site-packages/pymongo/mongo_client.py", line 1390, in _cmd
    return server.run_operation(
  File "/venv/lib/python3.8/site-packages/pymongo/helpers.py", line 322, in inner
    return func(*args, **kwargs)
  File "/venv/lib/python3.8/site-packages/pymongo/server.py", line 167, in run_operation
    _check_command_response(first, conn.max_wire_version)
  File "/venv/lib/python3.8/site-packages/pymongo/helpers.py", line 230, in _check_command_response
    raise OperationFailure(errmsg, code, response, max_wire_version)
pymongo.errors.OperationFailure: Executor error during find command :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting., full error: {'ok': 0.0, 'errmsg': 'Executor error during find command :: caused by :: Sort exceeded memory limit of 104857600 bytes, but did not opt in to external sorting.', 'code': 292, 'codeName': 'QueryExceededMemoryLimitNoDiskUseAllowed'}

Aborted!

northwestwitch commented 7 months ago

What type of analysis is it, and how many variants does your case have @Jakob37? I don't think we have seen this error message before 🤔

Jakob37 commented 7 months ago

> What type of analysis is it, and how many variants does your case have @Jakob37? I don't think we have seen this error message before 🤔

This is a full WGS sample, with ~132000 SNVs and ~12500 SVs.

It is a bit strange, as this sample has worked fine previously when testing. Not sure what changed.
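
For a rough sense of scale, the limit in the error message (104857600 bytes) can be divided by the variant counts above. This is only a back-of-the-envelope sketch: it assumes the sort has to hold whole variant documents in memory and that all variants of one category are sorted together, neither of which is confirmed in this thread.

```python
# Back-of-the-envelope check: how large may an average variant document be
# before an in-memory sort of all of them reaches the limit from the error?

SORT_MEMORY_LIMIT_BYTES = 104857600  # limit quoted in the OperationFailure
SNV_COUNT = 132_000  # approximate SNV count reported for this case
SV_COUNT = 12_500    # approximate SV count reported for this case


def max_avg_doc_size(limit_bytes: int, n_docs: int) -> float:
    """Average document size (bytes) at which n_docs documents fill the limit."""
    return limit_bytes / n_docs


limit_mib = SORT_MEMORY_LIMIT_BYTES / 2**20
snv_threshold = max_avg_doc_size(SORT_MEMORY_LIMIT_BYTES, SNV_COUNT)
sv_threshold = max_avg_doc_size(SORT_MEMORY_LIMIT_BYTES, SV_COUNT)

print(f"limit: {limit_mib:.0f} MiB")
print(f"SNVs: limit reached at about {snv_threshold:.0f} bytes per document")
print(f"SVs: limit reached at about {sv_threshold:.0f} bytes per document")
```

Under those assumptions the limit is 100 MiB, and the SNV sort would hit it once documents average a bit under 800 bytes, which annotated variant documents could plausibly exceed.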

northwestwitch commented 7 months ago

Does your database have indexes?

Jakob37 commented 7 months ago

Hmm, it is a very default setup, with everything running in containers. I just have the demo cases, and two test cases I reload (I haven't set up data persistence for the mongo db).

Checking the indices, it looks like there are a bunch, for instance for the variant collection:

> db.variant.getIndices()
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_"
        },
        {
                "v" : 2,
                "key" : {
                        "case_id" : 1,
                        "category" : 1,
                        "variant_type" : 1,
                        "variant_rank" : 1,
                        "hgnc_ids" : 1
                },
                "name" : "caseid_category_varianttype_variantrank_hgncids",
                "background" : true
        },
        {
                "v" : 2,
                "key" : {
                        "hgnc_symbols" : 1,
                        "rank_score" : -1,
                        "category" : 1,
                        "variant_type" : 1
                },
                "name" : "hgncsymbol_rankscore_category_varianttype",
                "background" : true,
                "partialFilterExpression" : {
                        "rank_score" : {
                                "$gt" : 5
                        },
                        "category" : "snv"
                }
        },
        {
                "v" : 2,
                "key" : {
                        "variant_id" : 1,
                        "case_id" : 1,
                        "category" : 1
                },
                "name" : "variantid_caseid_category",
                "background" : true
        },
        {
                "v" : 2,
                "key" : {
                        "case_id" : 1,
                        "category" : 1,
                        "variant_type" : 1,
                        "chromosome" : 1,
                        "start" : 1,
                        "end" : 1
                },
                "name" : "caseid_category_chromosome_start_end",
                "background" : true
        },
        {
                "v" : 2,
                "key" : {
                        "variant_id" : 1,
                        "institute" : 1
                },
                "name" : "variant_id_institute",
                "background" : true
        }
]

northwestwitch commented 7 months ago

Do your variants have a ranking?

dnil commented 7 months ago

Sorry, wrong old index. This one had the score, for sorting scores into rank.

[Image: Screenshot 2024-03-28 at 08 56 34]

dnil commented 7 months ago

Again, it has been running nicely without the index on our stage for months. There shouldn't be a performance difference, but it is a change in how mongod does the sort (build the index once, which effectively sorts, and then use the index, versus sorting directly). We'll try some tests with cases that have a large number of variants as well. We could also just put the index back; the thing is that it is of course big and should only be used once per load. It also interferes a bit with the query planner, since its first few keys are the same as in other indexes.
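
An alternative to relying on an index for this sort is to opt in to external sorting, which is what the error message refers to. The sketch below is not scout's code: `ranked_variants` is a hypothetical helper, and the query and sort key are assumptions based on the traceback and the index keys listed above. `Cursor.sort` and `Cursor.allow_disk_use` are real PyMongo cursor methods; `allow_disk_use` needs MongoDB 4.4 or later.

```python
from typing import Any, Dict, Iterable

DESCENDING = -1  # same value as pymongo.DESCENDING


def ranked_variants(
    variant_collection: Any,
    case_id: str,
    variant_type: str = "clinical",
    category: str = "snv",
) -> Iterable[Dict[str, Any]]:
    """Return a cursor over one case's variants, highest rank_score first.

    `variant_collection` is expected to behave like a pymongo Collection.
    The field names are assumptions, not taken from scout's source.
    """
    query = {"case_id": case_id, "category": category, "variant_type": variant_type}
    cursor = variant_collection.find(query).sort("rank_score", DESCENDING)
    # Opt in to external sorting, so the server may spill to temporary files
    # instead of failing once the in-memory sort limit is reached.
    return cursor.allow_disk_use(True)
```

The trade-off is that a disk-backed sort is typically slower than reading documents in index order, but it needs no extra index that is only used once per load.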

Jakob37 commented 7 months ago

OK I'll try adding this one then and see if it resolves it!

Jakob37 commented 7 months ago

Nice, looks like it worked loading it now!

dnil commented 7 months ago

Let’s see if we can’t get it reintroduced then, maybe with another key order.
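
As an illustration of what a reintroduced index could look like, here is a minimal sketch. The index name and the key list are hypothetical; the thread does not state the actual keys of the removed index. It follows the general MongoDB guideline of putting equality-matched fields first and the sort field last, so that the server can return documents in index order without a separate sort stage. `create_index` is the regular PyMongo `Collection` method.

```python
from typing import Any, List, Tuple

ASCENDING, DESCENDING = 1, -1  # same values as pymongo.ASCENDING / DESCENDING

# Hypothetical name and key order, for illustration only.
RANK_SORT_INDEX_NAME = "caseid_category_varianttype_rankscore"
RANK_SORT_INDEX_KEYS: List[Tuple[str, int]] = [
    ("case_id", ASCENDING),       # equality match
    ("category", ASCENDING),      # equality match
    ("variant_type", ASCENDING),  # equality match
    ("rank_score", DESCENDING),   # sort key, placed last
]


def ensure_rank_sort_index(variant_collection: Any) -> str:
    """Create the index if it is missing and return its name.

    `variant_collection` is expected to behave like a pymongo Collection.
    """
    return variant_collection.create_index(
        RANK_SORT_INDEX_KEYS,
        name=RANK_SORT_INDEX_NAME,
        background=True,  # mirrors the option on the existing indexes
    )
```

Because the first three keys match those of existing indexes, this particular order would still overlap with them in the query planner, which is the concern raised above.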