Open mhkc opened 2 years ago
Storing more variant info in the case object might help a lot when comes to the pages loading time. But what about the old cases, where this info will not be available on the case level?
Mmmm we don't have timeout issues in Scout since we fixed that database indexes (a couple of indexes missing) problem and we have some 156million variants in there..
Yes, we would be at about an order of magnitude above that, and currently dont see neither slow queries or timeouts, though it happened for a bit only a few weeks ago before fixing indexes. Could it be that one or another of your - our our shared - indexes are not working as expected? I remember you had a few extra/different ones when we worked with that causatives page timeout? Just to be safe, could you check that you at least you have all that are specified in constants for the current master? The first output example above has a section I don't quite understand yet about the index scan not delivering as the planner expected.
We are just in the process of migrating db servers in Solna, so will likely have to browse through indices and check logs/query performance in the coming few weeks.
First case: evaluated variants query, can easily be optimized by using events. Here is the slow query function: https://github.com/Clinical-Genomics/scout/blob/7b9f2b136d9628e33792e757d24c5623a1c2ab49/scout/adapter/mongo/variant.py#L664
Possible solution of slow query returning variants by gene panel: query variants by gene panel _id and not by gene _ids. Then if a gene panel gets updated (for instance removing a gene), then in the background run queries to update the variants gene panel.
Thing to do: create an index on variant gene ids (list)
Confirm that the count of variants by category for a case is done just once when the case is viewed for the fist time
Loqusdb queries (SVs?) might be missing an index for the right position
Loqusdb queries (SVs?) might be missing an index for the right position
2021-10-25T12:37:23.828+0200 I COMMAND [conn14176] command scout.variant command: find { find: "variant", filter: { $and: [ { case_id: "setgator" }, { $or: [ { acmg_classification: { $exists: true } }, { manual_rank: { $exists: true } }, { cancer_tier: { $exists: true } }, { dismiss_variant: { $exists: true } }, { mosaic_tags: { $exists: true } } ] } ] } } planSummary: IXSCAN { case_id: 1, category: 1, variant_type: 1, chromosome: 1, start: 1, end: 1 } keysExamined:6689 docsExamined:6689 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:78 nreturned:29 reslen:257126 locks:{ Global: { acquireCount: { r: 158 } }, Database: { acquireCount: { r: 79 } }, Collection: { acquireCount: { r: 79 } } } protocol:op_query 1251ms
Reopening this as not all slow queries discussed here have been optimized!
cat mongodb.log | jq ' select ( .msg == "Slow query" ) ‘ > slowqlog.json
slowqlog.json | jq
Several mongo queries are really slow which occasionally can cause timeout issues for users.
The general issue are that many queries uses $and/$or statements which require mongo to evaluate all cases/ variants. Some views such as the variants list view require multiple db queries. This causes several issues since our database containing 2,947 cases and 15,469,895 variants. Most queries takes about 200-300ms with some reaching towards 2000ms.
Solution ideas
I do think we need to transition from aggregation and filters in and instead store references to the variants in the case object.
Examples of slow queries
Case view
Variants list view
Update variants