Clinical-Genomics / scout

VCF visualization interface
https://clinical-genomics.github.io/scout
BSD 3-Clause "New" or "Revised" License
150 stars 46 forks source link

Many slow queries.. #4102

Closed dnil closed 11 months ago

dnil commented 1 year ago

Certain views have gone a bit slow lately - case, variantS come to mind. I haven't dug deep on this, but there are some slow queries I don't quite recognise. They don't look particularly nasty, like pollscan on cases: even if has grown quite a bit, it is still not huge, but let's see what we can do.

{"t":{"$date":"2023-09-30T05:53:14.522+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn189193","msg":"Slow query","attr":{"type":"command","ns":"scout.case","command":{"aggregate":"case","pipeline":[{"$match":{"collab
orators":"cust002"}},{"$group":{"_id":"$status","count":{"$sum":1}}}],"cursor":{},"lsid":{"id":{"$uuid":"1bbc6218-8d2b-4fc5-8247-e3b39589fc22"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1696045994,"i":1000}},"signature":
{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"scout","$readPreference":{"mode":"primary"}},"planSummary":"IXSCAN { collaborators: 1, status: 1, updated_at: 1 }","keysExamined":130
46,"docsExamined":0,"cursorExhausted":true,"numYields":35,"nreturned":5,"queryHash":"D7B24A0C","planCacheKey":"7175E18B","reslen":404,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":37}},"Global":{"acquireCount":{"r":
37}},"Database":{"acquireCount":{"r":37}},"Collection":{"acquireCount":{"r":37}},"Mutex":{"acquireCount":{"r":2}}},"storage":{"data":{"bytesRead":156505,"timeReadingMicros":12061}},"protocol":"op_msg","durationMillis":446}}

{"t":{"$date":"2023-09-30T05:53:20.693+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn189207","msg":"Slow query","attr":{"type":"command","ns":"scout.case","command":{"find":"case","filter":{"owner":"cust002","displa
y_name":"F0055936"},"limit":1,"singleBatch":true,"lsid":{"id":{"$uuid":"a520c561-c5d9-44fb-a154-67f737230033"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1696046000,"i":4000}},"signature":{"hash":{"$binary":{"base64":"AAA
AAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"scout","$readPreference":{"mode":"primary"}},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":22660,"cursorExhausted":true,"numYields":30,"nreturned":1,"query
Hash":"BB85317C","planCacheKey":"BB85317C","reslen":8590,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":31}},"Global":{"acquireCount":{"r":31}},"Database":{"acquireCount":{"r":31}},"Collection":{"acquireCount":{"r":3
1}},"Mutex":{"acquireCount":{"r":1}}},"storage":{"data":{"bytesRead":342750175,"timeReadingMicros":382497}},"protocol":"op_msg","durationMillis":409}}

{"t":{"$date":"2023-09-30T05:53:21.086+02:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn189207","msg":"Slow query","attr":{"type":"command","ns":"scout.event","command":{"find":"event","filter":{"$and":[{"case":"justca
t"},{"category":"variant"},{"verb":"comment"}]},"lsid":{"id":{"$uuid":"a520c561-c5d9-44fb-a154-67f737230033"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1696046000,"i":10000}},"signature":{"hash":{"$binary":{"base64":"AAA
AAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"scout","$readPreference":{"mode":"primary"}},"planSummary":"IXSCAN { category: 1, verb: 1 }","keysExamined":71585,"docsExamined":71585,"cursorExhausted":true,"numYiel
ds":71,"nreturned":0,"queryHash":"C210EBBF","planCacheKey":"54461D1B","reslen":225,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":72}},"Global":{"acquireCount":{"r":72}},"Database":{"acquireCount":{"r":72}},"Collecti
on":{"acquireCount":{"r":72}},"Mutex":{"acquireCount":{"r":1}}},"storage":{"data":{"bytesRead":305369802,"timeReadingMicros":231324}},"protocol":"op_msg","durationMillis":368}}
dnil commented 1 year ago

Also the general case report is fairly slow again. Hmm..

dnil commented 1 year ago

Given how unpredictable this seems, I think we may mostly be dealing with fallout from a large index size compared to system RAM again. We have stepped over the all-indexes-in-memory bound since a little while, and will be at the mercy of certain optimisation whims.

Screenshot 2023-10-02 at 16 44 07 Screenshot 2023-10-02 at 16 43 21

That is not counting the extra dbs co-hosted. Most are smallish, but e.g. loqusdb-rd is sizeable.

Screenshot 2023-10-02 at 16 49 48

We also have that mongod-backup-service for the "smallish" dbs, but that does include loqusdb-rd. Whenever that runs we will have a reshuffle of a sizeable part of active memory.

northwestwitch commented 1 year ago

Also the general case report is fairly slow again. Hmm..

At least the report speed can be improved relatively easily perhaps?

dnil commented 1 year ago

Potential local fix: https://github.com/Clinical-Genomics/IT-issues/issues/733

northwestwitch commented 1 year ago

Potential local fix: Clinical-Genomics/IT-issues#733

That's cheating!

dnil commented 11 months ago

Ok, we are not in the clear yet - both the case and variantS views make slow queries, but a qualitative investigation says this and the past few optimisation PRs helped a bit. I'm closing this for now based on completion of https://github.com/Clinical-Genomics/IT-issues/issues/733, and we can keep working on query optimisations e.g. https://github.com/Clinical-Genomics/scout/issues/4153 and possibly more batch processing for e.g. collapsing identical variant documents.