Closed anoadragon453 closed 10 months ago
Part of the problem here is https://github.com/vector-im/element-web/issues/26347, though that doesn't explain why it locks up the UI while it's happening.
@anoadragon453 has sent me a performance profile demonstrating the problem. It shows calls to <matrix_sdk_indexeddb::crypto_store::IndexeddbCryptoStore as matrix_sdk_crypto::store::traits::CryptoStore>::get_inbound_group_sessions::{{closure}}::hf2c9e7d12573886c which block for 7 seconds.
IndexeddbCryptoStore::inbound_group_sessions_for_backup (which we call repeatedly until all keys are backed up, and then again whenever we think there might be more keys to back up) calls get_inbound_group_sessions, which decrypts and returns ALL megolm keys recorded in the indexeddb, and then picks the first 100 of those that are not already backed up.
This is clearly an insane way to do things, since the key store can grow very large indeed. We need to update the indexeddb object store so that it keeps a queryable record of which sessions need backing up.
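To make the cost difference concrete, here is a minimal in-memory sketch. It is not the matrix-sdk-indexeddb code: `SketchStore`, `StoredSession`, `batch_by_full_scan`, `batch_by_index` and `sample_store` are all hypothetical names, and "decryption" is only simulated with a counter. It contrasts the current "decrypt everything, then take 100" behaviour with a lookup against a separate record of sessions that still need backing up.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for a stored (encrypted) megolm session record.
#[derive(Clone, Debug, PartialEq)]
struct StoredSession {
    session_id: String,
    backed_up: bool,
}

/// Hypothetical in-memory model of the store; the real one sits on IndexedDB.
#[derive(Default)]
struct SketchStore {
    sessions: BTreeMap<String, StoredSession>,
    /// Queryable record of the sessions that still need backing up.
    needs_backup: BTreeSet<String>,
    /// Counts simulated decryptions so the cost of each approach is visible.
    decrypt_calls: usize,
}

impl SketchStore {
    fn insert(&mut self, session_id: &str, backed_up: bool) {
        let record = StoredSession { session_id: session_id.to_owned(), backed_up };
        self.sessions.insert(session_id.to_owned(), record);
        if !backed_up {
            self.needs_backup.insert(session_id.to_owned());
        }
    }

    /// Simulates decrypting a single stored record.
    fn decrypt(&mut self, session_id: &str) -> StoredSession {
        self.decrypt_calls += 1;
        self.sessions[session_id].clone()
    }

    /// Models the current behaviour: decrypt every record, then filter.
    fn batch_by_full_scan(&mut self, limit: usize) -> Vec<StoredSession> {
        let ids: Vec<String> = self.sessions.keys().cloned().collect();
        let all: Vec<StoredSession> = ids.iter().map(|id| self.decrypt(id)).collect();
        all.into_iter().filter(|s| !s.backed_up).take(limit).collect()
    }

    /// Models the proposal: consult the index, decrypt only what is returned.
    fn batch_by_index(&mut self, limit: usize) -> Vec<StoredSession> {
        let ids: Vec<String> = self.needs_backup.iter().take(limit).cloned().collect();
        ids.iter().map(|id| self.decrypt(id)).collect()
    }
}

/// Builds a store with `total` sessions, the first `already_backed_up` of
/// which are flagged as backed up.
fn sample_store(total: usize, already_backed_up: usize) -> SketchStore {
    let mut store = SketchStore::default();
    for i in 0..total {
        store.insert(&format!("session{:04}", i), i < already_backed_up);
    }
    store
}
```

With 1000 stored sessions, the full-scan variant performs 1000 simulated decryptions per batch of 100, while the indexed variant performs 100. Repeating the full scan for every batch is what makes the total work grow roughly quadratically with the size of the key store.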
The hardest part of this is likely to be migrating existing data; given EWR is relatively experimental, perhaps we can get away with just assuming that all existing keys are backed up.
just as a datapoint: disabling seshat has stopped my logs tight-looping with keyshare reqs. but the freezing (presumably due to this bug) is as bad as it was before (and makes the app pretty much unusable).
This is still happening in today's nightly (which i believe has the fix):
Element Nightly version: 2023120201 Crypto version: Rust SDK 0.6.0 (8931d87), Vodozemac 0.5.0
I can't save a perf trace on it due to https://github.com/electron/electron/issues/39818 but a screenshot of a profile looks like:
@ara4n unfortunately that profile doesn't tell us a great deal.
@BillCarsonFr also reported continued freezing after the fix landed, but I don't think we have a profile from him either :(. (He did report that it went away having merged https://github.com/matrix-org/matrix-js-sdk/pull/3934 into his dev copy, but that feels like fixing the symptoms rather than the cause.)
I too am continuing to experience freezes on:
Initially, the application runs fine, even after feeding it my key backup passphrase. Then, I go into Security & Privacy settings and try to see the status of key backup. I imagine this triggers key backup to start, at which point the application shows familiar signs of freezing for 5-10s every 5 seconds.
Element employees can view a performance trace from Chromium here: https://matrix.to/#/!UcgyhoigetVICUfvRw:matrix.org/$ykAFN0Z3NRkiyTGVWBLelZpiS72mQ_gXoL19IISZtWA?via=element.io&via=matrix.org&via=jki.re
will add debug symbols to profile https://github.com/vector-im/element-web/issues/26693
I got a trace from EWR (not EDR) here, fwiw: https://github.com/matrix-org/element-web-rageshakes/issues/23324
Ok, it turns out that there is another call to CryptoStore::get_inbound_group_sessions (which, per the above, is a disaster area) in the "mark request as sent" path here, which explains why we are still seeing problems here.
I think we need a new method in CryptoStore which takes the list of room/sender/session triplets from the backup request, and marks them all as sent. Then we need to burn CryptoStore::get_inbound_group_sessions with fire.
Edit: we can't do this (yet) because it is also used by the "export sessions" flow, though per https://github.com/vector-im/element-web/issues/26681, that also needs fixing.
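A rough sketch of what such a method could look like, again against an in-memory model rather than the real store. The names `SessionKey`, `SketchStore`, `mark_sessions_as_backed_up` and `key` are hypothetical and are not the actual CryptoStore API; the point is only that the method touches the records named in the backup request and nothing else.

```rust
use std::collections::HashMap;

/// Hypothetical (room, sender, session) triplet taken from a backup request.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct SessionKey {
    room_id: String,
    sender_key: String,
    session_id: String,
}

/// Hypothetical in-memory model: the map value is the "backed up" flag.
#[derive(Default)]
struct SketchStore {
    backed_up: HashMap<SessionKey, bool>,
    /// Number of lookups performed, to show that unrelated sessions are never read.
    records_touched: usize,
}

impl SketchStore {
    /// Marks only the listed sessions as backed up and returns how many of
    /// them were actually present in the store.
    fn mark_sessions_as_backed_up(&mut self, keys: &[SessionKey]) -> usize {
        let mut marked = 0;
        for key in keys {
            self.records_touched += 1;
            if let Some(flag) = self.backed_up.get_mut(key) {
                *flag = true;
                marked += 1;
            }
        }
        marked
    }

    /// Number of sessions still waiting to be backed up.
    fn pending_count(&self) -> usize {
        self.backed_up.values().filter(|flag| !**flag).count()
    }
}

/// Convenience constructor for the hypothetical triplet.
fn key(room_id: &str, sender_key: &str, session_id: &str) -> SessionKey {
    SessionKey {
        room_id: room_id.to_owned(),
        sender_key: sender_key.to_owned(),
        session_id: session_id.to_owned(),
    }
}
```

In this model, marking a request of 100 sessions as sent costs 100 lookups regardless of how many sessions the store holds, instead of reading every inbound group session first.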
I think this is also responsible for logs along the lines of:
Backup: Error processing backup request for rust crypto-sdk Error: failed to read or write to the crypto store DomException UnknownError (0): The operation failed for reasons unrelated to the database itself and not covered by any other error code."
(and also things like violation: success handler took 10000ms or words to that effect.)
-- in short, it's taking tens of seconds to read all the inbound group sessions from the store, and indexeddb is complaining about it.
The app is still freezing for minutes on end when backing up megolm keys.
actually, it's not freezing, it's "just" taking 1-5 mins to send msgs
ie, it's https://github.com/element-hq/element-web/issues/26783
Steps to reproduce
@andrewm:element.io account.
Outcome
What did you expect?
No freezing.
What happened instead?
The app freezes for ~10s at a time, every 2-5s. It appears the UI thread is getting blocked behind something in the Rust crypto layer. Taking a performance recording in Chrome, I see that wasm-function[xxxx] is the culprit, but no more data than that.
The app freezes, then once it unfreezes, the following log is printed:
this continues over and over, each time with a different set of 100 keys.
Operating system
NixOS Linux
Browser information
Chromium v118.0.5993.117
URL for webapp
develop.element.io
Application version
Both on develop.element.io and when building from latest source today. I also built and linked matrix-rust-sdk-crypto-wasm
Homeserver
element.io
Will you send logs?
No