element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0
11.21k stars 2k forks source link

Element R: App freezes while backing up megolm keys #26488

Closed anoadragon453 closed 10 months ago

anoadragon453 commented 1 year ago

Steps to reproduce

  1. Log in to https://develop.element.io using my @andrewm:element.io account.
  2. Put in my key backup passphrase during login, which connects my session to key backup.
  3. Attempt to use the app

Outcome

What did you expect?

No freezing.

What happened instead?

The app freezes for ~10s at a time, every 2-5s. It appears the UI thread if getting blocked behind something in the Rust crypto layer. Taking a performance recording in Chrome, I see that wasm-function[xxxx] is the culprit, but no more data than that.

The app freezes, then once it unfreezes, the following log is printed:

 INFO matrix_sdk_crypto::backups: Successfully created a room keys backup request, key_count: 100, keys: {"!JMaTDIsletcCnEaaDp:matrix.org": {"yyyyyyyyyyyyyyyyy": {"zzzzzzzzzzzzzzzz", ... }}}, backup_key: MegolmV1BackupKey { key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", version: Some("1") }
    at /home/work/.cargo/git/checkouts/matrix-rust-sdk-1f4927f82a3d27bb/7440ce0/crates/matrix-sdk-crypto/src/backups/mod.rs:506

this continues over and over, each time with a different set of 100 keys.

Operating system

NixOS Linux

Browser information

Chromium v118.0.5993.117

URL for webapp

develop.element.io

Application version

Both on develop.element.io and when building from latest source today, I also built and linked matrix-rust-sdk-crypto-wasm

Homeserver

element.io

Will you send logs?

No

richvdh commented 1 year ago

Part of the problem here is https://github.com/vector-im/element-web/issues/26347, though that doesn't explain why it locks up the UI while it's happening.

richvdh commented 1 year ago

@anoadragon453 has sent me a performance profile demonstrating the problem. It shows calls to <matrix_sdk_indexeddb::crypto_store::IndexeddbCryptoStore as matrix_sdk_crypto::store::traits::CryptoStore>::get_inbound_group_sessions::{{closure}}::hf2c9e7d12573886c which block for 7 seconds.

richvdh commented 1 year ago

IndexeddbCryptoStore::inbound_group_sessions_for_backup (which we call repeatedly, until all keys are backed up. And then again whenever we think there might be more keys to back up) calls get_inbound_group_sessions, which decrypts and returns ALL megolm keys recorded in the indexeddb, and then filters out the first 100 of those that are not already backed up.

This is clearly an insane way to do things, since the key store can grow very large indeed. We need to update the indexeddb object store so that it keeps a queryable record of which sessions need backing up.

The hardest part of this is likely to be migrating existing data; given EWR is relatively experimental, perhaps we can get away with just assuming that all existing keys are backed up.

ara4n commented 1 year ago

just as a datapoint: disabling seshat has stopped my logs tight-looping with keyshare reqs. but the freezing (presumably due to this bug) is as bad as it was before (and makes the app pretty much unusable).

ara4n commented 11 months ago

This is still happening in today's nightly (which i believe has the fix):

Element Nightly version: 2023120201 Crypto version: Rust SDK 0.6.0 (8931d87), Vodozemac 0.5.0

I can't save a perf trace on it due to https://github.com/electron/electron/issues/39818 but a screenshot of a profile looks like:

Screenshot 2023-12-02 at 14 59 53
richvdh commented 11 months ago

@ara4n unfortunately that profile doesn't tell us a great deal.

@BillCarsonFr also reported continued freezing after the fix landed, but I don't think we have a profile from him either :(. (He did report that it went away having merged https://github.com/matrix-org/matrix-js-sdk/pull/3934 into his dev copy, but that feels like fixing the symptoms rather than the cause.)

anoadragon453 commented 11 months ago

I too am continuing to experience freezes on:

Initially, the application runs fine, even after feeding it my key backup passphrase. Then, I go into Security & Privacy settings and try to see the status of key backup. I imagine this trigger key backup to start, at which point the application shows familiar signs of freezing for 5-10s every 5 seconds.

Element employees can view a performance trace from Chromium here: https://matrix.to/#/!UcgyhoigetVICUfvRw:matrix.org/$ykAFN0Z3NRkiyTGVWBLelZpiS72mQ_gXoL19IISZtWA?via=element.io&via=matrix.org&via=jki.re

BillCarsonFr commented 11 months ago

will add debug symbols to profile https://github.com/vector-im/element-web/issues/26693

ara4n commented 11 months ago

I got a trace from EWR (not EDR) here, fwiw: https://github.com/matrix-org/element-web-rageshakes/issues/23324

richvdh commented 11 months ago

Ok, it turns out that there is another call to CryptoStore::get_inbound_group_sessions (which, per the above, is a disaster area) in the "mark request as sent" path here which explains why we are still seeing problems here.

I think we need a new method in CryptoStore which takes the list of room/sender/session triplets from the backup request, and marks them all as sent. Then we need to burn CryptoStore::get_inbound_group_sessions with fire. Edit: we can't do this (yet) because it is also used by the "export sessions" flow, though per https://github.com/vector-im/element-web/issues/26681, that also needs fixing.

richvdh commented 11 months ago

I think this is also responsible for logs along the lines of:

Backup: Error processing backup request for rust crypto-sdk Error: failed to read or write to the crypto store DomException UnknownError (0): The operation failed for reasons unrelated to the database itself and not covered by any other error code."

(and also things like violation: success handler took 10000ms or words to that effect.)

-- in short, it's taking tens of seconds to read all the inbound group sessions from the store, and indexeddb is complaining about it.

ara4n commented 10 months ago

The app is still freezing for minutes on end when backing up megolm keys.

ara4n commented 10 months ago

actually, it's not freezing, it's "just" taking 1-5 mins to send msgs

richvdh commented 10 months ago

actually, it's not freezing, it's "just" taking 1-5 mins to send msgs

ie, it's https://github.com/element-hq/element-web/issues/26783