element-hq / element-x-ios

Next generation Matrix client for iOS built with SwiftUI on top of matrix-rust-sdk.
https://element.io/labs/element-x
GNU Affero General Public License v3.0

SSS got completely stuck #3115

Open: ara4n opened this issue 1 month ago

ara4n commented 1 month ago

Steps to reproduce

  1. opened app on slightly dodgy connectivity (1 bar of wifi)
  2. roomlist showed stale rooms from hours ago
  3. timelines within rooms showed stale history too
  4. waited a while, even after moving onto good connectivity, to see if a spinner would turn up or history would resync
  5. no spinner; no sync

Outcome

What did you expect?

There should be a spinner if you are staring at history and wondering whether it's stale or not.

Sync should not get stuck because of bad connectivity; it should retry when connectivity recovers.
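The expected behaviour above (retry instead of getting stuck, plus a visible signal when the data on screen may be stale) can be sketched as a retry loop with capped exponential backoff and a staleness check a UI could observe. This is a minimal, language-neutral sketch in Python, not Element X or matrix-rust-sdk code; `SyncStatus`, `sync_with_retry`, `backoff_delays` and the 30-second staleness threshold are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing retry delays (1, 2, 4, ...), capped at `cap` seconds."""
    delay = base
    while True:
        yield delay
        delay = min(delay * 2, cap)


@dataclass
class SyncStatus:
    """Hypothetical client-side state that a UI could observe to show a spinner."""
    last_success: Optional[float] = None  # clock time of the last successful sync
    stale_after: float = 30.0             # hypothetical threshold, in seconds

    def is_stale(self, now: float) -> bool:
        # Never synced, or the last success is too old: tell the user.
        return self.last_success is None or now - self.last_success > self.stale_after


def sync_with_retry(do_sync: Callable[[], None], status: SyncStatus,
                    max_attempts: int,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `do_sync` up to `max_attempts` times, backing off after each failure."""
    delays = backoff_delays()
    for _ in range(max_attempts):
        try:
            do_sync()
        except ConnectionError:
            sleep(next(delays))  # wait, then retry instead of giving up
            continue
        status.last_success = clock()  # data is fresh again; the UI can hide its spinner
        return True
    return False
```

A real client would probably keep retrying indefinitely and also retry straight away when the OS reports a network change, rather than relying only on the backoff timer; the injectable `sleep` and `clock` parameters are there so the loop can be exercised without real waiting.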

What happened instead?

Stuck sync, with zero UI feedback to tell you you're offline or looking at stale info.

Your phone model

No response

Operating system version

No response

Application version

669

Homeserver

No response

Will you send logs?

Yes

erikjohnston commented 1 month ago

Ah, this happened because I restarted the server, which blew away the in-memory cache of which rooms we'd sent down. This caused it to basically try to send down all your rooms again.

We need to migrate the per-connection state to the DB, but for now: https://github.com/element-hq/synapse/pull/17529
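The failure described here can be illustrated with a toy model of per-connection state that lives only in process memory. The class and method names below are hypothetical and this is not Synapse's implementation; it only shows why a restart makes every room in the requested range look new to the server.

```python
from typing import Dict, Iterable, List, Set


class InMemoryConnectionState:
    """Hypothetical per-connection record of which rooms were already sent.

    It lives only in process memory, so a server restart empties it.
    """

    def __init__(self) -> None:
        self._sent: Dict[str, Set[str]] = {}  # connection id -> room ids already sent

    def rooms_needing_full_state(self, conn_id: str,
                                 rooms_in_range: Iterable[str]) -> List[str]:
        sent = self._sent.setdefault(conn_id, set())
        # Any room not recorded as sent has to go down from scratch, with full state.
        new_rooms = [room for room in rooms_in_range if room not in sent]
        sent.update(new_rooms)
        return new_rooms
```

In the real protocol, rooms the server knows it has already sent would still get incremental updates; the sketch returns only the rooms that must be sent from scratch. Persisting this mapping to the database, as the comment suggests, is what would let it survive a restart.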

erikjohnston commented 1 month ago

PR has landed and been deployed

MadLittleMods commented 1 month ago

I feel like this should be re-opened to address better UI feedback:

"zero UI feedback to tell you you're offline or looking at stale info."


I also want to point out that, if @erikjohnston's investigation is correct, /sync wasn't completely stuck, just slow. The client asks for a full range of rooms, and without the cache to tell whether a room has already been sent down the connection, we end up sending down all rooms and their state from scratch (which can be very slow). With https://github.com/element-hq/synapse/pull/17529, we expire the connection and give the client a chance to make an initial request with a smaller range of rooms, so it gets some results sooner. In the end, though, getting everything again will take the same amount of time (more, with the extra round-trips and re-processing).
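The trade-off of expiring the connection can be modelled in the same toy style. In this hypothetical sketch (not the code from the linked PR), the server raises `ConnectionExpired` when a client tries to continue a connection it has no state for, and the client starts over with a small window of rooms that grows on each request. The names, the error type and the inclusive `[start, end]` range are assumptions for illustration.

```python
from typing import Dict, List, Set


class ConnectionExpired(Exception):
    """Hypothetical error: the server has no state for this connection."""


class ToyServer:
    def __init__(self, rooms: List[str]) -> None:
        self.rooms = rooms
        self.connections: Dict[str, Set[str]] = {}  # in-memory only, lost on restart

    def sync(self, conn_id: str, is_initial: bool, start: int, end: int) -> List[str]:
        if conn_id not in self.connections:
            if not is_initial:
                # Rather than resending every room, make the client start over.
                raise ConnectionExpired(conn_id)
            self.connections[conn_id] = set()
        sent = self.connections[conn_id]
        window = self.rooms[start:end + 1]  # inclusive [start, end] range
        fresh = [room for room in window if room not in sent]
        sent.update(fresh)
        return fresh

    def restart(self) -> None:
        self.connections.clear()


def client_resync(server: ToyServer, conn_id: str, batch: int) -> List[List[str]]:
    """Start a new connection with a small window, then grow it; return each response."""
    responses = []
    end = batch - 1
    is_initial = True
    while True:
        responses.append(server.sync(conn_id, is_initial, 0, end))
        is_initial = False
        if end >= len(server.rooms) - 1:
            return responses
        end += batch
```

With 100 rooms and a batch of 20, the first response arrives after one small request, but the client still receives all 100 rooms, spread over five round-trips instead of one: results sooner, but no less total work.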

ara4n commented 3 weeks ago

So it may be a different cause, but I just got this again.

manuroe commented 1 week ago

@ara4n can you send a rageshake when it happens again?