element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0

Gracefully handle when rooms in space pagination token expires - Unable to paginate rooms in the space `/hierarchy` -> 400 `Unknown pagination token` #22138

Open MadLittleMods opened 2 years ago

MadLittleMods commented 2 years ago

Steps to reproduce

  1. Go to the space home
  2. Scroll and paginate the rooms in the space until you get stuck
    • Maybe this matters, I was also searching/filtering for a specific room before that
  3. Notice the "Failed to load list of rooms." message in Element and the underlying network request error:

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=iDciuFbCVinPIwUGYtUzQLHO&limit=20 -> ❌ 400 bad request

{"errcode":"M_INVALID_PARAM","error":"Unknown pagination token"}

I first created an issue in the Synapse tracker, thinking the bad pagination token was being returned by the homeserver, but it turns out there was simply a gap of more than 5 minutes between pagination requests, and Synapse only keeps track of pagination tokens within a 5-minute window.

What I think happened is this:

  1. Go to the space home
  2. Search/filter for a specific room
  3. Wait for the spinner appearing over and over for each /hierarchy request
    • It feels like, at a certain point, the app gave up on paginating the rooms for some reason even though the loading spinner was still present. So I was waiting around while it was doing nothing, and the pagination token expired behind the scenes. Is there something in the logs indicating why it would stop paginating?
  4. Get tired of waiting and clear the search/filter which reveals all rooms fetched so far
  5. Scroll down to the bottom of the list and try to paginate more
  6. Run into the "Failed to load list of rooms." error and the underlying network error: /hierarchy ❌ 400 bad request

I think this is the last request Element made while I was searching/filtering, even though its response included a next_batch and the spinner was still showing.

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=rCkHRkRndYnCOIInVookuxrI&limit=20 -> ✅ 200 OK

date: Tue, 10 May 2022 19:02:38 GMT (response header)

{ "rooms": [...], "next_batch":"iDciuFbCVinPIwUGYtUzQLHO" }

Then, a little over 6 minutes later (going by the response headers), I cleared the search/filter and tried paginating manually by scrolling to the bottom of the list:

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=iDciuFbCVinPIwUGYtUzQLHO&limit=20 -> ❌ 400 Bad request

date: Tue, 10 May 2022 19:08:58 GMT (response header)

{"errcode":"M_INVALID_PARAM","error":"Unknown pagination token"}
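To illustrate the expiry behavior described above, here is a minimal TypeScript sketch of a session store with a fixed time-to-live. It is not Synapse's implementation (its internals are not shown in this issue); the class name `PaginationSessionStore`, its methods, the stored value, and the injected `now` parameter are all hypothetical. Only the 5-minute window, the token, and the two response timestamps come from this report, and treating the first response time as the moment the token was issued is an assumption.

```typescript
// Assumption: a fixed 5-minute lifetime, as described in this report.
const TOKEN_TTL_MS = 5 * 60 * 1000;

interface StoredSession<T> {
    value: T;
    expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory store keyed by pagination token.
class PaginationSessionStore<T> {
    private readonly sessions = new Map<string, StoredSession<T>>();
    private readonly ttlMs: number;

    constructor(ttlMs: number = TOKEN_TTL_MS) {
        this.ttlMs = ttlMs;
    }

    // Remember the session behind a token, valid for ttlMs from `now`.
    put(token: string, value: T, now: number): void {
        this.sessions.set(token, { value, expiresAt: now + this.ttlMs });
    }

    // Returns undefined when the token is unknown or has expired.
    get(token: string, now: number): T | undefined {
        const entry = this.sessions.get(token);
        if (entry === undefined) return undefined;
        if (now >= entry.expiresAt) {
            this.sessions.delete(token);
            return undefined;
        }
        return entry.value;
    }
}

// Timestamps taken from the two response headers in this report.
const issuedAt = Date.UTC(2022, 4, 10, 19, 2, 38);
const retriedAt = Date.UTC(2022, 4, 10, 19, 8, 58);

const store = new PaginationSessionStore<string>();
store.put("iDciuFbCVinPIwUGYtUzQLHO", "position after the last returned room", issuedAt);

// A lookup 4 minutes after issue still succeeds; the retry from the report does not.
const withinWindow = store.get("iDciuFbCVinPIwUGYtUzQLHO", issuedAt + 4 * 60 * 1000);
const afterWindow = store.get("iDciuFbCVinPIwUGYtUzQLHO", retriedAt);
```

Under these assumptions the second lookup misses because the two requests are 6 minutes 20 seconds apart, which matches the "Unknown pagination token" response above.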

Outcome

What did you expect?

  1. Element keeps paginating while the loading spinner is visible when searching/filtering
  2. Element gracefully fetches a new pagination token after the 5-minute expiration (we have to paginate from the beginning again)
  3. All of these problems are exacerbated because searching a space is painfully slow and there is no server endpoint to do it directly.
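As a rough sketch of expectation 2, a client could restart from the first page when the server rejects a pagination token. The names below (`FetchPage`, `isExpiredPaginationToken`, `loadAllRooms`, `maxRestarts`) are hypothetical and are not Element or matrix-js-sdk APIs. Matching on the `error` string is an assumption based only on the response body shown in this issue, so it may be brittle against other homeservers.

```typescript
// Shape of a /hierarchy response page, reduced to the fields used here.
interface HierarchyPage {
    rooms: { room_id: string }[];
    next_batch?: string;
}

// Error body as shown in this issue.
interface MatrixErrorBody {
    errcode: string;
    error: string;
}

// Hypothetical transport: resolves with a page, or rejects with a MatrixErrorBody.
type FetchPage = (from?: string) => Promise<HierarchyPage>;

// Assumption: the rejected-token case is recognised by the exact body seen in this issue.
function isExpiredPaginationToken(err: unknown): boolean {
    const body = err as Partial<MatrixErrorBody> | null;
    return body?.errcode === "M_INVALID_PARAM" && body?.error === "Unknown pagination token";
}

// Walks every page; if a token is rejected, starts again from the first page,
// at most `maxRestarts` times. Rooms are de-duplicated by room ID.
async function loadAllRooms(fetchPage: FetchPage, maxRestarts = 1): Promise<string[]> {
    const seen = new Set<string>();
    let restarts = 0;
    let from: string | undefined = undefined;
    while (true) {
        let page: HierarchyPage;
        try {
            page = await fetchPage(from);
        } catch (err) {
            if (from !== undefined && isExpiredPaginationToken(err) && restarts < maxRestarts) {
                restarts++;
                from = undefined; // paginate from the beginning again
                continue;
            }
            throw err; // anything else is surfaced to the caller
        }
        for (const room of page.rooms) seen.add(room.room_id);
        if (page.next_batch === undefined) return Array.from(seen);
        from = page.next_batch;
    }
}
```

Capping the number of restarts avoids looping forever if the server keeps rejecting tokens; rooms already fetched are kept, so the list on screen would not need to be thrown away while the restart runs.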

What happened instead?

Logs: https://github.com/matrix-org/element-web-rageshakes/issues/12829

Operating system

Windows 10

Browser information

Chrome Version 100.0.4896.127

URL for webapp

https://develop.element.io/

Application version

Element version: 479d4bf64d97-react-14127c777b87-js-34cfa511049e Olm version: 3.2.8

Homeserver

matrix.org

Will you send logs?

Yes -> https://github.com/matrix-org/element-web-rageshakes/issues/12829

t3chguy commented 2 years ago

Sorry, where does the https://spec.matrix.org/v1.2/client-server-api/#get_matrixclientv1roomsroomidhierarchy API say that the token can expire?

t3chguy commented 2 years ago

Element is performing as per the spec; I suggest trying to reproduce on Android & iOS, and I'm sure you'll find the same failure mode. Either Synapse or the spec is wrong, and the onus is on Synapse to prove which is which.

MadLittleMods commented 2 years ago

Regardless of whether the pagination token should expire, when I filter the space in Element, it should keep paginating the rooms in the space until it reaches the end. There were no results to fill up the scroll area and stop it, and the loading spinner was still showing when I stopped filtering. I can confirm this because I took a screenshot of the unfiltered list with the loading spinner at the bottom, which clearly indicates it wasn't done paginating and should have kept going during my filter (to further confirm it's not just a stuck spinner: the list isn't complete).

If preferred, I can create a separate issue for this, although it's closely related to how I ran into this situation in the first place.


I've just searched again and this time reached a proper end state, which wasn't shown before:

No results found
You may want to try a different search or check for typos.
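The behavior argued for in this comment can be sketched as two small predicates: keep fetching while the filtered list cannot fill the scroll area and the server still has pages, and show "No results found" only once pagination has finished. The state shape and function names are hypothetical and are not taken from Element's code; the sketch assumes the first page has already been loaded, so a missing `nextBatch` means the end was reached.

```typescript
// Hypothetical view state for the space room list (after the first page has loaded).
interface SpaceListState {
    loadedRoomNames: string[];
    nextBatch?: string; // undefined once the server reports no further pages
    loading: boolean;
    failed: boolean;
}

// Case-insensitive substring match, standing in for whatever filter the UI uses.
function countMatches(names: string[], query: string): number {
    const needle = query.trim().toLowerCase();
    return names.filter((name) => name.toLowerCase().indexOf(needle) !== -1).length;
}

// Keep requesting pages while the filtered list cannot fill the scroll area
// and the server still has more to give.
function shouldFetchNextPage(state: SpaceListState, query: string, rowsToFill: number): boolean {
    if (state.loading || state.failed) return false;
    if (state.nextBatch === undefined) return false;
    return countMatches(state.loadedRoomNames, query) < rowsToFill;
}

// "No results found" is only a proper end state once pagination has finished.
function shouldShowNoResults(state: SpaceListState, query: string): boolean {
    const finished = !state.loading && !state.failed && state.nextBatch === undefined;
    return finished && countMatches(state.loadedRoomNames, query) === 0;
}
```

With this split, a spinner that stays visible always corresponds to a state in which `shouldFetchNextPage` is true or a request is in flight, so a visible spinner with no request behind it would indicate a bug.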

turt2live commented 2 years ago

per https://github.com/matrix-org/synapse/issues/12697#issuecomment-1122947619 I am reopening this as a client-side problem.

turt2live commented 2 years ago

spec issue: https://github.com/matrix-org/matrix-spec/issues/1058

MadLittleMods commented 2 years ago

In terms of design decisions:

largely what the UI should do in this case. It's not immediately clear whether we should throw everything away and refresh or if we should try and do something more intelligent

or maybe even a button which says "try again" or whatever

-- @turt2live, https://matrix.to/#/!bEWtlqtDwCLFIAKAcv:matrix.org/$I6cbmSawd27Nq6w9xw-68vj8MoJO3q6yE9x5MrVmLps?via=matrix.org&via=element.io&via=mozilla.org

robintown commented 2 years ago

Downgrading this to occasional and minor, since I don't believe this happens super regularly, and the workaround is to reopen the space landing page and try again

nadonomy commented 2 years ago

In terms of design decisions:

largely what the UI should do in this case. It's not immediately clear whether we should throw everything away and refresh or if we should try and do something more intelligent or maybe even a button which says "try again" or whatever -- @turt2live, https://matrix.to/#/!bEWtlqtDwCLFIAKAcv:matrix.org/$I6cbmSawd27Nq6w9xw-68vj8MoJO3q6yE9x5MrVmLps?via=matrix.org&via=element.io&via=mozilla.org

Just noting on this - it's really tough for us to make the right design decisions without pairing closely on the considerations or tradeoffs of any path with engineering. So, whoever ends up looking at this, please feel free to reach out for us to figure out the right path.

t3chguy commented 1 year ago

Spec still hasn't been clarified.

anoadragon453 commented 1 month ago

Would it help if the homeserver returned M_UNKNOWN_TOKEN in the case of a token expiring? That way the client could be more confident in starting pagination over again, versus some other error with the request.

I don't think it would necessarily help for the spec to say that tokens can expire. If a homeserver implemented the session store in-memory, and the homeserver was restarted, you'd still get the same failure mode.

t3chguy commented 1 month ago

I don't think it would necessarily help for the spec to say that tokens can expire.

The spec currently says M_UNKNOWN_TOKEN is only for accessToken/refreshToken, so unless the spec were updated, the client would, if anything, be even more confused: it would think its accessToken had expired rather than the pagination token.
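The ambiguity discussed in the last two comments can be made concrete with a small classifier. This is only a sketch: `classifyHierarchyError`, `HierarchyErrorKind`, and the `sentFromToken` flag are hypothetical names, and the mapping of M_UNKNOWN_TOKEN to an access-token problem simply follows the reading of the spec given in the comment above.

```typescript
// Error body as returned by the homeserver in this issue.
interface MatrixErrorBody {
    errcode: string;
    error?: string;
}

type HierarchyErrorKind =
    | "access-token-invalid"
    | "pagination-token-possibly-expired"
    | "other";

// sentFromToken: whether the failed request carried a `from` pagination token.
function classifyHierarchyError(body: MatrixErrorBody, sentFromToken: boolean): HierarchyErrorKind {
    // Per the comment above, clients read this code as an access/refresh token problem.
    if (body.errcode === "M_UNKNOWN_TOKEN") return "access-token-invalid";
    // M_INVALID_PARAM does not say which parameter was invalid, so this is only a guess.
    if (body.errcode === "M_INVALID_PARAM" && sentFromToken) return "pagination-token-possibly-expired";
    return "other";
}
```

Under the current error codes, the best a client can conclude is "possibly expired", which is why either a dedicated error code or a spec clarification would let it restart pagination with more confidence.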