element-hq / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://element-hq.github.io/synapse
GNU Affero General Public License v3.0

Synapse doesn't backfill missing events when rejoining a room later #15717

Open matrixbot opened 11 months ago

matrixbot commented 11 months ago

This issue has been migrated from #15717.


Description

If I'm the only user from my homeserver in a room (which also has users from other servers), stay there for a while and receive a lot of events, then leave the room and rejoin it a few days later, I'm missing the events sent between the point where I left the room and the point where I rejoined it.

Synapse does not backfill those events.

Steps to reproduce

You will see something like this:

I reported this years ago, but I shut down that server before the Matrix team started to investigate, and when they looked at the bug they closed it because I was no longer active. So this is an old bug, not something newly introduced.

Homeserver

sinnesro.se

Synapse Version

{"server_version":"1.84.1","python_version":"3.9.2"}

Installation Method

Debian packages from packages.matrix.org

Database

Single PostgreSQL, never used SQLite, never restored from backup

Workers

Single process

Platform

Debian 11, fully updated, running in an LXD container

Intel Core i5 12th gen, 16 GB RAM, 256 GB SSD

Configuration

No response

Relevant log output

I have included my home server logs earlier in the bug report because I don't know what is relevant to show here.

Anything else that would be useful to know?

No response

kegsay commented 8 months ago

This is 100% reproducible so relabelled as O-Frequent. This results in lost messages so relabelled as S-Major.

I hit this in a complement crypto test TestBobCanSeeButNotDecryptHistoryInPublicRoom when running it between 2 JS clients over federation. I realised that the sync v2 timeline response simply did not include the message sent when Bob had left. I wrote a Complement test to verify this bug: https://github.com/matrix-org/complement/pull/716
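The check described above can be sketched roughly as follows. This is only an illustration, assuming the standard sync v2 response shape (rooms.join.<room_id>.timeline.events); the helper name, room ID and event IDs are hypothetical and are not taken from the Complement test in the linked PR.

```python
from typing import Any, Dict, List


def timeline_event_ids(sync_response: Dict[str, Any], room_id: str) -> List[str]:
    """Return the event IDs in the joined-room timeline of a /sync v2 response."""
    room = sync_response.get("rooms", {}).get("join", {}).get(room_id, {})
    return [ev["event_id"] for ev in room.get("timeline", {}).get("events", [])]


# Hypothetical /sync response for Bob after he rejoins: the timeline holds the
# message sent before he left and his new join event, but not the message that
# was sent while he was out of the room.
sample_sync = {
    "rooms": {
        "join": {
            "!room:hs1.example": {
                "timeline": {
                    "events": [
                        {"event_id": "$msg_before_leave", "type": "m.room.message"},
                        {"event_id": "$bob_rejoin", "type": "m.room.member"},
                    ],
                    "limited": False,
                    "prev_batch": "t1-2_3",
                }
            }
        }
    }
}

ids = timeline_event_ids(sample_sync, "!room:hs1.example")
message_is_missing = "$msg_while_left" not in ids
```

With data shaped like this, a client that trusts the timeline as gapless has no way of knowing that an event is absent.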

It's worth noting EX does not have this bug due to how it manages timelines (in this scenario sliding sync just returns the join event and a prev_batch token, which makes clients hit /messages and everything is fine). This does mean that we do not honour the timeline_limit in this case, but it means that we return a correct timeline.
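As a rough sketch of the recovery path mentioned here, a client that receives only the join event plus a prev_batch token can page backwards with the client-server /messages endpoint. The function name, base URL and token value below are assumptions made for illustration; only the endpoint path and the from, dir and limit query parameters come from the Matrix client-server API.

```python
from urllib.parse import quote, urlencode


def messages_url(base_url: str, room_id: str, prev_batch: str, limit: int = 20) -> str:
    """Build a backwards-pagination request URL starting from a prev_batch token."""
    # Room IDs contain '!' and ':' and must be percent-encoded in the path.
    path = f"/_matrix/client/v3/rooms/{quote(room_id, safe='')}/messages"
    # dir=b asks for events going backwards in time from the given token.
    query = urlencode({"from": prev_batch, "dir": "b", "limit": limit})
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical homeserver, room and token values.
url = messages_url("https://hs1.example", "!room:hs1.example", "t1-2_3")
```

The client would then issue an authenticated GET to this URL and repeat with the end token from each response until it reaches events it already has.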

The effect of this bug is quite bad because: