element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
GNU Affero General Public License v3.0
11.17k stars 1.99k forks source link

Disappearing custom relation events after restarting client or initial sync #27132

Open tcpipuk opened 8 months ago

tcpipuk commented 8 months ago

Steps to reproduce

When closing Element Web (I've tried app.element.io and develop.element.io) or clearing cache, the m.notice messages seem to disappear.

Here is a screenshot of an example ping test I just ran in the #ping-no-synapse:maunium.net room:

Ping results before clearing cache in client

Then how the room looks after clearing the cache in client to perform an initial sync:

Missing ping results after clearing cache in client

I've also tried using /devtools to show all hidden messages and there's simply no trace of these missing m.notice messages. I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.

Outcome

Expected outcome is that rooms would appear roughly the same after restarting client or clearing cache as they did beforehand.

Operating system

Microsoft Windows 11 Pro 23H2 (22631.3235)

Browser information

Microsoft Edge 122.0.2365.66 (Official build) (64-bit)

URL for webapp

develop.element.io

Application version

Element version: f01d69f90b98-react-942fabc5a8fe-js-7fee37680f33 Crypto version: Rust SDK 0.7.0 (b1918e9), Vodozemac 0.5.1

Homeserver

doctoruwu.uk

Will you send logs?

Yes

t3chguy commented 8 months ago

I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.

Have you looked at the sync responses to confirm this? Just because another client gets them doesn't mean the server sent them the same data.

Please provide a sample of event IDs to correlate in your logs for the missing events.

tcpipuk commented 8 months ago

Have you looked at the sync responses to confirm this?

No, I'm not sure what I'd be looking at/for?

Please provide a sample of event IDs to correlate in your logs for the missing events.

The "spiritsail" one in the screenshot above is $-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE

t3chguy commented 8 months ago

image

Can you share the View Source of that same event please

tcpipuk commented 8 months ago

Not this second, but I've pulled the event from the server over federation from another server I have access to right now:

$-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE
{
    "auth_events": [
        "$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o", 
        "$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s", 
        "$jS7DCCu_i_N4IJtmG9iofFZtC_i1p2QvwygWoYjUE-U"
    ], 
    "content": {
        "body": "@tom:doctoruwu.uk: Pong! (ping \"Example\" took 275 ms to arrive)", 
        "format": "org.matrix.custom.html", 
        "formatted_body": "<a href='https://matrix.to/#/@tom:doctoruwu.uk'>@tom:doctoruwu.uk</a>: Pong! (<a href='https://matrix.to/#/!ping-no-synapse:maunium.net/$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k'>ping</a> \"Example\" took 275 ms to arrive)", 
        "m.relates_to": {
            "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k", 
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "rel_type": "xyz.maubot.pong"
        }, 
        "msgtype": "m.notice", 
        "pong": {
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "ping": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
        }
    }, 
    "depth": 33763, 
    "hashes": {
        "sha256": "QiRLY9/dH/QZXAxwnUFKsKsqwjv5vifzJqJ2JacP0K4"
    }, 
    "origin": "spritsail.io", 
    "origin_server_ts": 1709722417057, 
    "prev_events": [
        "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
    ], 
    "room_id": "!ping-no-synapse:maunium.net", 
    "sender": "@echo:spritsail.io", 
    "signatures": {
        "spritsail.io": {
            "ed25519:4cL10W6x": "S3ETw2iRZYqyh4MG5Id3vAF1muF6xmGQoJ6XFDHS5ZJzwt/qiAcVfZrmxmn7QlfPZoMgen/TWWjuUPIVAQeWDQ"
        }
    }, 
    "type": "m.room.message"
}
t3chguy commented 8 months ago
        "m.relates_to": {
            "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k", 
            "from": "doctoruwu.uk", 
            "ms": 275, 
            "rel_type": "xyz.maubot.pong"
        }, 

This is likely causing the issue

tcpipuk commented 8 months ago

Ah, I see. Is the problem that @tulir's echo plugin uses custom relations, or is the problem that Element Web filters them out?

t3chguy commented 8 months ago

Neither, its a bug in some assumptions the matrix-js-sdk makes

The spec says the unknown relation should be ignored

Relationships which don’t match the schema, or which break the rules of a relationship, are simply ignored.

t3chguy commented 8 months ago

The code today considers the relation and tries to find the parent event - $f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k in the case of your example - and see which timeline that event fits in. Are you able to get the source for that event too?

tcpipuk commented 8 months ago

This should be it:

{
    "auth_events": [
        "$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o", 
        "$V1MtR_rtFySFmgRJydfrcWugbd6MTkoi3fdS7pv65_o", 
        "$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s"
    ], 
    "content": {
        "body": "!ping Example", 
        "m.mentions": {}, 
        "msgtype": "m.text"
    }, 
    "depth": 33762, 
    "hashes": {
        "sha256": "8NTX2wrlICbf/jca+DBulP9vBLiR6eqJJE5R/w/MqmA"
    }, 
    "origin": "doctoruwu.uk", 
    "origin_server_ts": 1709722416757, 
    "prev_events": [
        "$TiVPb7UurJVqspFMQRf69TufHj9svGp9K4YKdvTUasw"
    ], 
    "room_id": "!ping-no-synapse:maunium.net", 
    "sender": "@tom:doctoruwu.uk", 
    "signatures": {
        "doctoruwu.uk": {
            "ed25519:wl95vzHo": "UVTgxmdwuzLhH9bQvYEZfiNO66RlW8gzfTqZaEljky5PpebF8C+e01iHKOk6dSEWneWoGZNh7UaMWYKEIE9MCA"
        }
    }, 
    "type": "m.room.message", 
    "unsigned": {}
}
t3chguy commented 8 months ago

So I guess the issue here boils down to that parent event not being available to the client at the time the relation is loaded and the code isn't asynchonously loading the parent of every event with a relation as this is not needed for all specced relation types (e.g. threads, replies, edits, reactions) as this would increase server load substantially. Definitely something that needs fixing but unlikely to get any attention from the team given the edge case.

EA not having this issue is because it dumps all events in the main timeline by default, but this is a common cause of stuck notifications (in cases other than this one)

tcpipuk commented 8 months ago

I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?

I agree it'd be nice if Element Web didn't have this problem, but if the problem is that the server's not providing events in the correct order (so the parent arrives after the child) then I can look at possible patches on the server side to mitigate this in the meantime.

t3chguy commented 8 months ago

I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?

No, more likely the event is just outside the page of events provided in the sync

Ideally this would be solved by the parent event being provided via unsigned so the client always knew the parent event but that would need spec work

realtyem commented 8 months ago

What other information would you think might be helpful here? I'd like this resolved as well.

t3chguy commented 8 months ago

What usecases warrant these non-standard relations for better informed triage

tcpipuk commented 8 months ago

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It sounds like "fixing" this would require quite a lot of work, but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?

I just tested this working by overriding the limit value to 50 in Nginx so it pulls back enough events to be able to resolve the relation:

  # Specific match for sync request with "limit=20"
  location ~ ^/_matrix/client/v3/sync\?(.*)(limit=)20(.*)&_cacheBuster=(.*)$ {
    set $args $1$2$350$4; # Modify limit value to 50
    proxy_pass http://doctoruwu_conduwuit/_matrix/client/v3/sync?$args;
  }

  location /_matrix/ {
    proxy_pass http://doctoruwu_conduwuit;
  }

With this in place, Element Web now shows all of the missing events, but this workaround is just a pretty messy hack to demonstrate that it would resolve it.

t3chguy commented 8 months ago

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.

but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?

The limit is configurable via config.json only

tcpipuk commented 8 months ago

The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.

It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.

Sorry, I misspoke. It seems these types of relations are used whenever Maubot replies to a user, and Maubot is one of the most commonly used bot frameworks in Matrix, so while I don't have specific numbers, it would be quite common.

The limit is configurable via config.json only

Excellent. I'll look at running my own build to mitigate this issue for now, thanks.

t3chguy commented 8 months ago

Do you know why maubot doesn't use actual replies as would be semantically accurate?

tulir commented 8 months ago

They're pongs, not replies. Only used by echo bots mostly in ping rooms though, not any other bots. Rendering replies would take even more space in the UI and there's no reason for them to be replies