Open tcpipuk opened 8 months ago
I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.
Have you looked at the sync responses to confirm this? Just because another client gets them doesn't mean the server sent them the same data.
Please provide a sample of event IDs to correlate in your logs for the missing events.
Have you looked at the sync responses to confirm this?
No, I'm not sure what I'd be looking at/for?
Please provide a sample of event IDs to correlate in your logs for the missing events.
The "spritsail" one in the screenshot above is $-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE
Can you share the View Source of that same event, please?
Not this second, but I've pulled the event from the server over federation from another server I have access to right now:
$-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE
{
  "auth_events": [
    "$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o",
    "$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s",
    "$jS7DCCu_i_N4IJtmG9iofFZtC_i1p2QvwygWoYjUE-U"
  ],
  "content": {
    "body": "@tom:doctoruwu.uk: Pong! (ping \"Example\" took 275 ms to arrive)",
    "format": "org.matrix.custom.html",
    "formatted_body": "<a href='https://matrix.to/#/@tom:doctoruwu.uk'>@tom:doctoruwu.uk</a>: Pong! (<a href='https://matrix.to/#/!ping-no-synapse:maunium.net/$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k'>ping</a> \"Example\" took 275 ms to arrive)",
    "m.relates_to": {
      "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k",
      "from": "doctoruwu.uk",
      "ms": 275,
      "rel_type": "xyz.maubot.pong"
    },
    "msgtype": "m.notice",
    "pong": {
      "from": "doctoruwu.uk",
      "ms": 275,
      "ping": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
    }
  },
  "depth": 33763,
  "hashes": {
    "sha256": "QiRLY9/dH/QZXAxwnUFKsKsqwjv5vifzJqJ2JacP0K4"
  },
  "origin": "spritsail.io",
  "origin_server_ts": 1709722417057,
  "prev_events": [
    "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k"
  ],
  "room_id": "!ping-no-synapse:maunium.net",
  "sender": "@echo:spritsail.io",
  "signatures": {
    "spritsail.io": {
      "ed25519:4cL10W6x": "S3ETw2iRZYqyh4MG5Id3vAF1muF6xmGQoJ6XFDHS5ZJzwt/qiAcVfZrmxmn7QlfPZoMgen/TWWjuUPIVAQeWDQ"
    }
  },
  "type": "m.room.message"
}
"m.relates_to": {
  "event_id": "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k",
  "from": "doctoruwu.uk",
  "ms": 275,
  "rel_type": "xyz.maubot.pong"
},
This is likely causing the issue
Ah, I see. Is the problem that @tulir's echo plugin uses custom relations, or is the problem that Element Web filters them out?
Neither, it's a bug in some assumptions the matrix-js-sdk makes.
The spec says the unknown relation should be ignored:
Relationships which don’t match the schema, or which break the rules of a relationship, are simply ignored.
The code today considers the relation and tries to find the parent event ($f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k in the case of your example) and see which timeline that event fits in. Are you able to get the source for that event too?
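To make the spec rule quoted above concrete, here is a minimal TypeScript sketch of "ignore relations you don't recognise". It is not the matrix-js-sdk implementation: the `KNOWN_REL_TYPES` set, the `RoomEvent` shape, and the `parentIdIfKnownRelation` helper are all hypothetical names chosen for illustration.

```typescript
// Relation types the client knows how to handle. This set is an assumption
// for illustration; it is not copied from matrix-js-sdk.
const KNOWN_REL_TYPES: ReadonlySet<string> = new Set([
    "m.thread",
    "m.replace",
    "m.annotation",
    "m.reference",
]);

interface RelatesTo {
    rel_type?: unknown;
    event_id?: unknown;
    [key: string]: unknown; // custom relations may carry extra fields
}

interface RoomEvent {
    event_id: string;
    content: { "m.relates_to"?: RelatesTo; [key: string]: unknown };
}

// Hypothetical helper: returns the parent event ID only when the relation has
// a recognised rel_type and a string event_id. Otherwise the relation is
// ignored and the caller can treat the event as a plain main-timeline event.
function parentIdIfKnownRelation(event: RoomEvent): string | undefined {
    const rel = event.content["m.relates_to"];
    if (!rel) return undefined;
    if (typeof rel.rel_type !== "string" || typeof rel.event_id !== "string") {
        return undefined;
    }
    return KNOWN_REL_TYPES.has(rel.rel_type) ? rel.event_id : undefined;
}

// The pong event from this thread, trimmed to the fields that matter here.
const pong: RoomEvent = {
    event_id: "$-uu5w_Yz7EIgiTpH3Mnm3moPurp0aP7wAtdvDv016KE",
    content: {
        "msgtype": "m.notice",
        "m.relates_to": {
            event_id: "$f618YeJnv6DWI7eRTN9ysOX7324mp4V7Hju5K8h095k",
            from: "doctoruwu.uk",
            ms: 275,
            rel_type: "xyz.maubot.pong",
        },
    },
};

console.log(parentIdIfKnownRelation(pong)); // undefined: unknown rel_type is ignored
```

With a check like this, an event carrying `xyz.maubot.pong` would never trigger a parent lookup, so a missing parent could not keep it out of the main timeline.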
This should be it:
{
"auth_events": [
"$fJMpHekmhxm920T0QYIb-uoiAhkYd0y_cDN260N7m2o",
"$V1MtR_rtFySFmgRJydfrcWugbd6MTkoi3fdS7pv65_o",
"$vYTZkTQPBTGPdzHtCpgo-Qz7dZ9uh5mbkqXPB4teN_s"
],
"content": {
"body": "!ping Example",
"m.mentions": {},
"msgtype": "m.text"
},
"depth": 33762,
"hashes": {
"sha256": "8NTX2wrlICbf/jca+DBulP9vBLiR6eqJJE5R/w/MqmA"
},
"origin": "doctoruwu.uk",
"origin_server_ts": 1709722416757,
"prev_events": [
"$TiVPb7UurJVqspFMQRf69TufHj9svGp9K4YKdvTUasw"
],
"room_id": "!ping-no-synapse:maunium.net",
"sender": "@tom:doctoruwu.uk",
"signatures": {
"doctoruwu.uk": {
"ed25519:wl95vzHo": "UVTgxmdwuzLhH9bQvYEZfiNO66RlW8gzfTqZaEljky5PpebF8C+e01iHKOk6dSEWneWoGZNh7UaMWYKEIE9MCA"
}
},
"type": "m.room.message",
"unsigned": {}
}
So I guess the issue here boils down to that parent event not being available to the client at the time the relation is loaded. The code doesn't asynchronously load the parent of every event with a relation, because that isn't needed for the specced relation types (e.g. threads, replies, edits, reactions) and would increase server load substantially. It's definitely something that needs fixing, but it's unlikely to get any attention from the team given how much of an edge case it is.
Element Android (EA) doesn't have this issue because it dumps all events in the main timeline by default, but that is a common cause of stuck notifications (in cases other than this one).
I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?
I agree it'd be nice if Element Web didn't have this problem, but if the problem is that the server's not providing events in the correct order (so the parent arrives after the child) then I can look at possible patches on the server side to mitigate this in the meantime.
I see, so do you think this is a problem with Conduit not providing events in the correct order when doing a full sync?
No, more likely the event is just outside the page of events provided in the sync
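One way to check this hypothesis against a captured sync response is to list the events in a timeline page whose relation parent is not in that same page. The sketch below assumes you have already extracted the page as an array of events; `PageEvent` and `findRelationsWithMissingParent` are hypothetical names, not part of any SDK.

```typescript
interface PageEvent {
    event_id: string;
    content: {
        "m.relates_to"?: { event_id?: unknown; rel_type?: unknown };
    };
}

// Returns every event in the page that points at a parent event ID which is
// not itself present in the page. These are the events a client cannot place
// without fetching or otherwise knowing about the parent.
function findRelationsWithMissingParent(page: PageEvent[]): PageEvent[] {
    const idsInPage = new Set(page.map((e) => e.event_id));
    return page.filter((e) => {
        const parentId = e.content["m.relates_to"]?.event_id;
        return typeof parentId === "string" && !idsInPage.has(parentId);
    });
}

// Illustrative data only: a page that contains the pong but not the ping.
const pingEvent: PageEvent = { event_id: "$ping", content: {} };
const pongEvent: PageEvent = {
    event_id: "$pong",
    content: { "m.relates_to": { event_id: "$ping", rel_type: "xyz.maubot.pong" } },
};

console.log(findRelationsWithMissingParent([pongEvent]).map((e) => e.event_id)); // ["$pong"]
console.log(findRelationsWithMissingParent([pingEvent, pongEvent]).length); // 0
```

If the missing messages all show up in the first list, that would be consistent with the parent sitting just outside the synced page.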
Ideally this would be solved by the parent event being provided via unsigned, so the client always knew the parent event, but that would need spec work.
What other information would you think might be helpful here? I'd like this resolved as well.
For better-informed triage: what use cases warrant these non-standard relations?
The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.
It sounds like "fixing" this would require quite a lot of work, but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?
I just tested this working by overriding the limit value to 50 in Nginx so it pulls back enough events to be able to resolve the relation:
# Specific match for sync request with "limit=20"
location ~ ^/_matrix/client/v3/sync\?(.*)(limit=)20(.*)&_cacheBuster=(.*)$ {
    set $args $1$2$350$4; # Modify limit value to 50
    proxy_pass http://doctoruwu_conduwuit/_matrix/client/v3/sync?$args;
}
location /_matrix/ {
    proxy_pass http://doctoruwu_conduwuit;
}
With this in place, Element Web now shows all of the missing events, but this workaround is just a pretty messy hack to demonstrate that it would resolve it.
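For readers who find the regex hard to follow, the intent of the workaround can be written as a small TypeScript function. This is only a sketch of the idea, assuming (as the Nginx config above does) that the limit appears as a plain `limit=` query parameter on the sync request; `overrideSyncLimit` is a hypothetical helper and is not something Element Web or Nginx provides.

```typescript
// Rewrites the value of the "limit" query parameter in a request path,
// leaving every other parameter (and its order) untouched.
function overrideSyncLimit(pathAndQuery: string, newLimit: number): string {
    const queryStart = pathAndQuery.indexOf("?");
    if (queryStart === -1) return pathAndQuery; // no query string, nothing to rewrite

    const path = pathAndQuery.slice(0, queryStart);
    const params = pathAndQuery.slice(queryStart + 1).split("&");
    const rewritten = params.map((param) =>
        param.startsWith("limit=") ? `limit=${newLimit}` : param,
    );
    return `${path}?${rewritten.join("&")}`;
}

// Example request path; the filter and _cacheBuster values are made up.
console.log(
    overrideSyncLimit("/_matrix/client/v3/sync?filter=0&limit=20&_cacheBuster=123", 50),
);
// "/_matrix/client/v3/sync?filter=0&limit=50&_cacheBuster=123"
```

Unlike the regex, this version does not depend on the original value being 20 or on `_cacheBuster` following the limit.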
The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.
It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.
but would it be feasible to offer a /devtools or Labs option to specify a different limit, so the initial sync can pull back more events if the server permits it?
The limit is configurable via config.json only
The m.notice events are used frequently by bots, which is why it's very prevalent in the ping rooms I used in my examples above.
It has nothing to do with them being m.notice events. It is entirely to do with them using a non-standard relation.
Sorry, I misspoke. It seems these types of relations are used whenever Maubot replies to a user, and Maubot is one of the most commonly used bot frameworks in Matrix, so while I don't have specific numbers, it would be quite common.
The limit is configurable via config.json only
Excellent. I'll look at running my own build to mitigate this issue for now, thanks.
Do you know why maubot doesn't use actual replies as would be semantically accurate?
They're pongs, not replies. Only used by echo bots mostly in ping rooms though, not any other bots. Rendering replies would take even more space in the UI and there's no reason for them to be replies
Steps to reproduce
When closing Element Web (I've tried app.element.io and develop.element.io) or clearing cache, the m.notice messages seem to disappear.
Here is a screenshot of an example ping test I just ran in the #ping-no-synapse:maunium.net room:
Then how the room looks after clearing the cache in client to perform an initial sync:
I've also tried using /devtools to show all hidden messages and there's simply no trace of these missing m.notice messages. I'm confident it's not a server issue as this behaviour doesn't occur in Element Android or other clients.
Outcome
Expected outcome is that rooms would appear roughly the same after restarting client or clearing cache as they did beforehand.
Operating system
Microsoft Windows 11 Pro 23H2 (22631.3235)
Browser information
Microsoft Edge 122.0.2365.66 (Official build) (64-bit)
URL for webapp
develop.element.io
Application version
Element version: f01d69f90b98-react-942fabc5a8fe-js-7fee37680f33 Crypto version: Rust SDK 0.7.0 (b1918e9), Vodozemac 0.5.1
Homeserver
doctoruwu.uk
Will you send logs?
Yes