element-hq / element-meta

Shared/meta documentation and project artefacts for Element clients

UTDs: maubot/mautrix bridges fail to encrypt for EX sessions #2387

Open kegsay opened 2 months ago

kegsay commented 2 months ago

We have had at least three reports of UTDs (unable-to-decrypt errors) when Element X is used in conjunction with mautrix bridges, from @jkhsjdhjs:totally.rip, @frebib:nerdhouse.io and Will L. This is a placeholder issue to collect more information and see whether there is something actionable.

Close this issue if:


In Will L's and jkhsjdhjs's case, it looks like room keys failed to be exchanged correctly, which manifests as the room working fine for a while then suddenly failing to decrypt. For frebib, new rooms are most frequently affected, where the bridge is on a different server to the user seeing the UTD.

WhatsApp bridge is repeatedly the culprit, but that could just be due to its popularity.

kegsay commented 2 months ago

Related https://github.com/element-hq/element-x-ios/issues/2263

jkhsjdhjs commented 2 months ago

Just had this issue again with my mautrix-whatsapp bridge and noticed that the bridge does indeed encrypt the messages for the Element X session (WNZDTMFIRU). However, Element X is still unable to decrypt the message. I attached the relevant logs from mautrix-whatsapp and the Element X iOS NotificationServiceExtension to this comment in the hope that they are useful.

mautrix-whatsapp.log console-nse.2024-04-20-14.log

kegsay commented 2 months ago

Failed to decrypt a non-pre-key message with all available sessions errors_by_olm_session=[("Tz1YW/DjvQyQN+PAVmYwx/TSuUco6kBMF2/GeRoIIO4", InvalidMAC(MacError)), ("guGLQ/9DTgDdxVNSGO+b5I6Bipupxyd4i7hOHsSHmGw", InvalidMAC(MacError)), ("euS08VImu5hKNndPrJJZzkiRoGfofi4WjTLFrawY10k", InvalidMAC(MacError)), ("4a5Ake2e7Dh9nR5NZDTxJnO+d3dPcXZ/w/anrduhcZM", InvalidMAC(MacError)), ("Dodt7wc+cqbzdC9Mbt+y6HDshI5tyeyvg07lzMpezQg", InvalidMAC(MacError)), ("w7CTLS0KAVEyDfDiPOSWcqj1JFW8XHoROcf08DGvET8", InvalidMAC(MacError)), ("wxt7uCPcizq23j6CcfRwBXKjyXiH3UYKZjegVPUTNxo", InvalidMAC(MacError)), ("exGrmYijdvRPngQqM4Ebws91giSr0XiqgVFb8NvjqOc", InvalidMAC(MacError)), ("CZYp4cnV5cplf8uL7CZyVWdc55+CRTJxmFUhATyt4pg", InvalidMAC(MacError)), ("aYYnWCd1WhyMvTVpDCDjnHJhgtO5T2QMwWCyY8OWu8s", InvalidMAC(MacError)), ("jViZtkUHSxs8llj1m0OUSVtREIyO+CxeWiTQfk1LV5M", InvalidMAC(MacError))] | crates/matrix-sdk-crypto/src/olm/account.rs:1245 - this feels like it is https://github.com/matrix-org/matrix-rust-sdk/issues/3110 all over again, in which case https://github.com/matrix-org/matrix-rust-sdk/pull/3338 should fix this on EIX.
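When triaging reports like this, it can help to tally the error kinds across all Olm sessions listed in such a line, since a uniform wall of `InvalidMAC` looks different from a mix of failures. The following is a minimal sketch, assuming the `errors_by_olm_session=[("<session id>", <Kind>(<detail>)), ...]` layout seen in the line above; the function name `summarize_olm_errors` is hypothetical and not part of matrix-rust-sdk.

```python
import re
from collections import Counter

# One ("<session id>", <ErrorKind>(<detail>)) pair, as printed in the log line.
# Session IDs are assumed to be unpadded base64 (A-Z, a-z, 0-9, '+', '/').
PAIR_RE = re.compile(r'\("([A-Za-z0-9+/]+)",\s*(\w+)\((\w*)\)\)')


def summarize_olm_errors(log_line: str) -> Counter:
    """Count error kinds over every Olm session listed in the log line."""
    return Counter(kind for _session_id, kind, _detail in PAIR_RE.findall(log_line))


# Shortened sample: the first two sessions from the log line quoted above.
sample = (
    "Failed to decrypt a non-pre-key message with all available sessions "
    'errors_by_olm_session=[("Tz1YW/DjvQyQN+PAVmYwx/TSuUco6kBMF2/GeRoIIO4", InvalidMAC(MacError)), '
    '("guGLQ/9DTgDdxVNSGO+b5I6Bipupxyd4i7hOHsSHmGw", InvalidMAC(MacError))]'
)

summary = summarize_olm_errors(sample)
print(summary)
```

If the log format differs between SDK versions, the regular expression would need adjusting; it is only matched against the layout shown in this thread.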

@wrjlewis which devices were failing to decrypt WhatsApp messages for you?

jkhsjdhjs commented 2 months ago

I don't understand what it means for an OLM session to "wedge", but I currently work around this issue by removing the outbound sessions for the affected rooms from the mautrix-whatsapp database, i.e.

DELETE FROM crypto_megolm_outbound_session
WHERE room_id IN (
    '!affected_room1:example.com',
    '!affected_room2:example.com',
    ...
)

This forces mautrix-whatsapp to create new sessions for the respective rooms on the next message, which can be decrypted again by EIX. Does this fit the OLM session wedge theory?
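For anyone wanting to script the workaround above, here is a minimal sketch that runs the same `DELETE` with bound parameters instead of hand-edited SQL. It assumes the bridge stores its data in SQLite (a PostgreSQL deployment would need a different driver), and the helper name `drop_outbound_sessions` is hypothetical. Only the table name `crypto_megolm_outbound_session` and the `room_id` column come from the query above; the demo table is a simplified stand-in, not the bridge's real schema.

```python
import sqlite3


def drop_outbound_sessions(conn: sqlite3.Connection, room_ids: list[str]) -> int:
    """Delete the outbound Megolm sessions for the given rooms; return rows removed."""
    if not room_ids:
        return 0
    placeholders = ",".join("?" for _ in room_ids)
    with conn:  # commits on success, rolls back if the statement raises
        cur = conn.execute(
            "DELETE FROM crypto_megolm_outbound_session "
            f"WHERE room_id IN ({placeholders})",
            room_ids,
        )
    return cur.rowcount


# Demo against an in-memory stand-in table (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crypto_megolm_outbound_session (room_id TEXT PRIMARY KEY, session BLOB)"
)
conn.executemany(
    "INSERT INTO crypto_megolm_outbound_session VALUES (?, ?)",
    [
        ("!affected_room1:example.com", b"a"),
        ("!affected_room2:example.com", b"b"),
        ("!healthy_room:example.com", b"c"),
    ],
)

removed = drop_outbound_sessions(
    conn, ["!affected_room1:example.com", "!affected_room2:example.com"]
)
remaining = conn.execute(
    "SELECT COUNT(*) FROM crypto_megolm_outbound_session"
).fetchone()[0]
print(removed, remaining)
```

Against a real bridge database, taking a backup first would be prudent, since the deleted sessions cannot be recovered.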

Furthermore, going by the wedge theory, shouldn't this issue also occur with messages sent by other E2EE aware parties, like other regular users or other bridges? Shouldn't more users be affected by this?

wrjlewis commented 2 months ago

> @wrjlewis which devices were failing to decrypt WhatsApp messages for you?

Is it the device IDs you need?

richvdh commented 2 months ago

@wrjlewis as a first step, could you confirm whether it's Element X iOS or another client that is having the problem?

richvdh commented 2 months ago

@jkhsjdhjs:

> I don't understand what it means for an OLM session to "wedge", but I currently work around this issue by removing the outbound sessions for the affected rooms from the mautrix-whatsapp database, i.e....
>
> This forces mautrix-whatsapp to create new sessions for the respective rooms on the next message, which can be decrypted again by EIX. Does this fit the OLM session wedge theory?

Yes. "Olm", not OLM, by the way: https://gitlab.matrix.org/matrix-org/olm/blob/master/docs/olm.md

> Furthermore, going by the wedge theory, shouldn't this issue also occur with messages sent by other E2EE aware parties, like other regular users or other bridges? Shouldn't more users be affected by this?

Well, I think lots of users are affected by this. It's possible that other clients are better at covering it up by using a new olm session than the bridges.

wrjlewis commented 2 months ago

> @wrjlewis as a first step, could you confirm whether it's Element X iOS or another client that is having the problem?

Ah yes, it's always just on EX iOS for me. I also have FluffyChat and Element iOS clients, which did not show the issue.

kegsay commented 2 months ago

This feels like it may already be fixed then. We'll need to wait until https://github.com/matrix-org/matrix-rust-sdk/pull/3338 lands in a proper release which people can test.

kegsay commented 1 month ago

Should land Monday.

kegsay commented 3 weeks ago

This has been rolled out to Element X for a while now. If anyone does see mautrix bridge problems we need bug reports.

Will close this issue in July if there are no mautrix bridge problems.