alphapapa / ement.el

A Matrix client for GNU Emacs
GNU General Public License v3.0
478 stars 44 forks source link

Transaction ID reuse warning #36

Closed polaris64 closed 2 years ago

polaris64 commented 2 years ago

Hi there,

First of all, thanks so much for your work! It's great to have a Matrix chat client in Emacs!

I've been using Ement.el for the past week, mainly to communicate in E2EE rooms via Pantalaimon. I seem to be getting the following warning quite frequently: -

Warning (ement-room-send-event-callback): Event ID "${event-id}" already seen in session "@{my-username}".
This may indicate a reused transaction ID, which likely means that the event was not sent to the room ("{room-name}").
You may need to disconnect, delete the `ement-session-file', and connect again to start a new session.
Event content: (("format" . "org.matrix.custom.html") ("formatted_body" . "<p>
{message-content}</p>") ("msgtype" . "m.text") ("body" . "{message-content}"))

(I've replaced some actual details with identifiers in braces, e.g. {event-id}).

The first time this appears it seems to delay my message from being sent, but it does get sent after a few seconds. However the second time prevents the message from being sent at all. If I disconnect (without deleting ement-session-file as I cannot find this file stored anywhere) and reconnect then it all seems to be fine again, but just for a while.

alphapapa commented 2 years ago

Hi,

Thanks for the kind words. I'm glad it's useful to you.

I pushed an update to the warning message to help clarify what's going on, and to update the variable name. I should probably also update the transaction IDs to initialize them with the current timestamp, which I was told is what Element does, which should help prevent the reuse problem.

However, if there's some kind of unexpected interaction causing this to happen more frequently when using Pantalaimon, I don't know what's going on there.

polaris64 commented 2 years ago

Hi, thanks for looking into this and for clarifying the warning message.

As an update, I've noticed what seems to cause this. I am trying to use Ement.el as much as possible, but for things like sending file attachments I'm jumping back to the Element web app. I noticed that after I send a message in a room on Element, the next time I send a message in Ement.el it'll come up with this warning.

alphapapa commented 2 years ago

Well, given that we already have a send-image command, a send-file one will be easy to add. I just haven't needed it yet, myself, so I haven't bothered to add it.

I can't see how those two events could be related to this problem. Ement.el and Element should not be using the same access token (unless you manually copied the token from one client to the other, which surely you didn't), and transaction IDs are scoped to the tokens, so it shouldn't be possible for the two clients to interfere with each other. Are you sure that it's not just a coincidence?

The only alternative I can think of would be if Pantalaimon were doing something weird with tokens or transaction IDs. But other users have been using it without reporting such issues, and it seems unlikely that it would do something like that.

polaris64 commented 2 years ago

In that case then it could have just been a coincidence. I'll keep investigating and I'll let you know if I discover something more concrete.

mrmekon commented 2 years ago

I was getting this frequently as well. I'm not an expert on Matrix by any means, so take this with a grain of salt:

It looks like my Pantalaimon session maintains a persistent device ID to the homeserver, but Ement.el generates a new device ID every time it connects to the proxy. This is presumably because ement-save-sessions is nil, and enabling that might fix it.

The result is that the session's transaction-id resets to 0 on every reconnect. Ement.el sends messages to the URL https://.../m.room.message/<transaction id>. Once transaction-id rolls back to zero, someone along the chain (I'm not sure if it's Pantalaimon or the homeserver) detects duplicate transactions from the same device and rejects them. Ement.el increments transaction-id even if it fails to send, so if you send enough messages you will eventually get a unique transaction-id and they will start going through again.

Even if saving the session works, I personally want the ability to blow away the cached session file and restart, so I've patched my local copy to include Ement.el's current device-id along with the transaction-id in the post URLs. I don't know if that's even legal by the API's specs, but it seems to work.

Here's the (almost entirely untested) patch I'm using:

From 5e248ce16e61caaacc8cdf98b5602f9724d8bf1b Mon Sep 17 00:00:00 2001
From: Trevor Bentley <trevor@trevorbentley.com>
Date: Fri, 24 Sep 2021 17:35:54 +0200
Subject: [PATCH] use device-id in conjunction with transaction-id

---
 ement-room.el | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/ement-room.el b/ement-room.el
index 84b39cb..821c97c 100644
--- a/ement-room.el
+++ b/ement-room.el
@@ -818,8 +818,10 @@ Interactively, set the current buffer's ROOM's TOPIC."
                 (message "Uploaded file %S.  Sending message..." file)
                 (pcase-let* (((map ('content_uri content-uri)) data)
                              ((cl-struct ement-room (id room-id)) room)
-                             (endpoint (format "rooms/%s/send/%s/%s" (url-hexify-string room-id)
-                                               "m.room.message" (cl-incf (ement-session-transaction-id session))))
+                             (endpoint (format "rooms/%s/send/%s/%s-%s" (url-hexify-string room-id)
+                                               "m.room.message"
+                                               (url-hexify-string (ement-session-device-id session))
+                                               (cl-incf (ement-session-transaction-id session))))
                              ;; TODO: Image height/width (maybe not easy to get in Emacs).
                              (content (ement-alist "msgtype" "m.image"
                                                    "url" content-uri
@@ -1153,7 +1155,8 @@ the content. (e.g. see `ement-room-send-org-filter')."
   (cl-assert (or (not formatted-body) (not (string-empty-p formatted-body))))
   (pcase-let* (((cl-struct ement-room (id room-id) (local (map buffer))) room)
                (window (when buffer (get-buffer-window buffer)))
-               (endpoint (format "rooms/%s/send/m.room.message/%s" (url-hexify-string room-id)
+               (endpoint (format "rooms/%s/send/m.room.message/%s-%s" (url-hexify-string room-id)
+                                 (url-hexify-string (ement-session-device-id session))
                                  (cl-incf (ement-session-transaction-id session))))
                (content (ement-aprog1
                             (ement-alist "msgtype" "m.text"
@@ -1198,7 +1201,8 @@ the content. (e.g. see `ement-room-send-org-filter')."
   (cl-assert (not (string-empty-p body)))
   (pcase-let* (((cl-struct ement-room (id room-id) (local (map buffer))) room)
                (window (when buffer (get-buffer-window buffer)))
-               (endpoint (format "rooms/%s/send/m.room.message/%s" (url-hexify-string room-id)
+               (endpoint (format "rooms/%s/send/m.room.message/%s-%s" (url-hexify-string room-id)
+                                 (url-hexify-string (ement-session-device-id session))
                                  (cl-incf (ement-session-transaction-id session))))
                (content (ement-aprog1
                             (ement-alist "msgtype" "m.emote"
@@ -1278,8 +1282,10 @@ The message must be one sent by the local user."
                          (user-error "To delete a message, use command `ement-room-delete-message'"))
                        (when (yes-or-no-p (format "Edit message to: %S? " body))
                          (list event ement-room ement-session body)))))))
-  (let* ((endpoint (format "rooms/%s/send/%s/%s" (url-hexify-string (ement-room-id room))
-                           "m.room.message" (cl-incf (ement-session-transaction-id session))))
+  (let* ((endpoint (format "rooms/%s/send/%s/%s-%s" (url-hexify-string (ement-room-id room))
+                           "m.room.message"
+                           (url-hexify-string (ement-session-device-id session))
+                           (cl-incf (ement-session-transaction-id session))))
          (new-content (ement-alist "body" body
                                    "msgtype" "m.text"))
          (_ (when ement-room-send-message-filter
@@ -1307,8 +1313,9 @@ The message must be one sent by the local user."
                    (user-error "Message not deleted"))))
   (pcase-let* (((cl-struct ement-event (id event-id)) event)
                ((cl-struct ement-room (id room-id)) room)
-               (endpoint (format "rooms/%s/redact/%s/%s"
+               (endpoint (format "rooms/%s/redact/%s/%s-%s"
                                  (url-hexify-string room-id) (url-hexify-string event-id)
+                                 (url-hexify-string (ement-session-device-id ement-session))
                                  (cl-incf (ement-session-transaction-id ement-session))))
                (content (if reason
                             (ement-alist "reason" reason)
@@ -1373,7 +1380,8 @@ reaction string, e.g. \"👍\"."
                  ;; hl-line-mode is enabled, it only returns the hl-line face.
                  ((cl-struct ement-event (id event-id)) event)
                  ((cl-struct ement-room (id room-id)) ement-room)
-                 (endpoint (format "rooms/%s/send/m.reaction/%s" (url-hexify-string room-id)
+                 (endpoint (format "rooms/%s/send/m.reaction/%s-%s" (url-hexify-string room-id)
+                                   (url-hexify-string (ement-session-device-id ement-session))
                                    (cl-incf (ement-session-transaction-id ement-session))))
                  (content (ement-alist "m.relates_to"
                                        (ement-alist "rel_type" "m.annotation"
-- 
2.32.0
polaris64 commented 2 years ago

Hi @mrmekon, that's very interesting, thanks for the detailed explanation.

I've been using Ement.el fairly regularly and I still get this issue occasionally, however now it doesn't seem to prevent the message from being sent. I do however sometimes have problems where a message continually fails to send and there's no warning. I haven't raised an issue yet as I haven't had time to debug it.

I'll give your patch a try to see if that helps!

alphapapa commented 2 years ago

Hi Trevor,

Looks like you're going to make me pull out the spec and dig into device IDs. :) Well, I was going to have to do that at some point. Thanks.

As @treed (I think it was Ted) mentioned to me, it might be good to use the current timestamp in the transaction IDs, which would probably make these conflicts nearly impossible. (The Matrix spec doesn't seem to give much guidance regarding what clients should use for transaction IDs, so I started with an integer from 1, but that can occasionally lead to issues like this.)

@polaris64

I do however sometimes have problems where a message continually fails to send and there's no warning.

That's probably the transaction ID issue. Using the timestamp in the ID will probably solve that.

treed commented 2 years ago

Yeah, I decided to see how Element handles this case and found that date bit in the SDK code. It seems a reasonable enough fix to me.

alphapapa commented 2 years ago

Ok, https://github.com/alphapapa/ement.el/commit/2763bd6837d7338e3e76179d89db33eef2a4acdb initializes a new session's transaction ID to a large, pseudorandom integer. It should probably prevent any issues with Pantalaimon as @mrmekon described. You can either disconnect/reconnect to initialize a new session, or you can manually set your session's transaction ID to any very large integer by evaluating the appropriate Lisp expression.

The warnings may still happen occasionally when the server responds out-of-order to sending a new event; I'm not sure of the best way to solve that yet. But I could probably just disable the warning by default now; it's mostly served its purpose by bringing this issue to our attention.

polaris64 commented 2 years ago

Just a quick update: I pulled these changes and have been using Ement.el a lot since then. I still get the warnings quite frequently (perhaps once every 20 messages on average) but as yet nothing has failed to send.

So it certainly looks like the situation has improved.

alphapapa commented 2 years ago

@polaris64 Are you using the matrix.org homeserver or a different one?

polaris64 commented 2 years ago

@alphapapa, yes the matrix.org homeserver, encrypted rooms via Pantalaimon

alphapapa commented 2 years ago

@polaris64 Thanks. If you have the inclination, I'd be interested to know if you get these warnings when you connect without Pantalaimon and use non-encrypted rooms. It'd be helpful to know if the problem is isolated to or exacerbated by Pantalaimon.

polaris64 commented 2 years ago

@alphapapa sure, I'll give that a try and I'll let you know what happens. You may see me on the Ement.el chat room collecting test data in that case :)