alphapapa / ement.el

A Matrix client for GNU Emacs
GNU General Public License v3.0
488 stars 44 forks source link

Expired access tokens require manual intervention (e.g. "soft_logout") #173

Open phil-s opened 1 year ago

phil-s commented 1 year ago

With session saving in play, after this happens:

error in process sentinel: ement-api-error: Ement API error: "401: Access token has expired"
error in process sentinel: Ement API error: "401: Access token has expired"

It appears to be necessary to (after disconnecting) manually edit the saved session file and remove the invalid entry before attempting to reconnect.

Without doing that, running ement-disconnect and then ement-connect results in a HTTP 401 error due to the invalid token, and it does not attempt to reauthenticate and nor does it purge the invalid session from the file, so trying again will simply produce the same result:

Ement: Sync request sent, waiting for response... error in process sentinel: Ement API error: "Ement: Unrecognized network error", #s(plz-error nil #s(plz-response 2 401 ((server . "nginx") (date . "Wed, 21 Jun 2023 11:31:39 GMT") (content-type . "application/json") (cache-control . "no-cache, no-store, must-revalidate") (access-control-allow-origin . "*") (access-control-allow-methods . "GET, HEAD, POST, PUT, DELETE, OPTIONS") (access-control-allow-headers . "X-Requested-With, Content-Type, Authorization, Date") (strict-transport-security . "max-age=63072000")) "{\"errcode\":\"M_UNKNOWN_TOKEN\",\"error\":\"Access token has expired\",\"soft_logout\":true}") nil) [2 times]

alphapapa commented 1 year ago

Hm, you're the first user to report having a token be expired. I saw an addition to the Matrix spec allowing for that, but I didn't know it was actually happening in the wild. Thanks for reporting.

phil-s commented 1 year ago

An ement-reconnect command would be great here, actually, if it's possible to retain all of the session buffers, connect the new session, and have the buffers associated with the new session without anything being killed in the interim?

alphapapa commented 1 year ago

See:

This will require changes in several places.

alphapapa commented 1 year ago

@phil-s Please try this branch: https://github.com/alphapapa/ement.el/tree/wip/soft-logout When the server does a soft-logout, Ement should tell you to run M-x ement-refresh, which should run the login procedure again.

Once that is verified to work, I'll see about adding support for the refresh token.

phil-s commented 1 year ago

Thank you; I didn't get to this on the weekend, but expect to be able to test it this week.

phil-s commented 1 year ago

I don't have time to debug this right now I'm afraid, but this isn't working for me at this point. I'm using the new branch with plz 0.7.

When the session expired and I attempted to send a message, I got:

Error running timer ‘plz--respond’: (ement-api-error "401: Access token has expired")
Debugger entered--Lisp error: (ement-api-error "401: Access token has expired")
  signal(ement-api-error ("401: Access token has expired"))
  ement-api-error(#s(plz-error :curl-error nil :response #s(plz-response :version 2 :status 401 :headers ((server . "nginx") (date . "Tue, 11 Jul 2023 02:59:20 GMT") (content-type . "application/json") (cache-control . "no-cache, no-store, must-revalidate") (access-control-allow-origin . "*") (access-control-allow-methods . "GET, HEAD, POST, PUT, DELETE, OPTIONS") (access-control-allow-headers . "X-Requested-With, Content-Type, Authorization, Date") (strict-transport-security . "max-age=63072000")) :body "{\"errcode\":\"M_UNKNOWN_TOKEN\",\"error\":\"Access token has expired\",\"soft_logout\":true}") :message nil))
  plz--respond(#<process plz-request-curl> #<buffer  *plz-request-curl*> "finished\n")
  apply(plz--respond (#<process plz-request-curl> #<buffer  *plz-request-curl*> "finished\n"))
  timer-event-handler([t 25772 50568 355007 nil plz--respond (#<process plz-request-curl> #<buffer  *plz-request-curl*> "finished\n") nil 168000 nil])

After looking at the new commits I saw ement-refresh which I tried manually. It asked me for my username and auth method as per a normal ement-connect, and the web browser auth process worked (and I notice the browser now displays a success message; cheers).

I then got:

Ement: Sync request sent.  Waiting for response...
Ement: Response arrived after 19.34 seconds.  Reading 1.8M JSON response...
Ement: Reading JSON took 0.70 seconds
Ement: Reading events... 
Error running timer ‘plz--respond’: (wrong-type-argument hash-table-p nil)

I've redacted most of the data from the very large backtrace for that one:

Debugger entered--Lisp error: (wrong-type-argument hash-table-p nil)
  puthash("$BJrtL6nZ0VvHughEgvOULWq9Gi-4PBAiVw5AwHdvnTs" #s(ement-event ...) nil)
  ement--put-event(#s(ement-event ...) #s(ement-room ...) #s(ement-session ...))
  run-hook-with-args(ement--put-event #s(ement-event ...) #s(ement-room ...) #s(ement-session ...))
  ement--push-joined-room-events(#s(ement-session :user #s(ement-user ...) :server #s(ement-server ...) :token ... :transaction-id nil :rooms (#s(ement-room ...)) :next-batch nil :device-id ... :initial-device-display-name ... :has-synced-p nil :account-data ... :events nil) ... leave)
  ement--push-left-room-events ...
  apply(ement--push-left-room-events ...)
  #f(compiled-function (&rest args2) #<bytecode -0xae4c6111d95045d>)(...)
  mapc(#f(compiled-function (&rest args2) #<bytecode -0xae4c6111d95045d>) (... ...))
  ement--sync-callback(#s(ement-session ...) (...))
  apply(ement--sync-callback (#s(ement-session ...) (...)))
  #f(compiled-function (&rest args2) #<bytecode -0xae4c610b965045d>)(((next_batch . ...) (account_data (events . ...)) (presence (events . ...)) (device_one_time_keys_count (signed_curve25519 . 0)) (org\.matrix\.msc2732\.device_unused_fallback_key_types . []) (rooms ...)))
  #f(compiled-function () #<bytecode 0xedd5189f120094c>)()
  plz--respond(#<process plz-request-curl> #<buffer  *plz-request-curl*-323871> "finished\n")
  apply(plz--respond (#<process plz-request-curl> #<buffer  *plz-request-curl*-323871> "finished\n"))
  timer-event-handler([t 25772 51397 67298 nil plz--respond (#<process plz-request-curl> #<buffer  *plz-request-curl*-323871> "finished\n") nil 168000 nil])

That's all I have right now, and I'm about to restart the session manually, so I won't be able to continue from the above state later. (I may be in a position to repeat this on our staging Matrix instance, though.)

alphapapa commented 1 year ago

@phil-s Sorry, just a simple oversight on my part. Please pull the branch again and let me know if it works next time.

phil-s commented 1 year ago

Will do. I've disconnected and restarted Emacs to make sure I'm testing from a sane state. I could test the ement-refresh command earlier, but I won't get a session time-out for about a week (unless there's a way to tell the server to invalidate the current session on demand).

Ooh. Ok, ement got into a loop trying to connect. In the messages buffer I see:

Ement: Checking server’s login flows...
Ement: Sync request sent.  Waiting for response...
Ement: Response arrived after 11.99 seconds.  Reading 1.8M JSON response...
Ement: Reading JSON took 0.55 seconds
Ement: Reading events... 
Ement: Sync request sent.  Waiting for response...
Ement: Response arrived after 0.66 seconds.  Reading 739 JSON response...
Ement: Reading JSON took 0.00 seconds
Ement: Reading events... 
Ement: Sync request sent.  Waiting for response...
error in process sentinel: persist-load: Invalid read syntax: "#", 2, 297335
error in process sentinel: Invalid read syntax: "#", 2, 297335
Ement: Response arrived after 2.11 seconds.  Reading 718 JSON response...
Ement: Reading JSON took 0.00 seconds
Ement: Reading events... 
Ement: Sync request sent.  Waiting for response...
error in process sentinel: persist-load: Invalid read syntax: "#", 2, 297335
error in process sentinel: Invalid read syntax: "#", 2, 297335
Ement: Response arrived after 1.55 seconds.  Reading 736 JSON response...
Ement: Reading JSON took 0.00 seconds
Ement: Reading events... 
...

...repeatedly until I ran M-x ement-disconnect.

phil-s commented 1 year ago

...and that could have been a blunder on my part with the plz version, as I think it might have been seeing an earlier version after the restart. I've now made certain that it'll be pointing at plz 0.7 after restarting, and connecting worked just now.

alphapapa commented 1 year ago

@phil-s Hm, that's weird. The error from persist-load is surprising, because I haven't changed anything that should affect the one variable we use persist for.

Anyway, I just pushed a few more changes/fixes, so please pull again. If you get any similar errors, please ement-disconnect, then eval (persist-reset 'ement-room-list-visibility-cache), then ement-connect again.

Thanks for your help.

alphapapa commented 1 year ago

And, as you mentioned:

I won't get a session time-out for about a week (unless there's a way to tell the server to invalidate the current session on demand).

I don't know of such a way, but please let me know if you discover one. :)

phil-s commented 1 year ago

Another issue related to ement-refresh, which must be (cl-incf (ement-session-transaction-id session)) not having a number to work with. I see :transaction-id nil in the (massive) session argument in the debugger.

Debugger entered--Lisp error: (wrong-type-argument number-or-marker-p nil)
  ement--update-transaction-id(#s(ement-session
  ement-send-message(#s(ement-room
  ement-room-send-message(#s(ement-room
  funcall-interactively(ement-room-send-message
  call-interactively(ement-room-send-message
  command-execute(ement-room-send-message)

I was getting that consistently trying to post a message in a room after the refresh/reconnect. I then managed to open a second buffer for the same room from the rooms list, and that one allowed me to send messages.

A bit later I managed to trigger this:

Debugger entered--Lisp error: (wrong-number-of-arguments (2 . 2) 1)
  ement-api-error(#s(plz-error :curl-error (7 . "Failed to connect to host.") :response nil :message nil))
  #f(compiled-function (process code) #<bytecode -0x31a22361dacad6c>)(#<process plz-request-curl<3>> "7")
  plz--respond(#<process plz-request-curl<3>> #<buffer  *plz-request-curl*-145791> "exited abnormally with code 7\n")
  apply(plz--respond (#<process plz-request-curl<3>> #<buffer  *plz-request-curl*-145791> "exited abnormally with code 7\n"))
  timer-event-handler([t 25774 36683 597853 nil plz--respond (#<process plz-request-curl<3>> #<buffer  *plz-request-curl*-145791> "exited abnormally with code 7\n") nil 90000 nil])
alphapapa commented 1 year ago

Another issue related to ement-refresh, which must be (cl-incf (ement-session-transaction-id session)) not having a number to work with.

Thanks, another simple oversight. I should move that slot's initial value into the struct definition like I did for the events table.

phil-s commented 1 year ago

I'm hoping to take a look at this again later in the week, so any pointers for debugging the reconnection process would be gratefully accepted. (Last time I tested I believe I was still ending up needing to forcibly disconnect and reconnect to get the client working again, so I'm pretty sure more work is needed here.)

alphapapa commented 1 year ago

I'm not sure exactly what you mean by "reconnection process," since Matrix is constantly opening new connections for receiving new events. (Forgive my being pedantic, but as I'm sure you can testify to, being specific early on can save time from miscommunication later on.) For the, shall we say, reauthentication or relogin process, my best general advice is to use Edebug on the ement-connect function. If you have more specific questions about how to debug certain things, please let me know.

phil-s commented 1 year ago

as I'm sure you can testify to, being specific early on can save time from miscommunication later on

Yes indeed. I'm afraid my lack of familiarity with the protocols is going to make me unintentionally ambiguous at times, so my apologies for that. (I'm still kinda hoping to get away without having to learn how Matrix works in detail.)

alphapapa commented 1 year ago

No problem. I don't blame you for not wanting to wade in too deep. :) As generally pleasant and simple as Matrix's basic design is, it's still growing and changing, which can be tedious to keep up with.

alphapapa commented 1 year ago

Retargeting for v0.14.