element-hq / matrix-authentication-service

https://element-hq.github.io/matrix-authentication-service/
GNU Affero General Public License v3.0
27 stars 6 forks source link

M_NOT_FOUND from synapse every time a new user tries to sign in #3354

Open thefinn93 opened 1 week ago

thefinn93 commented 1 week ago

Describe the bug On our MAS instance, every time we attempt to onboard a new user, they are prevented from completing login and shown error request failed with status 404 Not Found: M_NOT_FOUND: Unknown user. A sample of the MAS logs when this happens:

2024-10-09T19:19:13.339723Z  INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 2 activity records to the database
2024-10-09T19:20:13.347047Z  INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database
2024-10-09T19:20:50.876417Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBK4850FS9RTKXBYEX7JD4" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBK4850FS9RTKXBYEX7JD4}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxxx:devhack.net" matrix.device_id="LorCzfWcDe"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse

Caused by:
    request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:20:50.876491Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBK4850FS9RTKXBYEX7JD4" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBK4850FS9RTKXBYEX7JD4}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device

Caused by:
    0: Failed to create device in Synapse
    1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:21:13.351558Z  INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database
2024-10-09T19:21:24.690731Z ERROR http.server.request{otel.kind="server" otel.name="POST /upstream/link/:link_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/upstream/link/01J9SBK5JD1WXZP5JMA1XHZM9T" url.scheme="http" http.route="/upstream/link/:link_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.upstream_oauth2.link.post{upstream_oauth_link.id=01J9SBK5JD1WXZP5JMA1XHZM9T}: mas_handlers::upstream_oauth2::link: crates/handlers/src/upstream_oauth2/link.rs:469: error=Missing session cookie
2024-10-09T19:21:27.795058Z ERROR http.server.request{otel.kind="server" otel.name="GET /upstream/link/:link_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="GET" url.path="/upstream/link/01J9SBK5JD1WXZP5JMA1XHZM9T" url.scheme="http" http.route="/upstream/link/:link_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.upstream_oauth2.link.get{upstream_oauth_link.id=01J9SBK5JD1WXZP5JMA1XHZM9T}: mas_handlers::upstream_oauth2::link: crates/handlers/src/upstream_oauth2/link.rs:182: error=Missing session cookie
2024-10-09T19:21:38.304873Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBNDEA00KYXQKPV53857EF" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBNDEA00KYXQKPV53857EF}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxx:devhack.net" matrix.device_id="hYQWiVuKGg"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse

Caused by:
    request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:21:38.304945Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBNDEA00KYXQKPV53857EF" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBNDEA00KYXQKPV53857EF}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device

Caused by:
    0: Failed to create device in Synapse
    1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:22:08.904046Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBPEFCDVKEZX7CGPTW0A46" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBPEFCDVKEZX7CGPTW0A46}:homeserver.create_device{matrix.homeserver="devhack.net" matrix.mxid="@xxx:devhack.net" matrix.device_id="CJirStFj8Q"}: mas_matrix_synapse: crates/matrix-synapse/src/lib.rs:345: error=Failed to create device in Synapse

Caused by:
    request failed with status 404 Not Found: M_NOT_FOUND: Unknown user
2024-10-09T19:22:08.904114Z ERROR http.server.request{otel.kind="server" otel.name="POST /complete-compat-sso/:grant_id" network.protocol.name="http" network.protocol.version="1.1" http.request.method="POST" url.path="/complete-compat-sso/01J9SBPEFCDVKEZX7CGPTW0A46" url.scheme="http" http.route="/complete-compat-sso/:grant_id" user_agent.original="Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/129.0.0.0 Mobile Safari/537.36"}:handlers.compat.login_sso_complete.post{compat_sso_login.id=01J9SBPEFCDVKEZX7CGPTW0A46}: mas_handlers::compat::login_sso_complete: crates/handlers/src/compat/login_sso_complete.rs:117: error=Internal error: Failed to provision device (Failed to provision device

Caused by:
    0: Failed to create device in Synapse
    1: request failed with status 404 Not Found: M_NOT_FOUND: Unknown user)
2024-10-09T19:22:13.360865Z  INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 2 activity records to the database
2024-10-09T19:23:00.007733Z  INFO job.run{otel.kind="consumer" job.id=JID-01J9SBR3H0XHFN50P04ZX7P87M job.attempts=0 job.name="cleanup-expired-tokens"}: mas_tasks::database: crates/tasks/src/database.rs:62: cleaned up expired tokens count=1
2024-10-09T19:23:13.368808Z  INFO activity_tracker.flush: mas_handlers::activity_tracker::worker: crates/handlers/src/activity_tracker/worker.rs:219: Flushing 1 activity records to the database

To Reproduce Steps to reproduce the behavior:

  1. Have an account on our Keycloak instance created.
  2. Attempt to authenticate to our matrix server, browser is sent to MAS which sends you to Keycloak
  3. Authenticate to Keycloak, get sent back to MAS.
  4. Get error screen.

Expected behavior User should be able to login to matrix normally.

Screenshots image

Desktop (please complete the following information): This seems to happen to people on many platforms.

Smartphone (please complete the following information): This seems to happen to people on many platforms.

Additional context

davidmehren commented 2 days ago

We had similar errors, but resolved them by restarting MAS and running provision-all-users.

The whole thing felt a bit buggy for such a central authentication software. Maybe MAS can automatically try to reprovision the user in Synapse if it receives a 404 on device creation?

sandhose commented 2 days ago

This is basically due to the job queue being unreliable.

See #2625 and #1490

We've started rewriting the job queue completely to make it reliable and properly retry lost/errored jobs, which should solve this issue

You should subscribe to #2785 to check with the progress