element-hq / synapse

Synapse: Matrix homeserver written in Python/Twisted.
https://element-hq.github.io/synapse
GNU Affero General Public License v3.0
989 stars 112 forks source link

Error attempting to resolve state at missing prev_events: 403: Forbidden #15204

Open matrixbot opened 6 months ago

matrixbot commented 6 months ago

This issue has been migrated from #15204.


Description

I use a script to test federation between matrix.org and my homeserver. It uses two bots. The test goes like this:

Bot 1 is on my HS. Bot 2 is on matrix.org.

  1. Bot 1 creates a room
  2. Bot 1 invites bot 2
  3. Bot 2 joins the room.
  4. Bot 1 sends a message to the room
  5. Using data from both bots the time it took to receive the message is calculated
  6. Both bots leave the room and forget it

This process is then repeated for bot 2.

I logged into the bots today to grab new access tokens and saw that there were a bunch of rooms that they had not left so I manually removed them. I was able to leave all the rooms of the matrix.org bot but was not able to do so for the bot on my HS. I tried nuking the rooms using /_synapse/admin/v2/rooms/ ( with force + purge) but that gave me this error:

2023-03-03 16:56:33,209 - synapse.storage.util.partial_state_events_tracker - 82 - INFO - shutdown_and_purge_room-0 - Awaiting un-partial-stating of events ['$goL0dWWdYWwvAeMR2yIepZPXCxlF9eGkF2Lx6P04UXk']
Stack (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/homeserver.py", line 389, in <module>
    main()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/homeserver.py", line 385, in main
    run(hs)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/homeserver.py", line 364, in run
    _base.start_reactor(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/_base.py", line 191, in start_reactor
    run()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/_base.py", line 173, in run
    run_command()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/app/_base.py", line 148, in <lambda>
    run_command: Callable[[], None] = lambda: reactor.run(),
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/base.py", line 1318, in run
    self.mainLoop()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/base.py", line 1328, in mainLoop
    reactorBaseSelf.runUntilCurrent()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/base.py", line 967, in runUntilCurrent
    f(*a, **kw)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1792, in gotResult
    _inlineCallbacks(r, gen, status, context)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1782, in _inlineCallbacks
    status.deferred.callback(callbackValue)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1792, in gotResult
    _inlineCallbacks(r, gen, status, context)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1782, in _inlineCallbacks
    status.deferred.callback(callbackValue)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1371, in _cbDeferred
    self.callback(cast(_DeferredListResultListT, self.resultList))
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 696, in callback
    self._startRunCallbacks(result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 798, in _startRunCallbacks
    self._runCallbacks()
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1792, in gotResult
    _inlineCallbacks(r, gen, status, context)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/metrics/background_process_metrics.py", line 242, in run
    return await func(*args, **kwargs)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/pagination.py", line 659, in _shutdown_and_purge_room
    ].shutdown_room = await self._room_shutdown_handler.shutdown_room(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/room.py", line 1869, in shutdown_room
    _, stream_id = await self.room_member_handler.update_membership(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/room_member.py", line 597, in update_membership
    result = await self.update_membership_locked(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/room_member.py", line 826, in update_membership_locked
    state_before_join = await self.state_handler.compute_state_after_events(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/state/__init__.py", line 219, in compute_state_after_events
    ret = await self.resolve_state_groups_for_events(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/util/metrics.py", line 113, in measured_func
    r = await func(self, *args, **kwargs)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/state/__init__.py", line 520, in resolve_state_groups_for_events
    state_groups = await self._state_storage_controller.get_state_group_for_events(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/storage/controllers/state.py", line 374, in get_state_group_for_events
    await self._partial_state_events_tracker.await_full_state(event_ids)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/storage/util/partial_state_events_tracker.py", line 82, in await_full_state
    logger.info(

I restarted the server and checked the logs and saw that it was failing to re-sync the broken rooms with errors such as:

2023-03-03 18:05:22,514 - synapse.handlers.federation_event - 1073 - WARNING - sync_partial_state_room-9-$utvlr8R3FKFrtoRlZxyTAtAT6ybEQGW0o6kar9N_aNY - Error attempting to resolve state at missing prev_events: 403: Forbidden
2023-03-03 18:05:22,515 - synapse.handlers.federation - 1924 - ERROR - sync_partial_state_room-9 - Failed to get state for !HsKRUpcYhnAgwvbSgH:matrix.org at <FrozenEventV3 event_id=$utvlr8R3FKFrtoRlZxyTAtAT6ybEQGW0o6kar9N_aNY, type=m.room.member, state_key=@evulid-cc-federation-tester-on-evulid-cc:evulid.cc, outlier=False> from matrix.org because ERROR 403: We can't get valid state history., giving up!
2023-03-03 18:05:22,515 - synapse.metrics.background_process_metrics - 244 - ERROR - sync_partial_state_room-4 - Background process 'sync_partial_state_room' threw an exception
Traceback (most recent call last):
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/federation_event.py", line 1056, in _compute_event_context_with_maybe_missing_prevs
    await self._get_state_ids_after_missing_prev_event(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/handlers/federation_event.py", line 1119, in _get_state_ids_after_missing_prev_event
    ) = await self._federation_client.get_room_state_ids(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/logging/opentracing.py", line 901, in _wrapper
    return await func(*args, **kwargs)  # type: ignore[misc]
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/federation/federation_client.py", line 505, in get_room_state_ids
    result = await self.transport_layer.get_room_state_ids(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/federation/transport/client.py", line 78, in get_room_state_ids
    return await self.client.get_json(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/http/matrixfederationclient.py", line 1077, in get_json
    response = await self._send_request_with_optional_trailing_slash(
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/http/matrixfederationclient.py", line 406, in _send_request_with_optional_trailing_slash
    response = await self._send_request(request, **send_request_args)
  File "/opt/matrix-synapse/lib/python3.10/site-packages/synapse/http/matrixfederationclient.py", line 668, in _send_request
    raise exc
synapse.api.errors.HttpResponseException: 403: Forbidden

Doing /_synapse/admin/v1/purge_history/ on the room yields this:

{"status":"failed","error":"update or delete on table \"events\" violates foreign key constraint \"partial_state_events_event_id_fkey\" on table \"partial_state_events\"\nDETAIL:  Key (event_id)=($U259e4xDM9mlnGzxtK3kGpToqUX8QypNQQ5Tj8lcIwI) is still referenced from table \"partial_state_events\".\n"}

I've thought about nuking everything (https://github.com/matrix-org/synapse/issues/14300#issuecomment-1294969068) but thought I'd file a bug report first.

Steps to reproduce

???

Homeserver

evulid.cc + matrix.org

Synapse Version

1.77.0

Installation Method

pip (from PyPI)

Database

PostgreSQL, no

Workers

Multiple workers

Platform

Ubuntu 22.04 LXC

Configuration

Shorter msg retention

Relevant log output

See above.

Anything else that would be useful to know?

No response

YourSandwich commented 4 months ago

Same issue here, just joining a bigger room is enough. And synapse dies.

That a very critical issue, literally my whole synapse server dies.

YourSandwich commented 4 months ago

Before v1.100 I think everything was ok

YourSandwich commented 4 months ago
2024-03-08 23:29:38,668 - synapse.handlers.federation - 1989 - ERROR - sync_partial_state_room-1 - Failed to get state for !mefQhZzgTaxNCNzAeK:kde.org at <FrozenEventV3 event_id=$-m9oVE3FrfmcYhf7-30ZP2b-xMTINeXJDcZ2BdpELi8, type=m.room.member, state_key=@sandwich:archworks.co, outlier=False> from alteredcube.com because ERROR 403: We can't get valid state history., giving up!
2024-03-08 23:29:38,672 - synapse.metrics.background_process_metrics - 253 - ERROR - sync_partial_state_room-1 - Background process 'sync_partial_state_room' threw an exception
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation_event.py", line 1165, in _compute_event_context_with_maybe_missing_prevs
    await self._get_state_ids_after_missing_prev_event(
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation_event.py", line 1228, in _get_state_ids_after_missing_prev_event
    ) = await self._federation_client.get_room_state_ids(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/federation/federation_client.py", line 560, in get_room_state_ids
    result = await self.transport_layer.get_room_state_ids(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/federation/transport/client.py", line 87, in get_room_state_ids
    return await self.client.get_json(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/http/matrixfederationclient.py", line 1215, in get_json
    json_dict, _ = await self.get_json_with_headers(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/http/matrixfederationclient.py", line 1317, in get_json_with_headers
    response = await self._send_request_with_optional_trailing_slash(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/http/matrixfederationclient.py", line 498, in _send_request_with_optional_trailing_slash
    response = await self._send_request(request, **send_request_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/http/matrixfederationclient.py", line 794, in _send_request
    raise exc
synapse.api.errors.HttpResponseException: 403: Forbidden

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/synapse/metrics/background_process_metrics.py", line 251, in run
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation.py", line 1840, in _sync_partial_state_room_wrapper
    await self._sync_partial_state_room(
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation.py", line 1966, in _sync_partial_state_room
    await self._federation_event_handler.update_state_for_partial_state_event(
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation_event.py", line 635, in update_state_for_partial_state_event
    context = await self._compute_event_context_with_maybe_missing_prevs(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/synapse/handlers/federation_event.py", line 1185, in _compute_event_context_with_maybe_missing_prevs
    raise FederationError(
synapse.api.errors.FederationError: ERROR 403: We can't get valid state history.
YourSandwich commented 4 months ago

With v1.99 synapse does not crash, but throws the same error as v1.103