Open matrixbot opened 10 months ago
@DMRobertson
I haven't had any issues which I described as "ghost notifications" for quite some time now. So I think that was fixed somehow.
The situation with the "Event duplication" relaxed a lot for a long time, but increased noticeably in the past couple of weeks. The events that were "duplicated" all seem to come from a particular time span, as before, where we experienced a distributed join spam attack with >35k users.
On Dec 16 2023 we had an interesting "conversation" with our bot (mjolnir). We told it to ban a user, just as usual, and it told us that it isn't in the room anymore:
Dec 16 19:02:16 matrix matrix-bot-mjolnir[189389]: Sat, 16 Dec 2023 19:02:16 GMT [ERROR] [MatrixHttpClient] (REQ-46674) [Error: Error during MatrixClient request GET /_matrix/client/v3/rooms/!iuyQXswfjgxQMZGrfQ%3Amatrix.org/state/m.room.server_acl/: 404 Not Found -- {"errcode":"M_NOT_FOUND","error":"Event not found."}]
Dec 16 19:02:16 matrix matrix-bot-mjolnir[189389]: Sat, 16 Dec 2023 19:02:16 GMT [ERROR] [MatrixHttpClient] (REQ-46676) [Error: Error during MatrixClient request PUT /_matrix/client/v3/rooms/!iuyQXswfjgxQMZGrfQ%3Amatrix.org/state/m.room.server_acl/: 403 Forbidden -- {"errcode":"M_FORBIDDEN","error":"User @bot.mjolnir:michaelsasser.org not in room !iuyQXswfjgxQMZGrfQ:matrix.org"}]
So I first checked the room member list in element-web (with sliding-sync) on my instance, and it was still there, but it had the avatar and display name from over a year ago. On the matrix.org
instance using the app.vector.im
element client, it wasn't in the room list anymore. When I cleared my instance's client, it was gone too. When I checked the developer tools (on my instance) under room state -> m.room.member
, the bot wasn't shown at all, as if it would have never been joined to the room. So I checked the logs where nothing indicated
anything interesting. Then I went to the database and found something disturbing. The current membership state event of the bot is not the one it should be. It was actually an historical one from the time span mentioned above.
The last join the bot logged happened at 2023-12-07 11:02 and the next join took place at 2023-12-07 11:34:00 (which was not logged anymore by the bot). So, whatever went sideways happened, with high certainty between 2023-12-07 11:02:00
and 2023-12-07 11:34:00
(UTC+1).
-- Query the current membership state event for the bot
SELECT json::JSONB, event_id
FROM event_json
WHERE event_id IN
(SELECT event_id
FROM current_state_events
WHERE room_id='!iuyQXswfjgxQMZGrfQ:matrix.org'
AND TYPE='m.room.member'
AND state_key='@bot.mjolnir:michaelsasser.org'
);
// The response to that query was an old state event which should not be the
// current one. Though, from the origin_server_ts it seems to be consistent
// with the old avatar and displayname I initially saw in the room user
// directory on element-web which sliding-sync enabled.
// $cb_QySiCLd5FLanxgrgnYzPLG83U2zO69PHwGj4Cxu8
{
type: "m.room.member",
depth: 45711,
hashes: {
sha256: "Y4IQHedNuOc1PQ/bZAEUnZw3asz/dcUr8vINr3hhPN0",
},
origin: "michaelsasser.org",
sender: "@bot.mjolnir:michaelsasser.org",
content: {
membership: "leave",
},
room_id: "!iuyQXswfjgxQMZGrfQ:matrix.org", // Python Room: @python:matrix.org
unsigned: {
replaces_state: "$qU1rX95f1WpJKa6oHtwAPIyp0dmVbvP6DqDQ7gQtEI8",
},
state_key: "@bot.mjolnir:michaelsasser.org",
prev_state: [],
signatures: {
"michaelsasser.org": {
"ed25519:a_QNjX": "L3HB/fwK4lpQRMcBcXSJQkc9PnY1zxucjJ8w25jAwLwHINT1HWPt4d1IHOg7ldIg3/+MbW/L8ax+RRRg0aYmCA",
},
},
auth_events: [
"$qU1rX95f1WpJKa6oHtwAPIyp0dmVbvP6DqDQ7gQtEI8",
"$Pq-FciS-IXzHqwLHHu6EZG9u7Y_hDes8nD3p4VwzfJo",
"$ZW_yTzbdiRuFrpN5SNhnHCYSmzObqCsP0ADKGF8bsz4",
],
prev_events: ["$6DeuJJjdF7lkemjqQOxCJavCfg0_-cixJ_vYyTaShrw"],
origin_server_ts: 1643233431682, // Wednesday, January 26, 2022 9:43:51.682 PM
}
Normally, it should be or more precisely, it should have responded to with this state event:
-- Query the latest (by time) membership state event for the bot
SELECT json::JSONB, event_id
FROM event_json
WHERE event_id IN
(SELECT event_id
FROM state_events
WHERE room_id='!iuyQXswfjgxQMZGrfQ:matrix.org'
AND TYPE='m.room.member'
AND state_key='@bot.mjolnir:michaelsasser.org'
)
ORDER BY json::JSONB->'origin_server_ts' DESC
LIMIT 1;
// $_6N5ltJ9hg35sk4j8Do7PTNNzlMjcZSk7p8qlJ51fFA
{
type: "m.room.member",
depth: 118579,
hashes: {
sha256: "YZ/qfnBF+RlFUO8vaXpPep6UVcAgc9sVbue8oUDVaAU",
},
origin: "michaelsasser.org",
sender: "@bot.mjolnir:michaelsasser.org",
content: {
avatar_url: "mxc://michaelsasser.org/kPMJgudCvuUzCIrOeJuUrPTn",
membership: "join",
displayname: "Administrator",
},
room_id: "!iuyQXswfjgxQMZGrfQ:matrix.org",
unsigned: {
age_ts: 1669213202762,
replaces_state: "$rST-rCiCfl8z0zZVSlBWZEeuVX8oy40SFOkrlNoJ5RY",
},
state_key: "@bot.mjolnir:michaelsasser.org",
signatures: {
"michaelsasser.org": {
"ed25519:a_QNjX": "gfeD/i2blF1OSX7A3D59lGsDMBeQx0zdfmmP6ZK7D/VKcwjI1tTm9zDLS+//Nq07qKUsGx9KJWF9WuhUTdg7CQ",
},
},
auth_events: [
"$rST-rCiCfl8z0zZVSlBWZEeuVX8oy40SFOkrlNoJ5RY",
"$Ur-nGignA9Bqyyv-YRC5oH8UkGHzqgoqevuzM_m8Jn8",
"$Pq-FciS-IXzHqwLHHu6EZG9u7Y_hDes8nD3p4VwzfJo",
"$tAy72IFE7UsohvKVrhH8jbolxKq0N0qM1P7cClN9xIE",
],
prev_events: ["$_rx-nDqmJuZW5Tu5qRP3xdLbuFSm3wby970RG3JP1wU"],
origin_server_ts: 1669213202762, // Wednesday, November 23, 2022 2:20:02.762 PM
}
Just for reference:
//edit 2024-01-09
I see the same thing happening now with power levels m.room.power_levels
i.e. moderators are suddenly no longer moderators and so on.
NAME CORES CPU TYPE ARCHITECTURE MEMORY DISK STORAGE TYPE TRAFFIC
cx31 2 shared x86 8.0 GB 80 GB local 20 TB
The matrix parts were deployed using the Ansible playbook: spantaleev/matrix-docker-ansible-deploy
Linux matrix 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29) x86_64GNU/Linux
Docker version 24.0.7, build afdd53b
containerd containerd.io 1.6.24 61f9fd88f79f081d64d6fa3bb1a0dc71ec870523
ghcr.io/matrix-org/synapse:v1.98.0
ghcr.io/matrix-org/sliding-sync:v0.99.13
vectorim/element-web:v1.11.51
postgres:15.5-alpine
Thanks for the update and your thorough investigation. I'm no longer working on Matrix, but I'm sure the Synapse team will take a look at this as part of issue review.
Since the last edit (2024-01-09), where power levels reverted suddenly, we have seen something like this happening a few times, Topics revert to a previous state, Users are suddenly no longer joined to the room, Homeservers who have joined the room before and have not purged the room from the homeserver can no longer join the room (we asked for logs, or more information but none of them were able to provide them).
This issue has been migrated from #14539.
Description
Ghost notifications
Since synapse v1.70 or v1.71, I've noticed randomly occurring "ghost notification", happening on element-web, element-desktop and element-android, hydrogen. They usually come in small, fast bursts of two to four notifications (I hear an audible sound telling me about new events in a room). When I check for new events, there aren't any. Nothing has changed. I have the options
Developer mode
andShow hidden events in timeline
enabled in element, so I would assume, I'd see any changes to a room.Event duplication
Around the same Time, I've noticed some events, some from weeks or months ago, from being randomly duplicated and pushed as new ones. This happens independently of the "ghost notification issue". At first, I've only noticed this in one particular room, where the same messages were duplicated a few times over a couple of days (Example: see "Image 1" below), possibly leading to an unstable room (see "Possible implications). Today, the same thing happened in our mjolnir control room, where mjolnir (configured as a client and not as an appservice) executed the duplicated command (Example: see "Image 2" below). Actually, I'm actually not certain, if my homeserver is the one, the issues originate from, as the duplicated events are visible on other instances too, including
matrix.org
and maybe others as well.Image 1: Duplicated messages in the python room
Image 2: Mjolnir bot reacts to duplicated events
The "failed to leave" error from mjolnir is unrelated (and already fixed). The duplication errors happened before the room, it was trying to leave, had issues, too, and might have led to the issue, where the room was becoming unstable on my homeserver.
Possible implications
After this happened a few times, in the Python room
#python:matrix.org
, the room became unstable on my end, and I was unable to send events to that room any more because of missing state events. I also wasn't able to remove the room with the admin API, so, I manually had to remove the room from the database to gain access again:(Because I had to roll back a few times, while I was trying to figure out, how I can remove the room, I accidentally deleted the logs as well.)
Steps to reproduce
Homeserver
matrix.org; michaelsasser.org
Synapse Version
matrixdotorg/synapse; tags: v1.71.0, v1.72.0, possibly v1.70.0
Installation Method
Docker (matrixdotorg/synapse)
Database
postgres:15.0-alpine on same machine, It was recently restored to upgrade to v15, not sure, if it happeded before; recently restored while trying to removing room, issues happened before
Workers
Multiple workers
Platform
Environment: CX31 Hetzner Cloud Using the Ansible playbook: spantaleev/matrix-docker-ansible-deploy Debian: Linux matrix 5.10.0-19-amd64 vector-im/element-web#1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux Docker: Docker version 20.10.21, build baeda1f Containerd: containerd containerd.io 1.6.10 770bd0108c32f3fb5c73ae1264f7e503fe7b2661
Configuration
Playbook
Using the Ansible playbook: [spantaleev/matrix-docker-ansible-deploy], contains multiple files (result below):
Playbook/config
```yaml --- # hookshot matrix_hookshot_enabled: true # default matrix_hookshot_gitlab_enabled: true matrix_hookshot_permissions: # Allow all users to send commands to existing services - actor: "*" services: - service: gitlab level: commands # Allow users on this domain to log in to gitlab. - actor: "!foo:michaelsasser.org" services: - service: gitlab level: login # Allow this specific user to do any action - actor: "@foo:michaelsasser.org" services: - service: "*" level: admin --- # Mjolnir matrix_bot_mjolnir_enabled: true matrix_bot_mjolnir_access_token: "foo" matrix_bot_mjolnir_management_room: "!foo:michaelsasser.org" matrix_bot_mjolnir_configuration_extension_yaml: | backgroundDelayMS: 50 fasterMembershipChecks: true recordIgnoredInvites: true commands: additionalPrefixes: - mjolnir - bot automaticallyRedactForReasons: - spam - advertising - advertisement - ad - scam protections: wordlist: words: - foo - more foo minutesBeforeTrusting: 1234 # 1234 is not the actual time --- # ssl matrix_ssl_lets_encrypt_support_email: foo@MichaelSasser.org --- # element # set integration manager matrix_client_element_integrations_ui_url: "https://scalar.vector.im/" matrix_client_element_integrations_rest_url: "https://scalar.vector.im/api" matrix_client_element_integrations_widgets_urls: ["https://scalar.vector.im/api"] # set server room directory matrix_client_element_roomdir_servers: - michaelsasser.org - matrix.org # Extra themes matrix_client_element_themes_enabled: false # Default Template matrix_client_element_default_theme: "dark" matrix_client_element_configuration_extension_json: | { "sso_redirect_options": { "immediate": true } } --- # hydrogen # matrix_client_hydrogen_enabled: true --- matrix_grafana_enabled: true matrix_grafana_anonymous_access: false matrix_grafana_default_admin_user: foo # Do not change matrix_grafana_default_admin_password: foo matrix_grafana_container_extra_arguments: - -e GF_AUTH_GENERIC_OAUTH_NAME="Single Sign-On" - -e GF_AUTH_GENERIC_OAUTH_ENABLED="true" - -e GF_AUTH_GENERIC_OAUTH_CLIENT_ID="matrix-grafana" - -e GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET="foo" - -e GF_AUTH_GENERIC_OAUTH_SCOPES="openid" - -e GF_AUTH_GENERIC_OAUTH_AUTH_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/auth" - -e GF_AUTH_GENERIC_OAUTH_TOKEN_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/token" - -e GF_AUTH_GENERIC_OAUTH_API_URL="https://foo.michaelsasser.org/realms/MichaelSasser/protocol/openid-connect/userinfo" - -e GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP="true" - -e GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH="contains(roles[*], 'Admin') && 'Admin' || contains(roles[*], 'Editor') && 'Editor' || contains(roles[*], 'Viewer') && 'Viewer'|| ''" - -e GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_STRICT="true" # Disable Login Form - -e GF_AUTH_DISABLE_LOGIN_FORM="true" - -e GF_AUTH_OAUTH_AUTO_LOGIN="true" - -e GF_AUTH_BASIC_ENABLED="false" --- # email matrix_mailer_sender_address: "foo@MichaelSasser.org" matrix_mailer_relay_use: true matrix_mailer_relay_host_name: "foo.michaelsasser.org" matrix_mailer_relay_host_port: 587 matrix_mailer_relay_auth: true matrix_mailer_relay_auth_username: "foo@michaelsasser.org" matrix_mailer_relay_auth_password: "foo" --- matrix_synapse_rust_synapse_compress_state_find_rooms_command_wait_time: 1200 matrix_synapse_rust_synapse_compress_state_compress_room_time: 68400 matrix_synapse_rust_synapse_compress_state_psql_import_time: 68400 # Base Domain Serving matrix_nginx_proxy_base_domain_serving_enabled: true matrix_nginx_proxy_connect_timeout: 120 matrix_nginx_proxy_send_timeout: 120 matrix_nginx_proxy_read_timeout: 120 matrix_nginx_send_timeout: 120 # Fix issue where the admin api is not reacheable anymore matrix_nginx_proxy_proxy_matrix_client_redirect_root_uri_to_domain: "" matrix_nginx_proxy_proxy_matrix_client_api_forwarded_location_synapse_admin_api_enabled: true --- matrix_postgres_connection_password: "foo" matrix_postgres_container_postgres_bind_port: 5432 matrix_postgres_process_extra_arguments: [ "-c max_connections=120", "-c shared_buffers=512MB", "-c effective_cache_size=1536MB", "-c maintenance_work_mem=128MB", "-c checkpoint_completion_target=0.9", "-c wal_buffers=16MB", "-c default_statistics_target=100", "-c random_page_cost=1.1", "-c effective_io_concurrency=200", "-c work_mem=2184kB", "-c min_wal_size=1GB", "-c max_wal_size=4GB", "-c max_worker_processes=2", "-c max_parallel_workers_per_gather=1", "-c max_parallel_workers=2", "-c max_parallel_maintenance_workers=1" ] # Postgres Backups matrix_postgres_backup_enabled: true matrix_postgres_backup_keep_months: 12 --- # grafana prometheus matrix_prometheus_enabled: true matrix_prometheus_node_exporter_enabled: true matrix_prometheus_postgres_exporter_enabled: false matrix_prometheus_postgres_exporter_database_username: "matrix_prometheus_postgres_exporter" matrix_prometheus_postgres_exporter_database_password: "foo" matrix_prometheus_node_exporter_process_extra_arguments: - "--collector.disable-defaults" - "--collector.cpu" - "--collector.filesystem" - "--collector.os" - "--collector.stat" - "--collector.meminfo" - "--collector.loadavg" - "--collector.time" - "--collector.uname" # Mandatory, otherwide the data vanishes from grafana --- # synapse # Federation matrix_synapse_federation_enabled: true matrix_synapse_allow_public_rooms_over_federation: true # Controls whether accessing the server's public rooms directory can be done # without authentication. # For private servers, you most likely wish to require authentication, # unless you know what list of rooms you're publishing to the world and # explicitly want to do it. matrix_synapse_allow_public_rooms_without_auth: true # Matrix cache size (ram) default: 0.5 matrix_synapse_caches_global_factor: 1.5 # 0.5, is default matrix_synapse_event_cache_size: "100K" # OpenID (Keycloak) (rest see below in matrix_synapse_configuration_extension_yaml) matrix_nginx_proxy_proxy_matrix_client_api_forwarded_location_synapse_oidc_api_enabled: true # Extra Synapse homeserver.yaml config matrix_synapse_configuration_extension_yaml: | suppress_key_server_warning: true # defaults to matrix.org #################### # room complexity #################### limit_remote_rooms: enabled: true complexity: 0.5 complexity_error: "This room is too complex." admins_can_join: true #################### # blacklist #################### ip_range_blacklist: - "123.123.123.123/32" # foo.xyz - "123.123.123.123/32" # matrix.foo.com - "123.123.123.123/32" # matrix.foo.sh - "123.123.123.123/32" # matrix.foo.net - "123.123.123.123/32" # matrix.foo.biz (CNAME) #################### # retention #################### redaction_retention_period: 30d retention: enabled: true default_policy: min_lifetime: 4w max_lifetime: 8w allowed_lifetime_min: 1d allowed_lifetime_max: 24w purge_jobs: - longest_max_lifetime: 3d interval: 12h - shortest_max_lifetime: 3d interval: 1d #################### # server notice #################### server_notices: system_mxid_localpart: server system_mxid_display_name: "Server Notices" system_mxid_avatar_url: "mxc://{{ matrix_domain }}/foo" room_name: "Server Notices" #################### # OIDC #################### oidc_providers: - idp_id: keycloak idp_name: "Single Sign-On" issuer: "https://foo.michaelsasser.org/realms/MichaelSasser" client_id: "synapse" client_secret: "foo" scopes: ["openid", "profile"] allow_existing_users: true user_mapping_provider: config: localpart_template: "{% raw %}{{ user.preferred_username }}{% endraw %}" display_name_template: "{% raw %}{{ user.name }}{% endraw %}" # Media retention matrix_synapse_media_retention_local_media_lifetime: 8w matrix_synapse_media_retention_remote_media_lifetime: 1w # workers matrix_synapse_workers_enabled: true matrix_synapse_workers_preset: michaelsasser_org # https://github.com/matrix-org/synapse/blob/develop/docs/workers.md matrix_synapse_workers_presets: michaelsasser_org: generic_workers_count: 1 pusher_workers_count: 1 federation_sender_workers_count: 1 media_repository_workers_count: 1 appservice_workers_count: 1 user_dir_workers_count: 1 background_workers_count: 1 stream_writer_events_stream_workers_count: 0 stream_writer_typing_stream_workers_count: 0 stream_writer_to_device_stream_workers_count: 0 stream_writer_account_data_stream_workers_count: 0 stream_writer_receipts_stream_workers_count: 0 stream_writer_presence_stream_workers_count: 0 # A secret used to protect access keys issued by the server. # You can put any string here, but generating a strong one is preferred # (e.g. `pwgen -s 64 1`). matrix_synapse_macaroon_secret_key: "foo" matrix_homeserver_generic_secret_key: "{{ matrix_synapse_macaroon_secret_key }}" matrix_synapse_max_upload_size_mb: 256 matrix_synapse_auto_join_rooms: - "#support:{{ matrix_domain }}" # matrix_synapse_use_presence: true # Disable on high load # Shared Secret Provider matrix_synapse_ext_password_provider_shared_secret_auth_enabled: false # disabled to make sure no m.password authentication is available matrix_synapse_ext_password_provider_shared_secret_auth_m_login_password_support_enabled: false # disabled to make sure no m.password authentication is available matrix_synapse_ext_password_provider_shared_secret_auth_shared_secret: "foo" --- devture_timesync_ntpd_package: systemd-timesyncd devture_timesync_ntpd_service: systemd-timesyncd # A shared secret (between Coturn and Synapse) used for authentication. # You can put any string here, but generating a strong one is preferred # (e.g. `pwgen -s 64 1`). matrix_coturn_turn_static_auth_secret: "foo" --- # Enable location sharing matrix_well_known_matrix_client_configuration_extension_json: |- { "org.matrix.msc3488.tile_server": { "map_style_url": "https://api.maptiler.com/maps/streets/style.json?key=foo" }, "m.tile_server": { "map_style_url": "https://api.maptiler.com/maps/streets/style.json?key=foo" }, "setting_defaults": { "MessageComposerInput.showStickersButton": false }, "features": { "feature_latex_maths": true, "feature_pinning": true, "feature_thread": true, "feature_html_topic": true }, "admins": [{ "matrix_id": "@foo:michaelsasser.org", "email_address": "foo@michaelsasser.org", "role": "admin" }, { "email_address": "foo@michaelsasser.org", "role": "security" }] } ``` Results in: ## homeserver.yaml ```yaml account_threepid_delegates: msisdn: '' alias_creation_rules: - action: allow alias: '*' room_id: '*' user_id: '*' allow_guest_access: false allow_public_rooms_over_federation: true allow_public_rooms_without_auth: true allowed_avatar_mimetypes: - image/png - image/jpeg - image/gif app_service_config_files: - /hookshot-registration.yml auto_join_rooms: - '#foo:michaelsasser.org' autocreate_auto_join_rooms: true background_updates: null caches: global_factor: 1.5 per_cache_factors: null cas_config: null daemonize: false database: args: cp_max: 10 cp_min: 5 database: synapse host: matrix-postgres password: foo port: 5432 user: synapse name: psycopg2 txn_limit: 0 default_room_version: '9' email: app_name: Matrix client_base_url: https://element.michaelsasser.org enable_notifs: true invite_client_location: https://app.element.io notif_for_new_users: true notif_from: MatrixAnything else that would be useful to know?
Full log during the time the duplication, see "Image 2" happened: message_duplication.log
All times & dates are UTC+1.