element-hq / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://element-hq.github.io/dendrite/
GNU Affero General Public License v3.0

dendrite server suddenly goes completely silent #2524

Open matrixbot opened 3 weeks ago

matrixbot commented 3 weeks ago

This issue was originally created by @grisu48 at https://github.com/matrix-org/dendrite/issues/2524.

Background information

Description

Since a server reboot this morning I can no longer send messages via Element.

The server had been running fine for the past few weeks without any noticeable issues.

In the dendrite log I find error messages like the following:

time="2022-06-09T07:34:51.751232393Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=wV81oVBTTPge req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/$local.c2f3d6f8-eb95-42b3-9ff3-38d8e179e2a9" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:35:15.652729793Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=3Z5ZPJdx0DkE req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:20.461579016Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=ZRaKTs4mqVqg req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:36:41.052252332Z" level=info msg="Executing UpdateUserDailyVisits"
time="2022-06-09T07:37:28.839353423Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=AbPuX72a3rSu req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"
time="2022-06-09T07:38:45.475754429Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=u5zAstAfeuKd req.method=PUT req.path="/_matrix/client/r0/rooms/!d8rmAZMAod6n7EsZ:m.feldspaten.org/send/m.room.encrypted/m1654759992909.1" user_id="@phoenix:m.feldspaten.org"

Steps to reproduce

I'm also attaching the server log after the reboot here: log.txt

matrixbot commented 3 weeks ago

This comment was originally posted by @grisu48 at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1157350557.

Does anyone have an idea? My homeserver has been practically toast since an automated reboot last Thursday. I can't send any messages, not even to internal channels, and federated channels have remained quiet since then. I still see the error messages stated above (level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled") in the logs.

Logging out and back in didn't help, nor did a reboot, an update to the latest container version, or re-creating the whole container (using the old data files).

The instance had been working fine for at least a month and stopped working without me touching anything. The reboot happened as part of the automated update procedure and does not alter the state of the container or its configuration files.

To me, this issue appeared completely out of the blue, and I am not aware of any changes I made that could have triggered it.

matrixbot commented 3 weeks ago

This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1169794872.

context canceled normally means that the client gave up waiting for a response from the server and closed the connection; instead of continuing to do work for a client that has given up, we stop processing too. It usually signals that whatever work we were doing hadn't finished in time.

Is this happening in just specific rooms or is it happening in all of them?

matrixbot commented 3 weeks ago

This comment was originally posted by @grisu48 at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1170869697.

This affects the whole server; in fact, the whole server has been completely silent since I posted this bug. Affected rooms are

The latter also applies to "chatty" rooms where I know that traffic is ongoing. Everything went completely silent.

matrixbot commented 3 weeks ago

This comment was originally posted by @grisu48 at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1170871591.

Following the input of @neilalexander I updated the title of this issue.

matrixbot commented 3 weeks ago

This comment was originally posted by @pcmid at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1179741311.

Same issue here, a few days after upgrading to v0.8.9.

matrixbot commented 3 weeks ago

This comment was originally posted by @edocod1 at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1181417802.

Could this be the same issue as #2566 ?

matrixbot commented 3 weeks ago

This comment was originally posted by @TheBinaryLoop at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1257176601.

I have the same issue. Has anybody had any luck resolving this? My homeserver is useless in this state.

matrixbot commented 3 weeks ago

This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1258265227.

Have you evaluated whether your PostgreSQL connection counts are correct? Your Dendrite config must not try to use more connections than PostgreSQL is configured to allow or you'll run into problems like this.
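One way to do that check is to add up every connection limit Dendrite may use and compare the sum with PostgreSQL's max_connections minus superuser_reserved_connections (3 by default). The Go sketch below assumes you copy the max_open_conns values from your own config by hand; the pool names in main are hypothetical. In Go's database/sql a limit of 0 or less means "unlimited", so the sketch flags those too. Remember that other clients of the same PostgreSQL server (psql sessions, other apps) consume connections as well.

```go
package main

import (
	"fmt"
	"sort"
)

// connBudget sums per-pool connection limits and checks the total against
// what PostgreSQL allows for non-superusers.
func connBudget(pools map[string]int, maxConnections, reserved int) (int, error) {
	names := make([]string, 0, len(pools))
	for name := range pools {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic order for error messages

	total := 0
	for _, name := range names {
		n := pools[name]
		if n <= 0 {
			// database/sql treats <= 0 as "no limit", which can exhaust PostgreSQL.
			return 0, fmt.Errorf("pool %q: unlimited or invalid max_open_conns (%d)", name, n)
		}
		total += n
	}
	available := maxConnections - reserved
	if total > available {
		return total, fmt.Errorf("pools may open %d connections but PostgreSQL allows only %d for non-superusers", total, available)
	}
	return total, nil
}

func main() {
	// Hypothetical limits; substitute your own max_open_conns values and
	// the result of running SHOW max_connections; in psql.
	pools := map[string]int{"roomserver": 40, "syncapi": 40, "userapi": 30}
	if total, err := connBudget(pools, 100, 3); err != nil {
		fmt.Println("over budget:", err)
	} else {
		fmt.Println("ok, total connections:", total)
	}
}
```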

@edocod1 @TheBinaryLoop To be clear are you seeing context canceled specifically, or a different error?

matrixbot commented 3 weeks ago

This comment was originally posted by @TheBinaryLoop at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1258323584.

Yes, I see context canceled.


matrixbot commented 3 weeks ago

This comment was originally posted by @mat-1 at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1312646503.

I had this issue and fixed it by deleting the jetstream directory.

matrixbot commented 3 weeks ago

This comment was originally posted by @vpzomtrrfrt at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1843896427.

Deleting jetstream did not resolve the issue for me; it still appears randomly.

Since there might be multiple issues here, in my case it's:

matrixbot commented 3 weeks ago

This comment was originally posted by @vpzomtrrfrt at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1872565381.

My current workaround is a wrapper program that terminates Dendrite if it prints /level=error.*context canceled/.

matrixbot commented 3 weeks ago

This comment was originally posted by @troed at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1942311726.

I just had "this" happen to me after my ISP changed my external IP address. Nothing I did made any difference, but searching for "context canceled" brought me here. Since restarting (I use the monolith Docker image) made no difference, I stopped and removed the image, deleted the jetstream volume, and then pulled a fresh image. That finally got the server running again.

Dendrite version 0.13.6+87f028d

(I was on 0.13.5 when the IP address first changed; upgrading was one of the things I tried to get the server running again.)

A typical log error (I know these don't say much). There was nothing else besides the normal output.

time="2024-02-13T17:59:07.748363621Z" level=error msg="SendEvents failed" error="InputRoomEventsResponse: context canceled" req.id=************ req.method=PUT req.path="/_matrix/client/r0/rooms/!hooh0PDGQ1jLlf5u:************/send/m.room.encrypted/kMXEventLocalId_************************" user_id="@*************"

Unfortunately, it seems deleting jetstream might have caused secondary issues, which I'll now try to fix manually.

time="2024-02-13T19:47:34.506466310Z" level=warning msg="failed to get state after $************* locally" context=missing error="storage: event IDs missing from the database (0 != 1)" room_id="!****************" txn_event="$****************" txn_prev_events="[*********************]"

matrixbot commented 3 weeks ago

This comment was originally posted by @niebloomj at https://github.com/matrix-org/dendrite/issues/2524#issuecomment-1953314956.

This keeps happening to me. My server is completely busted until I run rm -rf ./jetstream and restart Dendrite. Then things come back online, but I get a bunch of error logs about broken event IDs for a few hours.