This comment was originally posted by @RobinJ1995 at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-753695781.
Can confirm this also happens in a polylith setup.
This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-753939222.
The console logs show sync endpoints timing out.
Do you have more information on this, like how long it took for the request to time out?
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-754602545.
Do you have more information on this, like how long it took for the request to time out?
Unfortunately no :/ It felt like minutes. It has not yet happened again, will ping if it does (at least if it doesn't recover by itself, I will notice).
This comment was originally posted by @PureTryOut at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-754646326.
So seems I had the same issue, as reported in #1685. For me it timed out after 120003ms which is about 2 minutes. And that is Nginx timing out, as I get a 504 Gateway timeout result.
This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-758205833.
Does this still happen on Dendrite 0.3.5?
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-759295478.
Does this still happen on Dendrite 0.3.5?
Yes, looks like this just happened sometime during the last 12 hours, just opened my browser and seeing sync time out, been on 0.3.5 since Monday or so.
Process is pretty much idling:
I've copied the logs over, will send you the packet should you want to have a look. I've not restarted the process yet, happy to give you access if you want - though I guess since it takes a while to happen it's probably not that useful?
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-759298603.
Also, just to update, now running on vanilla untweaked 0.3.5 as my appservices patch was included in that.
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-759301988.
Actually, it's not just sync, looks like a remote invite I tried to do also got timed out. From a remote Synapse:
{PUT-O-1566172} [jasonrobinson.me] Request failed: PUT matrix://jasonrobinson.me/_matrix/federation/v2/invite/%21abcdefgh%3Adomain.tld/redactedid: ResponseNeverReceived:[CancelledError()]
This comment was originally posted by @PureTryOut at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-759308065.
I must say that so far I can't reproduce the issue anymore. I'll give it a few more days though before I confirm for sure.
This comment was originally posted by @PureTryOut at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-762220174.
Actually, I can reproduce it again.
I actually think it doesn't have anything to do with the sync API per se, as federation also seems to be behind when I restart the service. The federation tester doesn't get a proper result back either.
This comment was originally posted by @kegsay at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-766770231.
Hmm, I wonder if we're leaking connections and they are all ending up in CLOSE_WAIT, starving our ability to service requests. Can you run netstat to see how many open sockets the process has got? cc @neilalexander
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-767390591.
/etc/dendrite $ netstat -ae
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.11:44527 0.0.0.0:* LISTEN
tcp 0 0 af6e9eaaeeeb:53554 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58620 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58500 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:49276 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53540 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53550 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53552 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53576 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:56624 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54968 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53544 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58068 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58622 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:36256 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53542 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54442 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53346 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38586 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54190 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58522 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:56412 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 :::8008 :::* LISTEN
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:38716 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:38380 ESTABLISHED
udp 0 0 127.0.0.11:55670 0.0.0.0:*
Doesn't look too likely? Unless I'm using the wrong command/args. The last time this happened was 4 days ago (based on when I had to last restart it). I can run this again once it happens again before restarting.
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-768131342.
@Kegsay it happened again, and this is what netstat shows now
/etc/dendrite $ netstat
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 af6e9eaaeeeb:53554 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58620 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38816 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38820 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38824 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58500 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:60636 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38028 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53540 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53550 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38822 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53552 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:36162 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53576 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54968 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53544 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58068 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38120 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:58622 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:36256 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53542 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38814 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38818 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54442 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38812 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38810 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:53346 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:38586 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:54190 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56794 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53092 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60258 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59114 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60216 CLOSE_WAIT
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54104 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58034 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59364 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57358 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54586 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53002 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53156 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55928 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54868 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57906 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59336 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57852 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53018 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55644 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53412 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55158 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57458 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56462 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57068 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54696 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54774 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53500 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55400 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56760 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57512 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56000 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56514 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55496 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57554 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55944 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60256 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58004 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59450 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55348 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59706 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56398 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54628 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57342 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53590 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54718 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53036 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59724 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54526 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56014 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54140 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55428 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55094 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58236 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53198 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56924 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:52922 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57306 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54910 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57814 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56322 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58798 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54680 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59278 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56830 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55246 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58768 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55764 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55192 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60108 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55908 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58618 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59308 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53644 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57232 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53562 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58956 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53362 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58242 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56816 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53818 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57704 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55382 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57034 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58444 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55850 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57100 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56436 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55102 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59426 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60236 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53232 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58890 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55978 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57926 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56536 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60070 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55174 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59084 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58120 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54004 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54072 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54172 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53432 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59182 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56978 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57488 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58112 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54120 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56484 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54972 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:55032 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56460 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:59608 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54322 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56178 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56242 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56402 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:54850 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:57164 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:53990 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:58688 ESTABLISHED
Active UNIX domain sockets (w/o servers)
Proto RefCnt Flags Type State I-Node Path
/etc/dendrite $ netstat | wc -l
154
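Note that netstat | wc -l also counts the four header lines, so the 154 above corresponds to 150 connections. A raw line count also hides which peer and which TCP state are growing. As a rough sketch (not something posted in the original thread), the output can be tallied per peer host and state. The function name tally_connections and the three sample rows are illustrative; the rows are copied from the output above.

```python
from collections import Counter


def tally_connections(netstat_output):
    """Count TCP connections by (peer host, state) from plain `netstat` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows look like: proto recv-q send-q local foreign state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips header lines, UDP rows and UNIX socket rows
        foreign, state = fields[4], fields[5]
        peer_host = foreign.rsplit(":", 1)[0]  # drop the port or service name
        counts[(peer_host, state)] += 1
    return counts


# A few rows taken from the netstat output above, for illustration only.
sample = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 af6e9eaaeeeb:53554 dendrite-postgres.dendrite:postgresql ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:56794 ESTABLISHED
tcp 0 0 af6e9eaaeeeb:8008 traefik.dendrite:60216 CLOSE_WAIT
"""

for (peer, state), n in sorted(tally_connections(sample).items()):
    print(f"{n:4d}  {peer}  {state}")
```

Run against the full output, a tally like this would show whether the growth is in ESTABLISHED connections from the reverse proxy or in CLOSE_WAIT sockets. In the listing above only one connection is in CLOSE_WAIT.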
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-768180856.
I'm running a netstat | wc -l in watch, it's now up to 171 172 :grin:
Edit 12 hours later: Now up to 310 before I knifed it to update to 0.3.7.
This comment was originally posted by @osmarks at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-770294640.
I ran into the same issue after only a few hours of running it, so it looks like it's not purely based on time. What logs do you need to help debug this?
This comment was originally posted by @neilalexander at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-775867944.
Can you all please confirm which architecture you are running on? amd64, aarch64, etc.
This comment was originally posted by @PureTryOut at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-775923218.
As mentioned on Matrix (posting it here to keep an overview), I'm running it on aarch64. Also Alpine Linux where Go is compiled with Musl rather than glibc (not sure if it matters).
This comment was originally posted by @osmarks at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-776909731.
Can you all please confirm which architecture you are running on? amd64, aarch64, etc.
My instance is on amd64. I haven't had the issue since then though.
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-776993149.
Can you all please confirm which architecture you are running on? amd64, aarch64, etc.
amd64
This comment was originally posted by @viasux at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-814455331.
I believe I am having the same issue. Federation tester shows no errors and the logs are very vague.
This comment was originally posted by @yaliqmadiq at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-815868138.
Let's try to find out what goroutines are actually running at the time this behavior happens. Make sure your terminal has a massive buffer (10,000+ lines if possible) because I just tried to redirect sysout to a file and it doesn't work. (Opening a new issue to add a -o option to mono/poly)
Try this... next time the server freezes up like that:
ps -A | grep dend
and get the PID of dendrite
then do this:
kill -3 {pid}
Sending a SIGQUIT (kill code 3) should cause a Golang program like dendrite to exit and spit out a full dump of all the goroutines that were running at the time of the halt, including the callstack in each goroutine, as long as the program has not installed its own handler for that signal. (A plain SIGHUP or SIGINT normally just makes a Go program exit without a dump.) This behavior smells like a goroutine leak in Dendrite.
At the end of the output there will be a massive dump of all the goroutines that were running and their callstacks, which gives away which goroutine spawns are accumulating.
Capture the end part of the output, from the point where you sent SIGQUIT to kill the server, then grab that goroutine dump and paste it here. It might yield a clue as to what is spawning the leaking goroutines.
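Once a goroutine dump has been captured, one quick way to see which spawn sites are accumulating is to count the "created by" lines in it. The sketch below assumes the standard Go traceback layout. The function name count_goroutine_creators and the sample dump (with its example.com/app paths) are made up for illustration and are not real Dendrite output.

```python
import re
from collections import Counter


def count_goroutine_creators(dump):
    """Count goroutines in a Go stack dump by the function that spawned them."""
    counts = Counter()
    for line in dump.splitlines():
        if not line.startswith("created by "):
            continue
        creator = line[len("created by "):]
        # Newer Go releases append " in goroutine N"; strip it so sites group together.
        creator = re.sub(r" in goroutine \d+$", "", creator)
        counts[creator] += 1
    return counts


# Hypothetical dump, invented for this example (not taken from Dendrite).
sample_dump = (
    "goroutine 1 [running]:\n"
    "main.main()\n"
    "\t/app/main.go:10 +0x20\n"
    "\n"
    "goroutine 18 [IO wait, 5 minutes]:\n"
    "example.com/app.handle(...)\n"
    "\t/app/handler.go:42 +0x1a\n"
    "created by example.com/app.serve\n"
    "\t/app/server.go:17 +0x8c\n"
    "\n"
    "goroutine 19 [IO wait, 7 minutes]:\n"
    "example.com/app.handle(...)\n"
    "\t/app/handler.go:42 +0x1a\n"
    "created by example.com/app.serve in goroutine 1\n"
    "\t/app/server.go:17 +0x8c\n"
)

for creator, n in count_goroutine_creators(sample_dump).most_common():
    print(f"{n:6d}  {creator}")
```

A spawn site whose count keeps climbing between two dumps taken some time apart is a likely candidate for the leak.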
This comment was originally posted by @jaywink at https://github.com/matrix-org/dendrite/issues/1676#issuecomment-1093466487.
This hasn't happened to me for a while now, when previously it used to happen pretty much consistently after some time. I haven't seen this since upgrading to v0.6.4 afaict.
Closing, if anyone sees this, please ping with fresh details.
This issue was originally created by @jaywink at https://github.com/matrix-org/dendrite/issues/1676.
Background information
go version: official images
Description
My Dendrite server has become unresponsive twice in a 24-hour period. This shows up as Element Web (app.element.io) saying the server is offline. The console logs show sync endpoints timing out.
Restarting the Docker container resolved the issue both times.
I could not spot anything interesting in the logs; the errors are mainly related to fetching remote auth events. I will privately submit a log file that covers the second occurrence.
Dendrite is v0.3.4 post dev on hash bca2790c678887232266c5726be79b80ddd9930b (which is a part of https://github.com/matrix-org/dendrite/pull/1672). MSC2836 is enabled in settings.
While the unresponsiveness continues, the logs do show activity in the federation APIs.