Closed Kariton closed 1 day ago
when the container crashes the clients disconnect with "server not responding" despite the fact that the game server logs do look good.
It doesn't just sometimes crash, it crashes within 10 minutes which makes it not playable.
Related: https://github.com/SQLJames/factorio-server-charts/issues/62
@Kariton @samip5 It's probably a health check issue. Isn't the health check failing because it's 127.0.0.1?
There were no problems with chart-based server operation in the past (I did not include a health check), and the docker-based server I recently started operating has been running without problems for more than 3 days.
healthcheck:
test: curl --fail pingpong:34197/health || exit 1
interval: 20s
retries: 5
start_period: 20s
timeout: 10s
hmpf. absolutely. you cannot listen on localhost and expect livenessProbes to work on a "public" site... the port_fixer should listen to "0.0.0.0" and then work as expected.
nice catch!
gonna test, confirm and patch ASAP.
well. the livenessProbe check seems to be the problem - not the listen IP.
Thu, Nov 7 2024 3:08:42 pm {"level":"info","ts":1730988522.3731894,"caller":"cmd/local.go:115","msg":"Healthcheck server started"}
Thu, Nov 7 2024 3:08:42 pm {"level":"info","ts":1730988522.3731928,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:08:42 pm ⇨ http server started on [::]:34197
Thu, Nov 7 2024 3:08:55 pm {"level":"info","ts":1730988535.616322,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:08:55 pm {"level":"info","ts":1730988535.6163716,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:08:56 pm {"level":"info","ts":1730988536.461232,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:01 pm {"level":"error","ts":1730988541.462053,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:58875: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:06 pm {"level":"info","ts":1730988546.460379,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:10 pm {"level":"info","ts":1730988550.6329925,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:09:10 pm {"level":"info","ts":1730988550.6330307,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:09:11 pm {"level":"error","ts":1730988551.4609427,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:60609: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:16 pm {"level":"info","ts":1730988556.4605217,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:17 pm {"level":"error","ts":1730988557.5172606,"caller":"cmd/local.go:132","msg":"net.ReadFromUDP() error: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.3\n\t/app/cmd/local.go:132\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"}
Thu, Nov 7 2024 3:09:21 pm {"level":"error","ts":1730988561.4610858,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:36376: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:21 pm {"level":"info","ts":1730988561.7004497,"caller":"cmd/local.go:190","msg":"graceful shutting down"}
Thu, Nov 7 2024 3:09:21 pm {"level":"error","ts":1730988561.7005007,"caller":"cmd/local.go:195","msg":"exit reason: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1\n\t/app/cmd/local.go:195\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/zcube/factorio-port-fixer/cmd.Execute\n\t/app/cmd/root.go:31\nmain.main\n\t/app/main.go:10\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
periodSeconds: 10
initialDelaySeconds: 5
failureThreshold: 3
the container gets killed.
now udp [::]:34197
looks like an ipv6 problem to me. but i dont have dualstack enabled...
is ReadFromUDP
actually related to http probes?
Thu, Nov 7 2024 3:37:14 pm ⇨ http server started on [::]:34197
Thu, Nov 7 2024 3:37:14 pm {"level":"info","ts":1730990234.3001275,"caller":"cmd/local.go:115","msg":"Healthcheck server started"}
Thu, Nov 7 2024 3:37:14 pm {"level":"info","ts":1730990234.3001416,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:37:27 pm {"level":"info","ts":1730990247.8176143,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:37:27 pm {"level":"info","ts":1730990247.8176613,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:37:28 pm {"level":"info","ts":1730990248.2562778,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:37:33 pm {"level":"error","ts":1730990253.256902,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:55627: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:37:38 pm {"level":"info","ts":1730990258.256299,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:37:42 pm {"level":"info","ts":1730990262.8342316,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:37:42 pm {"level":"info","ts":1730990262.8342786,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:37:43 pm {"level":"error","ts":1730990263.2572982,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:39508: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:37:48 pm {"level":"info","ts":1730990268.256682,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2570317,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:44050: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2741623,"caller":"cmd/local.go:132","msg":"net.ReadFromUDP() error: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.3\n\t/app/cmd/local.go:132\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"}
Thu, Nov 7 2024 3:37:53 pm {"level":"info","ts":1730990273.2743487,"caller":"cmd/local.go:190","msg":"graceful shutting down"}
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2743661,"caller":"cmd/local.go:195","msg":"exit reason: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1\n\t/app/cmd/local.go:195\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/zcube/factorio-port-fixer/cmd.Execute\n\t/app/cmd/root.go:31\nmain.main\n\t/app/main.go:10\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
livenessProbe:
failureThreshold: 3
httpGet:
path: /health
port: port-fixer
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
ports:
- containerPort: 34197
name: port-fixer
protocol: TCP
Actually, I don't think that CAN be HTTP as that's TCP, no?
unless http v3 (quic), yes. http is tcp.
im unable to figure out what exactly is going on.
removing the probe entirely is working. greater thresholds or delays do not help.
to me it looks like an issue in the /health
endpoint or something related.
is there a way to get debug logging going?
Liveness probe failed: Get "http://10.41.3.219:34197/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
root@factorio-server-7c8497fdc-srv9l:/# curl 127.0.0.1:34197/health
OKroot@factorio-server-7c8497fdc-srv9l:/#
root@factorio-server-7c8497fdc-srv9l:/# curl 10.41.3.77:34197/health # pod IP
OKroot@factorio-server-7c8497fdc-srv9l:/
i have disabled the livenessProbe upstream. i dont understand why that is a problem.
from within the factorio container it works with listener 127.0.0.1 (i think i have tested that before hardcoding it in the chart...)
port-fixer:
- args:
- local
- '--ip=127.0.0.1'
- '--port=34197'
- '--remotePort=31497'
{"level":"info","ts":1730994447.8460875,"caller":"cmd/local.go:115","msg":"Healthcheck server started"} {"level":"info","ts":1730994447.8460903,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"127.0.0.1","Port":34197} ⇨ http server started on [::]:34197
probe test:
root@factorio-server-5bcbf95db8-s4gft:/# curl 127.0.0.1:34197/health
OKroot@factorio-server-5bcbf95db8-s4gft:/# curl 10.41.3.152:34197/health # pod IP
OKroot@factorio-server-5bcbf95db8-s4gft:/#
so listening to 0.0.0.0 or 127.0.0.1 does not make a difference here. re-tested both and neither of them worked as expected. higher timeouts / thresholds / delay did not help. but when deploying rcon-api within the same pod that probe does work.
@Kariton I finally remembered. The /health API was an API for checking the health of the Factorio server, not factorio-port-fixer.
would you mind adding a "correct" /healthz
endpoint?
would you mind adding a "correct"
/healthz
endpoint?
use /health
services:
pingpong:
image: ghcr.io/zcube/factorio-port-fixer:main
command: /factorio-port-fixer local --ip=0.0.0.0 --port=34197 --remotePort=${PORT:-34197}
healthcheck:
test: curl --fail 127.0.0.1:34197/health || exit 1
factorio:
image: factoriotools/factorio:stable
environment:
- PORT=${PORT:-34197}
healthcheck:
test: curl --fail pingpong:34197/health_for_factorio || exit 1
I am closing this as I believe it has been resolved in v1.0.4.
Hey, i recently added your container to the helm chart SQLJames/factorio-server-charts
while the container works as intended it sometimes crashes: