ZCube / factorio-port-fixer

for Factorio headless server.
MIT License
3 stars 0 forks source link

use of closed network connection #1

Closed Kariton closed 1 day ago

Kariton commented 1 week ago

Hey, i recently added your container to the helm chart SQLJames/factorio-server-charts

while the container works as intended it sometimes crashes:

{"level":"info","ts":1730878872.1065865,"caller":"cmd/local.go:115","msg":"Healthcheck server started"}
⇨ http server started on [::]:34197
{"level":"info","ts":1730878872.106785,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"127.0.0.1","Port":34197}
{"level":"info","ts":1730878877.758352,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
{"level":"info","ts":1730878880.5381744,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
{"level":"info","ts":1730878880.5382175,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"127.0.0.1","Port":34197}
{"level":"error","ts":1730878882.7584276,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp 127.0.0.1:60707: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
{"level":"info","ts":1730878887.7583582,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
{"level":"error","ts":1730878892.7618413,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp 127.0.0.1:50433: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
{"level":"info","ts":1730878895.554898,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
{"level":"info","ts":1730878895.5549371,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"127.0.0.1","Port":34197}
{"level":"info","ts":1730878897.7583709,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
{"level":"error","ts":1730878898.8152833,"caller":"cmd/local.go:132","msg":"net.ReadFromUDP() error: read udp 127.0.0.1:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.3\n\t/app/cmd/local.go:132\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"}
{"level":"error","ts":1730878902.7591352,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp 127.0.0.1:55703: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
{"level":"info","ts":1730878903.0005345,"caller":"cmd/local.go:190","msg":"graceful shutting down"}
{"level":"error","ts":1730878903.000601,"caller":"cmd/local.go:195","msg":"exit reason: read udp 127.0.0.1:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1\n\t/app/cmd/local.go:195\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/zcube/factorio-port-fixer/cmd.Execute\n\t/app/cmd/root.go:31\nmain.main\n\t/app/main.go:10\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
Kariton commented 1 week ago

when the container crashes the clients disconnect with "server not responding" despite the fact that the game server logs do look good.

samip5 commented 1 week ago

It doesn't just sometimes crash, it crashes within 10 minutes which makes it not playable.

Related: https://github.com/SQLJames/factorio-server-charts/issues/62

ZCube commented 1 week ago

@Kariton @samip5 It's probably a health check issue. Isn't the health check failing because it's 127.0.0.1?

There were no problems with chart-based server operation in the past (I did not include a health check), and the docker-based server I recently started operating has been running without problems for more than 3 days.

    healthcheck:
      test: curl --fail pingpong:34197/health || exit 1
      interval: 20s
      retries: 5
      start_period: 20s
      timeout: 10s
Kariton commented 1 week ago

hmpf. absolutely. you cannot listen on localhost and expect livenessProbes to work on a "public" site... the port_fixer should listen to "0.0.0.0" and then work as expected.

nice catch!

gonna test, confirm and patch ASAP.

Kariton commented 6 days ago

well. the livenessProbe check seems to be the problem - not the listen IP.

Thu, Nov 7 2024 3:08:42 pm {"level":"info","ts":1730988522.3731894,"caller":"cmd/local.go:115","msg":"Healthcheck server started"}
Thu, Nov 7 2024 3:08:42 pm {"level":"info","ts":1730988522.3731928,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:08:42 pm ⇨ http server started on [::]:34197
Thu, Nov 7 2024 3:08:55 pm {"level":"info","ts":1730988535.616322,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:08:55 pm {"level":"info","ts":1730988535.6163716,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:08:56 pm {"level":"info","ts":1730988536.461232,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:01 pm {"level":"error","ts":1730988541.462053,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:58875: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:06 pm {"level":"info","ts":1730988546.460379,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:10 pm {"level":"info","ts":1730988550.6329925,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"}
Thu, Nov 7 2024 3:09:10 pm {"level":"info","ts":1730988550.6330307,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197}
Thu, Nov 7 2024 3:09:11 pm {"level":"error","ts":1730988551.4609427,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:60609: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:16 pm {"level":"info","ts":1730988556.4605217,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"}
Thu, Nov 7 2024 3:09:17 pm {"level":"error","ts":1730988557.5172606,"caller":"cmd/local.go:132","msg":"net.ReadFromUDP() error: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.3\n\t/app/cmd/local.go:132\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"}
Thu, Nov 7 2024 3:09:21 pm {"level":"error","ts":1730988561.4610858,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:36376: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"}
Thu, Nov 7 2024 3:09:21 pm {"level":"info","ts":1730988561.7004497,"caller":"cmd/local.go:190","msg":"graceful shutting down"}
Thu, Nov 7 2024 3:09:21 pm {"level":"error","ts":1730988561.7005007,"caller":"cmd/local.go:195","msg":"exit reason: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1\n\t/app/cmd/local.go:195\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/zcube/factorio-port-fixer/cmd.Execute\n\t/app/cmd/root.go:31\nmain.main\n\t/app/main.go:10\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
          periodSeconds: 10
          initialDelaySeconds: 5
          failureThreshold: 3

the container gets killed.

now udp [::]:34197 looks like an ipv6 problem to me. but i dont have dualstack enabled...

Kariton commented 6 days ago

is ReadFromUDP actually related to http probes?

Kariton commented 6 days ago
Thu, Nov 7 2024 3:37:14 pm ⇨ http server started on [::]:34197 
Thu, Nov 7 2024 3:37:14 pm {"level":"info","ts":1730990234.3001275,"caller":"cmd/local.go:115","msg":"Healthcheck server started"} 
Thu, Nov 7 2024 3:37:14 pm {"level":"info","ts":1730990234.3001416,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197} 
Thu, Nov 7 2024 3:37:27 pm {"level":"info","ts":1730990247.8176143,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"} 
Thu, Nov 7 2024 3:37:27 pm {"level":"info","ts":1730990247.8176613,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197} 
Thu, Nov 7 2024 3:37:28 pm {"level":"info","ts":1730990248.2562778,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"} 
Thu, Nov 7 2024 3:37:33 pm {"level":"error","ts":1730990253.256902,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:55627: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"} 
Thu, Nov 7 2024 3:37:38 pm {"level":"info","ts":1730990258.256299,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"} 
Thu, Nov 7 2024 3:37:42 pm {"level":"info","ts":1730990262.8342316,"caller":"cmd/local.go:135","msg":"Read from socket","Bytes":1,"Remote":"127.0.0.1:31497"} 
Thu, Nov 7 2024 3:37:42 pm {"level":"info","ts":1730990262.8342786,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"0.0.0.0","Port":34197} 
Thu, Nov 7 2024 3:37:43 pm {"level":"error","ts":1730990263.2572982,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:39508: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"} 
Thu, Nov 7 2024 3:37:48 pm {"level":"info","ts":1730990268.256682,"caller":"cmd/local.go:92","msg":"Wrote to socket","Bytes":3,"Remote":"10.41.3.197:31497"} 
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2570317,"caller":"cmd/local.go:103","msg":"net.ReadFromUDP() error: %sread udp [::]:44050: i/o timeout","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.1\n\t/app/cmd/local.go:103\ngithub.com/labstack/echo/v4.(*Echo).add.func1\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:546\ngithub.com/labstack/echo/v4.(*Echo).ServeHTTP\n\t/go/pkg/mod/github.com/labstack/echo/v4@v4.10.0/echo.go:633\nnet/http.serverHandler.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2947\nnet/http.(*conn).serve\n\t/usr/local/go/src/net/http/server.go:1991"} 
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2741623,"caller":"cmd/local.go:132","msg":"net.ReadFromUDP() error: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1.3\n\t/app/cmd/local.go:132\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.1.0/errgroup/errgroup.go:75"} 
Thu, Nov 7 2024 3:37:53 pm {"level":"info","ts":1730990273.2743487,"caller":"cmd/local.go:190","msg":"graceful shutting down"} 
Thu, Nov 7 2024 3:37:53 pm {"level":"error","ts":1730990273.2743661,"caller":"cmd/local.go:195","msg":"exit reason: read udp [::]:34197: use of closed network connection","stacktrace":"github.com/zcube/factorio-port-fixer/cmd.glob..func1\n\t/app/cmd/local.go:195\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:920\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\ngithub.com/zcube/factorio-port-fixer/cmd.Execute\n\t/app/cmd/root.go:31\nmain.main\n\t/app/main.go:10\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
      livenessProbe:
        failureThreshold: 3
        httpGet:
          path: /health
          port: port-fixer
          scheme: HTTP
        initialDelaySeconds: 10
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      ports:
        - containerPort: 34197
          name: port-fixer
          protocol: TCP
samip5 commented 6 days ago

Actually, I don't think that CAN be HTTP as that's TCP, no?

Kariton commented 6 days ago

unless http v3 (quic), yes. http is tcp.

im unable to figure out what exactly is going on.

removing the probe entirely is working. greater thresholds or delays do not help.

to me it looks like an issue in the /health endpoint or something related. is there a way to get debug logging going?

Kariton commented 6 days ago

Liveness probe failed: Get "http://10.41.3.219:34197/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Kariton commented 6 days ago
root@factorio-server-7c8497fdc-srv9l:/# curl 127.0.0.1:34197/health 
OKroot@factorio-server-7c8497fdc-srv9l:/#
root@factorio-server-7c8497fdc-srv9l:/# curl 10.41.3.77:34197/health # pod IP
OKroot@factorio-server-7c8497fdc-srv9l:/
Kariton commented 6 days ago

i have disabled the livenessProbe upstream. i dont understand why that is a problem.

from within the factorio container it works with listener 127.0.0.1 (i think i have tested that before hardcoding it in the chart...)

port-fixer:

    - args:
        - local
        - '--ip=127.0.0.1'
        - '--port=34197'
        - '--remotePort=31497'

{"level":"info","ts":1730994447.8460875,"caller":"cmd/local.go:115","msg":"Healthcheck server started"} {"level":"info","ts":1730994447.8460903,"caller":"cmd/local.go:126","msg":"Accepting a new packet","IP":"127.0.0.1","Port":34197} ⇨ http server started on [::]:34197

probe test:

root@factorio-server-5bcbf95db8-s4gft:/# curl 127.0.0.1:34197/health 
OKroot@factorio-server-5bcbf95db8-s4gft:/# curl 10.41.3.152:34197/health # pod IP
OKroot@factorio-server-5bcbf95db8-s4gft:/# 

so listening to 0.0.0.0 or 127.0.0.1 does not make a difference here. re-tested both and neither of them worked as expected. higher timeouts / thresholds / delay did not help. but when deploying rcon-api within the same pod that probe does work.

ZCube commented 6 days ago

@Kariton I finally remembered. The /health API was an API for checking the health of the Factorio server, not factorio-port-fixer.

Kariton commented 6 days ago

would you mind adding a "correct" /healthz endpoint?

ZCube commented 6 days ago

would you mind adding a "correct" /healthz endpoint?

use /health

services:
  pingpong:
    image: ghcr.io/zcube/factorio-port-fixer:main
    command: /factorio-port-fixer local --ip=0.0.0.0 --port=34197 --remotePort=${PORT:-34197}
    healthcheck:
      test: curl --fail 127.0.0.1:34197/health || exit 1

  factorio:
    image: factoriotools/factorio:stable
    environment:
      - PORT=${PORT:-34197}
    healthcheck:
      test: curl --fail pingpong:34197/health_for_factorio || exit 1
ZCube commented 1 day ago

I am closing this as I believe it has been resolved in v1.0.4.