jstark1 opened this issue 5 years ago.
Does it also occur with the current nightly build?
I tried mod_gearman_worker version 1.1.1 (Build: 3.01.20190412-labs-edition-v1.1.1) with the same result.
Anything in the worker log?
The worker logs look normal to me; I can send you a log by PM if you want. As far as I understand, the worker creates its worker_<hostname> queue at daemon start. Is there a maintenance process that re-creates the worker queue if it gets removed from gearmand, for example after a daemon restart or a network outage? Or does gearmand remove queues that have no workers?
I think the main problem is that, under some circumstances, the "Worker Available" count of the worker_<hostname> queue changes from 1 to 0 and only comes back after a worker restart.
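The "Worker Available" figure comes from gearmand's text admin protocol: sending `status` to the job server (port 4730 by default) returns one tab-separated line per function, FUNCTION, TOTAL, RUNNING, AVAILABLE_WORKERS, terminated by a line containing a single `.`. A minimal sketch of reading that count, assuming exactly that reply format; `worker_myhost` is a placeholder queue name, and this is not the code of `check_gearman`.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strconv"
	"strings"
	"time"
)

// QueueStatus is one line of gearmand's admin "status" reply:
// FUNCTION<TAB>TOTAL<TAB>RUNNING<TAB>AVAILABLE_WORKERS
type QueueStatus struct {
	Name      string
	Total     int
	Running   int
	Available int
}

// fetchStatus sends the admin "status" command and returns the raw reply,
// up to and including the terminating "." line.
func fetchStatus(addr string) (string, error) {
	conn, err := net.DialTimeout("tcp", addr, 10*time.Second)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprint(conn, "status\n"); err != nil {
		return "", err
	}
	var sb strings.Builder
	r := bufio.NewReader(conn)
	for {
		line, err := r.ReadString('\n')
		sb.WriteString(line)
		if err != nil {
			return sb.String(), err
		}
		if strings.TrimSpace(line) == "." {
			return sb.String(), nil
		}
	}
}

// parseStatus turns a "status" reply into a map keyed by function name.
func parseStatus(reply string) (map[string]QueueStatus, error) {
	queues := make(map[string]QueueStatus)
	sc := bufio.NewScanner(strings.NewReader(reply))
	for sc.Scan() {
		line := sc.Text()
		if line == "." {
			break
		}
		f := strings.Split(line, "\t")
		if len(f) != 4 {
			return nil, fmt.Errorf("unexpected status line %q", line)
		}
		var n [3]int
		for i := range n {
			v, err := strconv.Atoi(f[i+1])
			if err != nil {
				return nil, fmt.Errorf("bad number in %q: %w", line, err)
			}
			n[i] = v
		}
		queues[f[0]] = QueueStatus{Name: f[0], Total: n[0], Running: n[1], Available: n[2]}
	}
	return queues, sc.Err()
}

func main() {
	// Sample reply; worker_myhost is a placeholder queue name.
	reply := "service\t0\t0\t5\nworker_myhost\t0\t0\t0\n.\n"
	if len(os.Args) > 1 { // optionally query a real job server, e.g. localhost:4730
		r, err := fetchStatus(os.Args[1])
		if err != nil {
			panic(err)
		}
		reply = r
	}
	queues, err := parseStatus(reply)
	if err != nil {
		panic(err)
	}
	for name, st := range queues {
		if strings.HasPrefix(name, "worker_") && st.Available == 0 {
			fmt.Printf("CRITICAL: %s has no available worker\n", name)
		}
	}
}
```

Note that this only reads the counter gearmand keeps; it sends no job through the queue, so it cannot tell whether the connection behind that counter still works.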
From the mod-gearman docs (https://github.com/sni/mod_gearman#how-to) I learned to use `./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check`.
My check from above does not use `-s check`, so it only checks the number of available workers in the queue.
I will give this command a try so that the worker queue has some work to do.
I can now reproduce the problem.
The worker starts and opens several connections to the gearmand server; I assume these are the connections for the different queues. One of the connections sits idle and is terminated by our firewall after 1 hour. After another 1-2 hours, gearmand sets "Worker Available" in the worker_
If I use the `./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check` command, the idle connection counter is reset every time the command runs (5-minute check interval).
Sounds reasonable, thanks for the heads-up. So is there anything we could do on the worker side? As far as I remember, the "old" worker renews the status worker from time to time, which probably prevents this issue from happening. I guess that would be OK for the Go worker as well.
On CentOS 8, the Go version of mod_gearman effectively causes a denial of service on gearmand: connections build up until they reach 1000 open TCP connections, and then gearmand stops accepting connections. I switched back to the C version, and the problem does not occur with it. Strangely, on another installation with the same versions installed, I cannot reproduce this problem.
C version (Every 2.0s: `netstat -anp | grep mod_gearman | wc -l`, on opmoncloud8):
Mon Jun 21 09:29:22 2021: 6

Go version, keeps going up (same command and host):
Mon Jun 21 09:41:41 2021: 236
Mon Jun 21 09:43:27 2021: 272
Mon Jun 21 10:22:52 2021: 969
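A quick estimate, from the Go readings above, of how fast connections accumulate and how soon the 1000-connection limit mentioned above is reached. The sample values are the posted netstat numbers; treating the growth as linear is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// sample is one reading of `netstat -anp | grep mod_gearman | wc -l`.
type sample struct {
	at    time.Time
	conns int
}

// perMinute is the average connection growth between two samples.
func perMinute(a, b sample) float64 {
	return float64(b.conns-a.conns) / b.at.Sub(a.at).Minutes()
}

// minutesUntil estimates the minutes from s until limit connections are
// reached, assuming growth continues linearly at rate per minute.
func minutesUntil(s sample, limit int, rate float64) float64 {
	return float64(limit-s.conns) / rate
}

// ts builds a timestamp on the day of the posted readings (2021-06-21).
func ts(h, m, sec int) time.Time {
	return time.Date(2021, time.June, 21, h, m, sec, 0, time.UTC)
}

func main() {
	first := sample{ts(9, 41, 41), 236}
	last := sample{ts(10, 22, 52), 969}
	rate := perMinute(first, last)
	fmt.Printf("growth: %.1f connections/min\n", rate)
	fmt.Printf("1000 reached ~%.1f min after the last reading\n", minutesUntil(last, 1000, rate))
}
```

From 236 to 969 connections over 41 min 11 s this prints a growth of 17.8 connections per minute and about 1.7 minutes of headroom after the 10:22:52 reading, while the C worker stayed at 6.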
Thanks.
I guess that's something else. Do the workers have a reasonable limit on connections/threads (`max-worker`)? You could also send SIGUSR1 to the worker process to create a thread dump.
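A thread dump in a Go program is a listing of every goroutine's stack. A minimal sketch of producing one on SIGUSR1 with `os/signal` and `runtime.Stack`; it yields the same kind of output posted in this thread but is not the worker's actual handler. It is Unix-only, since `syscall.SIGUSR1` and `syscall.Kill` do not exist on Windows.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
	"time"
)

// allGoroutineStacks returns the stacks of all goroutines, growing the
// buffer until runtime.Stack no longer fills it completely.
func allGoroutineStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// dumpOnSIGUSR1 writes a goroutine dump to stderr whenever SIGUSR1 arrives.
func dumpOnSIGUSR1() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)
	go func() {
		for range sigs {
			fmt.Fprintln(os.Stderr, allGoroutineStacks())
		}
	}()
}

func main() {
	dumpOnSIGUSR1()
	fmt.Printf("pid %d: run `kill -USR1 %d` for a dump\n", os.Getpid(), os.Getpid())
	// Trigger one dump ourselves so the example terminates.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		panic(err)
	}
	time.Sleep(200 * time.Millisecond) // give the handler time to print
}
```

Against a running worker the equivalent is `kill -USR1 <pid>`; each goroutine blocked in `[IO wait]` on a network read corresponds to one open socket, which makes such a dump useful for tracking connection leaks.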
No problem, I will do it now. The config looks like this:
job_timeout=60
min-worker=5
max-worker=50
idle-timeout=30
max-jobs=1000
spawn-rate=1
fork_on_exec=no
`[2021-06-21 15:40:19.095][Error][mod_gearman_worker_linux.go:29] requested thread dump via signal user defined signal 1 [2021-06-21 15:40:19.096][Error][mod_gearman_worker.go:315] threaddump: goroutine 9 [running]: github.com/ConSol/mod-gearman-worker-go.logThreaddump() /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:311 +0x6f github.com/ConSol/mod-gearman-worker-go.mainSignalHandler(0x9f2520, 0xc99230, 0x3) /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker_linux.go:30 +0x2d5 github.com/ConSol/mod-gearman-worker-go.Worker.func1(0xc000064f60) /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:89 +0x77 created by github.com/ConSol/mod-gearman-worker-go.Worker /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:85 +0x1e7
goroutine 1 [select]: github.com/ConSol/mod-gearman-worker-go.mainLoop(0xc000136900, 0xc0000652c0, 0xc0001e9ee0, 0x0, 0x9811b8, 0x0, 0x0) /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:155 +0x6a5 github.com/ConSol/mod-gearman-worker-go.Worker(0x0, 0x0) /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:105 +0x279 main.main() /root/go/src/github.com/ConSol/mod-gearman-worker-go/cmd/mod_gearman_worker/main.go:12 +0x39
goroutine 19 [chan receive]: github.com/appscode/g2/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0xcd0dc0) /root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:879 +0x8b created by github.com/appscode/g2/vendor/github.com/golang/glog.init.0 /root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:410 +0x274
goroutine 8 [syscall]: os/signal.signal_recv(0x9f2520) /usr/local/go/src/runtime/sigqueue.go:147 +0x9d os/signal.loop() /usr/local/go/src/os/signal/signal_unix.go:23 +0x25 created by os/signal.Notify.func1.1 /usr/local/go/src/os/signal/signal.go:150 +0x45
goroutine 36 [sleep]: time.Sleep(0xb2d05e00) /usr/local/go/src/runtime/time.go:188 +0xbf github.com/ConSol/mod-gearman-worker-go.mainLoop.func2(0xc0000223f0) /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:147 +0x53 created by github.com/ConSol/mod-gearman-worker-go.mainLoop /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:141 +0x599
goroutine 20 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3ada0, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc0001b4898, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000d8000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc0000b6030, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc0000b81e0, 0xc0000d8000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc00006ee60, 0xc000216f00, 0x40, 0xc0000b8360, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc00006ee60) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 21 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc0001b4780) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0001b4780) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
goroutine 50 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3abd0, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc0000d2118, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000244000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00021e000, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc00020e0c0, 0xc000244000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc0000c40f0, 0xc000036f00, 0x40, 0xc00020e480, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc0000c40f0) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 51 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc0000d2000) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0000d2000) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
goroutine 52 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3aae8, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc000236118, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000da000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00021e010, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc00020e1e0, 0xc0000da000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc000234050, 0xc000212f00, 0x40, 0xc0000b8420, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc000234050) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 53 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc000236000) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236000) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
goroutine 54 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3aa00, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc000236298, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000241000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00021e020, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc00020e300, 0xc000241000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc0002340f0, 0xc000217f00, 0x40, 0xc00020e420, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc0002340f0) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 55 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc000236180) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236180) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
goroutine 66 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3a918, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc000236418, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a4000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00028a000, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc0002840c0, 0xc0002a4000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc000234190, 0xc000214f00, 0x40, 0xc000284480, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc000234190) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 67 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc000236300) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236300) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2
goroutine 68 [IO wait]: internal/poll.runtime_pollWait(0x7ff360a3a830, 0x72, 0x9eea40) /usr/local/go/src/runtime/netpoll.go:222 +0x55 internal/poll.(*pollDesc).wait(0xc00029a118, 0x72, 0x9eea00, 0xc81698, 0x0) /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /usr/local/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5 net.(*netFD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a5000) /usr/local/go/src/net/fd_posix.go:55 +0x4f net.(*conn).Read(0xc00028a010, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/go/src/net/net.go:182 +0x8e bufio.(*Reader).Read(0xc000284240, 0xc0002a5000, 0x1000, 0x1000, 0xc, 0x0, 0x0) /usr/local/go/src/bufio/bufio.go:213 +0x142 github.com/appscode/g2/worker.(*agent).read(0xc000296050, 0xc000228700, 0x40, 0xc0002844e0, 0xc, 0x0) /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5 github.com/appscode/g2/worker.(*agent).work(0xc000296050) /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7 created by github.com/appscode/g2/worker.(*agent).Connect /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227
goroutine 69 [chan receive]: github.com/appscode/g2/worker.(*Worker).Work(0xc00029a000) /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6 github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc00029a000) /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d created by github.com/ConSol/mod-gearman-worker-go.newWorker /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2 `
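One way to sanity-check the `max-worker` question: if each worker holds one connection per job server (an assumption; the dump above shows six g2 `agent.read` goroutines, each blocked reading its own connection), then `max-worker=50` with a single gearmand bounds the expected connection count at roughly 50, far below the 969 reported by netstat. A sketch that parses the posted config and makes that comparison; `expectedMaxConns` and the single-job-server count are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseWorkerConfig splits whitespace-separated key=value pairs.
func parseWorkerConfig(s string) map[string]string {
	cfg := make(map[string]string)
	for _, field := range strings.Fields(s) {
		if k, v, ok := strings.Cut(field, "="); ok {
			cfg[k] = v
		}
	}
	return cfg
}

// expectedMaxConns is a rough upper bound on worker connections to gearmand,
// ASSUMING one connection per worker per job server.
func expectedMaxConns(cfg map[string]string, jobServers int) (int, error) {
	n, err := strconv.Atoi(cfg["max-worker"])
	if err != nil {
		return 0, fmt.Errorf("max-worker: %w", err)
	}
	return n * jobServers, nil
}

func main() {
	cfg := parseWorkerConfig(`job_timeout=60 min-worker=5 max-worker=50
idle-timeout=30 max-jobs=1000 spawn-rate=1 fork_on_exec=no`)
	bound, err := expectedMaxConns(cfg, 1) // one gearmand: an assumption
	if err != nil {
		panic(err)
	}
	const observed = 969 // last netstat reading from the Go worker
	fmt.Printf("expected <= %d, observed %d, leak suspected: %v\n",
		bound, observed, observed > bound)
}
```

If the assumption holds, a count that far above the bound points at connections that are no longer owned by any live worker rather than at a too-generous `max-worker`.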
We are using mod-gearman-worker-go with OMD 2.90 (also with 3.00). With the command
`check_gearman -H localhost:4730 -q worker_<hostname> -x`
we check the worker queues. Some workers lose the connection to their worker queue after some time. 6 did not help in this case.
When we restart the gearman_worker, it reconnects immediately. The normal service queues work as expected.