ConSol-Monitoring / mod-gearman-worker-go

Mod-Gearman Worker rewrite in Golang
GNU General Public License v3.0

worker available is lost after a while #9

Open jstark1 opened 5 years ago

jstark1 commented 5 years ago

We are using mod-gearman-worker-go with OMD 2.90 (also with 3.00). With the command check_gearman -H localhost:4730 -q worker_<hostname> -x we check the worker queues. After some time, some workers lose the connection to their worker queue.

#6 did not help in this case.

When we restart the gearman_worker, it reconnects immediately. The normal service queues are working as expected.

sni commented 5 years ago

Does it occur with the current nightly as well?

jstark1 commented 5 years ago

I tried version mod_gearman_worker - version 1.1.1 (Build: 3.01.20190412-labs-edition-v1.1.1) with the same result.

sni commented 5 years ago

anything in the worker log?

jstark1 commented 5 years ago

The worker logs look normal to me. I can send you a log via PM if you want. As far as I understand, the worker creates its worker_<hostname> queue at daemon start. Is there a maintenance process for the worker queue in case it gets removed from gearmand after a daemon restart or a network outage? Or does gearmand remove queues that have no workers?

I think the main problem is that under some circumstances "Worker Available" for the worker_<hostname> queue changes from 1 to 0 and only comes back after a worker restart.

From the mod-gearman docs (https://github.com/sni/mod_gearman#how-to) I learned to use ./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check. My check from above does not use -s check, so it only checks the available workers in the queue. I will give this command a try so that the worker queue has some work to do.
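
For anyone who wants to watch the "Worker Available" value without check_gearman: as far as I know, gearmand answers the plain-text admin command status with one tab-separated line per queue (name, total jobs, running jobs, available workers) and a final line containing a single dot. A minimal Go sketch that parses such a reply could look like this. The sample reply and the queue name worker_host1 are made up for illustration, QueueStatus and parseStatus are hypothetical names, and reading the reply from the socket is left out.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// QueueStatus holds one line of a gearmand admin "status" reply.
type QueueStatus struct {
	Name      string
	Total     int // jobs in the queue
	Running   int // jobs currently being processed
	Available int // workers registered for this queue
}

// parseStatus parses a tab-separated status reply. Parsing stops at the
// line that contains only a dot, which marks the end of the reply.
func parseStatus(reply string) (map[string]QueueStatus, error) {
	queues := make(map[string]QueueStatus)
	sc := bufio.NewScanner(strings.NewReader(reply))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "." {
			break
		}
		if line == "" {
			continue
		}
		fields := strings.Split(line, "\t")
		if len(fields) != 4 {
			return nil, fmt.Errorf("unexpected status line %q", line)
		}
		nums := make([]int, 3)
		for i, f := range fields[1:] {
			n, err := strconv.Atoi(f)
			if err != nil {
				return nil, fmt.Errorf("bad number in %q: %w", line, err)
			}
			nums[i] = n
		}
		queues[fields[0]] = QueueStatus{
			Name:      fields[0],
			Total:     nums[0],
			Running:   nums[1],
			Available: nums[2],
		}
	}
	return queues, sc.Err()
}

func main() {
	// Hypothetical reply; "worker_host1" stands in for worker_<hostname>.
	reply := "service\t0\t0\t5\nworker_host1\t0\t0\t0\n.\n"
	queues, err := parseStatus(reply)
	if err != nil {
		panic(err)
	}
	if q := queues["worker_host1"]; q.Available == 0 {
		fmt.Println("CRITICAL: no worker available for", q.Name)
	}
}
```

A check built this way only looks at the registered worker count, like the check without -s check described above; it does not send a job through the queue.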

jstark1 commented 5 years ago

I can now reproduce the problem. The worker starts and opens several connections to the gearmand server. I assume these are the connections to the different queues. One of the connections idles and gets terminated by our firewall after 1 hour. After another 1-2 hours, gearmand sets "Worker Available" in the worker_ queue to zero.

If I use the ./check_gearman -H <job server hostname> -q worker_<worker hostname> -t 10 -s check command, the idle connection counter is reset each time the command runs (5 minute check interval).
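
One way to keep an otherwise idle connection from hitting a firewall idle timeout is TCP keep-alive with an interval well below that timeout. The following Go sketch only illustrates the idea with the standard net.Dialer. It is not the worker's actual connection code, newDialer is a hypothetical helper, the 5 minute interval is an assumed value, and the local listener merely stands in for gearmand. Newer Go releases already enable keep-alives on dialed connections by default, so whether this changes anything depends on the Go version and on how the client library opens its sockets.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// keepAliveInterval is an assumed value. It only needs to be well below
// the idle timeout of the firewall (1 hour in the report above).
const keepAliveInterval = 5 * time.Minute

// newDialer returns a dialer that asks the OS to send TCP keep-alive
// probes on every connection it opens.
func newDialer() *net.Dialer {
	return &net.Dialer{
		Timeout:   10 * time.Second,
		KeepAlive: keepAliveInterval,
	}
}

func main() {
	// A local listener stands in for gearmand in this sketch.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	conn, err := newDialer().Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	fmt.Println("connected with keep-alive interval", keepAliveInterval)
}
```

Keep-alive probes only keep the TCP session alive. They do not exercise the queue itself, which the -s check variant does.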

sni commented 5 years ago

Sounds reasonable. Thanks for the heads up. So is there anything we could do from the worker side? As far as I remember, the "old" worker renews the status worker from time to time, which probably prevents this issue from happening. I guess that would be OK for the Go worker as well.
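
A periodic renewal of this kind could be sketched in Go with a time.Ticker. This is only an illustration of the pattern: renewEvery and the renew callback are hypothetical names, the 10 ms interval is chosen just to keep the demo short, and what exactly has to be renewed (for example re-registering the worker_<hostname> queue) is left to the callback.

```go
package main

import (
	"fmt"
	"time"
)

// renewEvery calls renew once per interval until stop is closed.
func renewEvery(interval time.Duration, stop <-chan struct{}, renew func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			// If stop was closed in the meantime, prefer it over the tick.
			select {
			case <-stop:
				return
			default:
			}
			renew()
		}
	}
}

func main() {
	stop := make(chan struct{})
	renewals := 0
	renewEvery(10*time.Millisecond, stop, func() {
		renewals++
		fmt.Println("renewing status registration, attempt", renewals)
		if renewals == 3 {
			close(stop) // end the demo after three renewals
		}
	})
}
```

In a real worker the loop would run in its own goroutine, with an interval shorter than any idle timeout on the path to gearmand.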

dirtyren commented 3 years ago

What is happening to me on CentOS 8 is that the mod_gearman Go worker effectively causes a denial of service on gearmand: connections build up until there are 1000 open TCP connections, and gearmand does not accept any more connections. I changed back to the C version, and this does not happen with it. Strangely enough, on another installation I cannot reproduce this problem with the same versions installed.

C version, output of watch on netstat -anp | grep mod_gearman | wc -l (host opmoncloud8):

Mon Jun 21 09:29:22 2021: 6

Go version, same command, the count keeps going up:

Mon Jun 21 09:41:41 2021: 236
Mon Jun 21 09:43:27 2021: 272
Mon Jun 21 10:22:52 2021: 969

Tks.

sni commented 3 years ago

I guess that's something else. Does the worker have a reasonable limit of connections/threads (max-worker)? You could also send a SIGUSR1 to the worker process to create a thread dump.
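
For reference, a SIGUSR1-triggered dump of all goroutine stacks can be built from the standard library alone. The sketch below is not the worker's own handler: allStacks is a hypothetical helper, the program signals itself only to stay self-contained, and it is Unix-only because of syscall.SIGUSR1.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"runtime"
	"syscall"
)

// allStacks returns the stack traces of all goroutines. The buffer is
// doubled until the whole dump fits.
func allStacks() []byte {
	buf := make([]byte, 1<<16)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return buf[:n]
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGUSR1)

	// For demonstration the process sends the signal to itself.
	if err := syscall.Kill(os.Getpid(), syscall.SIGUSR1); err != nil {
		panic(err)
	}

	sig := <-sigs
	fmt.Fprintf(os.Stderr, "requested thread dump via signal %s\n%s\n", sig, allStacks())
}
```

Against a running process the signal would normally be sent with kill -USR1 <pid>. In a dump like this, a goroutine count that keeps growing between two dumps is usually the first thing to look for.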

dirtyren commented 3 years ago

No problem, will do it now, the conf is like this:

job_timeout=60
min-worker=5
max-worker=50
idle-timeout=30
max-jobs=1000
spawn-rate=1
fork_on_exec=no

`[2021-06-21 15:40:19.095][Error][mod_gearman_worker_linux.go:29] requested thread dump via signal user defined signal 1
[2021-06-21 15:40:19.096][Error][mod_gearman_worker.go:315] threaddump: goroutine 9 [running]:
github.com/ConSol/mod-gearman-worker-go.logThreaddump()
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:311 +0x6f
github.com/ConSol/mod-gearman-worker-go.mainSignalHandler(0x9f2520, 0xc99230, 0x3)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker_linux.go:30 +0x2d5
github.com/ConSol/mod-gearman-worker-go.Worker.func1(0xc000064f60)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:89 +0x77
created by github.com/ConSol/mod-gearman-worker-go.Worker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:85 +0x1e7

goroutine 1 [select]:
github.com/ConSol/mod-gearman-worker-go.mainLoop(0xc000136900, 0xc0000652c0, 0xc0001e9ee0, 0x0, 0x9811b8, 0x0, 0x0)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:155 +0x6a5
github.com/ConSol/mod-gearman-worker-go.Worker(0x0, 0x0)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:105 +0x279
main.main()
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/cmd/mod_gearman_worker/main.go:12 +0x39

goroutine 19 [chan receive]:
github.com/appscode/g2/vendor/github.com/golang/glog.(*loggingT).flushDaemon(0xcd0dc0)
    /root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:879 +0x8b
created by github.com/appscode/g2/vendor/github.com/golang/glog.init.0
    /root/go/src/github.com/appscode/g2/vendor/github.com/golang/glog/glog.go:410 +0x274

goroutine 8 [syscall]:
os/signal.signal_recv(0x9f2520)
    /usr/local/go/src/runtime/sigqueue.go:147 +0x9d
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x25
created by os/signal.Notify.func1.1
    /usr/local/go/src/os/signal/signal.go:150 +0x45

goroutine 36 [sleep]:
time.Sleep(0xb2d05e00)
    /usr/local/go/src/runtime/time.go:188 +0xbf
github.com/ConSol/mod-gearman-worker-go.mainLoop.func2(0xc0000223f0)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:147 +0x53
created by github.com/ConSol/mod-gearman-worker-go.mainLoop
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/mod_gearman_worker.go:141 +0x599

goroutine 20 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3ada0, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0001b4898, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc0001b4880, 0xc0000d8000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000d8000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc0000b6030, 0xc0000d8000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc0000b81e0, 0xc0000d8000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc00006ee60, 0xc000216f00, 0x40, 0xc0000b8360, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc00006ee60)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 21 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc0001b4780)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0001b4780)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 50 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3abd0, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc0000d2118, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc0000d2100, 0xc000244000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000244000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e000, 0xc000244000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e0c0, 0xc000244000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc0000c40f0, 0xc000036f00, 0x40, 0xc00020e480, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc0000c40f0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 51 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc0000d2000)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc0000d2000)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 52 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3aae8, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236118, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236100, 0xc0000da000, 0x1000, 0x1000, 0x1000, 0x7ff38b0347d0, 0xc0000da000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e010, 0xc0000da000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e1e0, 0xc0000da000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000234050, 0xc000212f00, 0x40, 0xc0000b8420, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000234050)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 53 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236000)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236000)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 54 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3aa00, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236298, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236280, 0xc000241000, 0x1000, 0x1000, 0x1000, 0x7ff38b034e98, 0xc000241000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00021e020, 0xc000241000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc00020e300, 0xc000241000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc0002340f0, 0xc000217f00, 0x40, 0xc00020e420, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc0002340f0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 55 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236180)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236180)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 66 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3a918, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc000236418, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc000236400, 0xc0002a4000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a4000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00028a000, 0xc0002a4000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc0002840c0, 0xc0002a4000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000234190, 0xc000214f00, 0x40, 0xc000284480, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000234190)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 67 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc000236300)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc000236300)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2

goroutine 68 [IO wait]:
internal/poll.runtime_pollWait(0x7ff360a3a830, 0x72, 0x9eea40)
    /usr/local/go/src/runtime/netpoll.go:222 +0x55
internal/poll.(*pollDesc).wait(0xc00029a118, 0x72, 0x9eea00, 0xc81698, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/internal/poll/fd_unix.go:159 +0x1a5
net.(*netFD).Read(0xc00029a100, 0xc0002a5000, 0x1000, 0x1000, 0x1000, 0x7ff38b035560, 0xc0002a5000)
    /usr/local/go/src/net/fd_posix.go:55 +0x4f
net.(*conn).Read(0xc00028a010, 0xc0002a5000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
    /usr/local/go/src/net/net.go:182 +0x8e
bufio.(*Reader).Read(0xc000284240, 0xc0002a5000, 0x1000, 0x1000, 0xc, 0x0, 0x0)
    /usr/local/go/src/bufio/bufio.go:213 +0x142
github.com/appscode/g2/worker.(*agent).read(0xc000296050, 0xc000228700, 0x40, 0xc0002844e0, 0xc, 0x0)
    /root/go/src/github.com/appscode/g2/worker/agent.go:182 +0xa5
github.com/appscode/g2/worker.(*agent).work(0xc000296050)
    /root/go/src/github.com/appscode/g2/worker/agent.go:61 +0xc7
created by github.com/appscode/g2/worker.(*agent).Connect
    /root/go/src/github.com/appscode/g2/worker/agent.go:44 +0x227

goroutine 69 [chan receive]:
github.com/appscode/g2/worker.(*Worker).Work(0xc00029a000)
    /root/go/src/github.com/appscode/g2/worker/worker.go:220 +0xc6
github.com/ConSol/mod-gearman-worker-go.newWorker.func2(0xc00029a000)
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:72 +0x4d
created by github.com/ConSol/mod-gearman-worker-go.newWorker
    /root/go/src/github.com/ConSol/mod-gearman-worker-go/worker.go:70 +0x5d2 `