gocraft / work

Process background jobs in Go
MIT License

Worker process hangs after a TCP timeout error #166

Open navaneeth opened 4 years ago

navaneeth commented 4 years ago

We are using Work in our project. It works really well. Thanks for the amazing library.

Every 24 hours or so, the worker process hangs. When I looked at the gocraft/work web UI, I could see that no heartbeat signals were being sent. The logs contained the following error messages:

ERROR: periodic_enqueuer.loop.enqueue - dial tcp: lookup <redis-address> on 172.31.0.2:53: dial udp 172.31.0.2:53: i/o timeout
ERROR: dead_pool_reaper.reap - dial tcp: lookup <redis-address> on 172.31.0.2:53: dial udp 172.31.0.2:53: i/o timeout
ERROR: requeuer.process - dial tcp: lookup <redis-address> on 172.31.0.2:53: dial udp 172.31.0.2:53: i/o timeout
request expired, resigning

I was expecting the worker to recover from such errors, but it looks like it is stuck. Every time this happens I restart the process manually and it works again. Any ideas on how to fix this would be great.
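
Until the root cause is found, the manual restart could be automated with a small watchdog. The following is only a sketch under stated assumptions: the `tracker` type, the `reachable` helper and the `REDIS_ADDR` environment variable are hypothetical names and not part of gocraft/work, and the process is assumed to run under a supervisor (systemd, Docker, Kubernetes or similar) that restarts it when it exits. It probes the Redis address over TCP and exits after three consecutive failures.

```go
package main

import (
	"log"
	"net"
	"os"
	"time"
)

// tracker counts consecutive failed probes (hypothetical helper).
type tracker struct {
	limit    int // consecutive failures tolerated before tripping
	failures int // current run of consecutive failures
}

// observe records one probe result and reports whether the limit of
// consecutive failures has been reached. A successful probe resets the count.
func (t *tracker) observe(ok bool) bool {
	if ok {
		t.failures = 0
		return false
	}
	t.failures++
	return t.failures >= t.limit
}

// reachable reports whether a TCP connection to addr can be opened within
// timeout. A failed DNS lookup also counts as unreachable.
func reachable(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		log.Printf("watchdog: %v", err)
		return false
	}
	conn.Close()
	return true
}

func main() {
	// REDIS_ADDR is an assumed variable name, for example "redis.internal:6379".
	addr := os.Getenv("REDIS_ADDR")
	if addr == "" {
		log.Println("watchdog: REDIS_ADDR not set, nothing to watch")
		return
	}
	t := &tracker{limit: 3}
	for range time.Tick(10 * time.Second) {
		if t.observe(reachable(addr, 2*time.Second)) {
			log.Println("watchdog: Redis unreachable, exiting so the supervisor restarts the worker")
			os.Exit(1)
		}
	}
}
```

In a real worker the loop would run in its own goroutine next to the worker pool. This only covers sustained outages; a short DNS blip that leaves the worker stuck would not trip it, so it is a workaround rather than a fix.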

navaneeth commented 4 years ago

I could also see the following errors:

2020-08-30T20:57:36.258+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:55:21.658+05:30 ERROR: periodic_enqueuer.should_enqueue - redigo: connection pool exhausted
2020-08-30T20:46:45.664+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:41:06.942+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:39:09.492+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:37:54.741+05:30 ERROR: periodic_enqueuer.loop.enqueue - redigo: connection pool exhausted
2020-08-30T20:33:41.050+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:32:51.060+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:28:28.948+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:27:14.288+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:10:01.804+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:02:46.910+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
2020-08-30T20:01:00.409+05:30 ERROR: worker.fetch - redigo: connection pool exhausted
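
For context on the message itself: redigo reports "connection pool exhausted" when a pool has reached its `MaxActive` limit and `Wait` is false, so the caller gets an error instead of blocking until a connection is returned. The toy model below illustrates that behaviour using only the standard library. It is not redigo's implementation, and `boundedPool`, `get` and `put` are hypothetical names chosen for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errExhausted mimics the kind of error a capped pool returns when it is full.
var errExhausted = errors.New("connection pool exhausted")

// boundedPool is a toy stand-in for a connection pool with a hard cap.
// Each value in slots represents one connection currently checked out.
type boundedPool struct {
	slots chan struct{}
}

// newBoundedPool creates a pool that allows at most maxActive checkouts.
func newBoundedPool(maxActive int) *boundedPool {
	return &boundedPool{slots: make(chan struct{}, maxActive)}
}

// get takes a slot without waiting and fails when every slot is in use.
func (p *boundedPool) get() error {
	select {
	case p.slots <- struct{}{}:
		return nil
	default:
		return errExhausted
	}
}

// put returns a slot to the pool; it does nothing if no slot is checked out.
func (p *boundedPool) put() {
	select {
	case <-p.slots:
	default:
	}
}

func main() {
	p := newBoundedPool(2)
	fmt.Println(p.get()) // <nil>
	fmt.Println(p.get()) // <nil>
	fmt.Println(p.get()) // connection pool exhausted
	p.put()
	fmt.Println(p.get()) // <nil>
}
```

If connections are held for a long time, for example while a dial or DNS lookup is timing out, a small cap is reached quickly and every further checkout fails until something is returned. Whether that is what happens in this worker is an open question; the model only shows what the error means.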