Reported on the frictionlessdata Gitter channel. The symptom is that jobs created via the API (e.g. in try.goodtables.io or openspending.org/packager/provide-data) never finish, and the client keeps requesting an update until you get the following error:
Error: Reached Maximum Attempts
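For context, a bounded polling loop of the kind that would raise this error looks roughly like the sketch below. This is only an illustration; the endpoint, the status values, the attempt cap and the interval are assumptions, not the actual goodtables.io client code.

import time
import requests

MAX_ATTEMPTS = 60          # assumed cap; the real client value is not shown in this report
POLL_INTERVAL_SECONDS = 5  # assumed interval

def wait_for_job(status_url):
    # Poll a job status endpoint until the job reports a final state or the cap is hit.
    for _ in range(MAX_ATTEMPTS):
        response = requests.get(status_url)
        response.raise_for_status()
        status = response.json().get("status")
        if status in ("success", "failure", "error"):
            return status
        time.sleep(POLL_INTERVAL_SECONDS)
    # If the worker never picks the job up (as in this incident), we end up here.
    raise RuntimeError("Error: Reached Maximum Attempts")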
Looking at the pods and their logs, the following output from the worker container points to the problem:
kubectl -n production logs goodtables-io-production-xxxxx-yyyy -c goodtables-worker -f
[2019-07-26 09:49:37,169: ERROR/MainProcess] Process 'ForkPoolWorker-1' pid:19 exited with 'signal 9 (SIGKILL)'
[2019-07-26 09:49:37,184: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
[2019-07-26 09:50:34,453: ERROR/ForkPoolWorker-6] There was an error setting the GitHub status: https://api.github.com/repos/bzg/data-codes-sources-fr/statuses/42f134bb64848b2df3e0318822805fc24b45afd1 {"message":"Not Found","documentation_url":"https://developer.github.com/v3/repos/statuses/#create-a-status"} 406 success
[2019-07-26 09:52:53,348: ERROR/MainProcess] Process 'ForkPoolWorker-6' pid:566 exited with 'signal 9 (SIGKILL)'
[2019-07-26 09:52:53,362: ERROR/MainProcess] Task handler raised error: WorkerLostError('Worker exited prematurely: signal 9 (SIGKILL).',)
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/billiard/pool.py", line 1223, in mark_as_worker_lost
human_status(exitcode)),
billiard.exceptions.WorkerLostError: Worker exited prematurely: signal 9 (SIGKILL).
[2019-07-26 10:17:58,438: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 129, in _read_from_socket
raise socket.error(SERVER_CLOSED_CONNECTION_ERROR)
OSError: Connection closed by server.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2165, in _execute
return command(*args)
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 577, in read_response
response = self._parser.read_response()
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 238, in read_response
response = self._buffer.readline()
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 168, in readline
self._read_from_socket()
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 143, in _read_from_socket
(e.args,))
redis.exceptions.ConnectionError: Error while reading from socket: ('Connection closed by server.',)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 439, in connect
sock = self._connect()
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 494, in _connect
raise err
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 482, in _connect
sock.connect(socket_address)
TimeoutError: [Errno 110] Operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 318, in start
blueprint.start(self)
File "/usr/local/lib/python3.5/site-packages/celery/bootsteps.py", line 119, in start
step.start(parent)
File "/usr/local/lib/python3.5/site-packages/celery/worker/consumer/consumer.py", line 584, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.5/site-packages/celery/worker/loops.py", line 88, in asynloop
next(loop)
File "/usr/local/lib/python3.5/site-packages/kombu/async/hub.py", line 354, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 1040, in on_readable
self.cycle.on_readable(fileno)
File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 337, in on_readable
chan.handlers[type]()
File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 667, in _receive
ret.append(self._receive_one(c))
File "/usr/local/lib/python3.5/site-packages/kombu/transport/redis.py", line 678, in _receive_one
response = c.parse_response()
File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2183, in parse_response
return self._execute(connection, connection.read_response)
File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 2172, in _execute
connection.connect()
File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 442, in connect
raise ConnectionError(self._error_message(e))
redis.exceptions.ConnectionError: Error 110 connecting to goodtables-io-production-redis-master:6379. Operation timed out.
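The repeated 'signal 9 (SIGKILL)' messages usually mean something outside the worker killed the process, most often the kernel OOM killer or the container's memory limit, and the later Redis timeout suggests the Redis pod was struggling as well. That is not confirmed anywhere in this log, but it can be checked from kubectl using the pod names already used above (the xxxxx-yyyy part is still a placeholder):

kubectl -n production describe pod goodtables-io-production-xxxxx-yyyy   # look for OOMKilled / high restart counts in the container statuses
kubectl -n production top pod                                            # current memory and CPU usage (needs metrics-server)
kubectl -n production get events --sort-by=.lastTimestamp                # recent kill or evict events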
Prior to this there are some errors about files not being found; I'm not sure if they are related:
[2019-07-26 09:41:28,823: ERROR/ForkPoolWorker-4] Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/celery/utils/dispatch/signal.py", line 167, in send
response = receiver(signal=self, sender=sender, **named)
File "/srv/app/goodtablesio/signals.py", line 40, in task_postrun
os.remove(path)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptduxzzma/invalid.csv'
[2019-07-26 09:41:28,835: ERROR/ForkPoolWorker-4] ["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')", ' File "celery/utils/dispatch/signal.py", line 167, in send', ' File "goodtablesio/signals.py", line 40, in task_postrun']
[2019-07-26 09:41:28,835: WARNING/ForkPoolWorker-4] b'["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, \'No such file or directory\')", \' File "celery/utils/dispatch/signal.py", line 167, in send\', \' File "goodtablesio/signals.py", line 40, in task_postrun\']'
[2019-07-26 09:41:28,919: ERROR/ForkPoolWorker-1] Sentry responded with an API error: RateLimited(Creation of this event was denied due to rate limiting)
[2019-07-26 09:41:28,919: WARNING/ForkPoolWorker-1] b'Sentry responded with an API error: RateLimited(Creation of this event was denied due to rate limiting)'
[2019-07-26 09:41:28,920: ERROR/ForkPoolWorker-1] ["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, 'No such file or directory')", ' File "celery/utils/dispatch/signal.py", line 167, in send', ' File "goodtablesio/signals.py", line 40, in task_postrun']
[2019-07-26 09:41:28,920: WARNING/ForkPoolWorker-1] b'["Signal handler <function task_postrun at 0x7f6dfbabbf28> raised: FileNotFoundError(2, \'No such file or directory\')", \' File "celery/utils/dispatch/signal.py", line 167, in send\', \' File "goodtablesio/signals.py", line 40, in task_postrun\']'
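The FileNotFoundError comes from os.remove(path) in the task_postrun handler in goodtablesio/signals.py, which fails when the temporary file has already disappeared. A more tolerant handler would simply ignore a missing file; the sketch below is a hypothetical reconstruction based only on the traceback, and the way the path is looked up is an assumption, not the project's actual code.

import os

from celery import signals

@signals.task_postrun.connect
def cleanup_tmp_file(sender=None, **kwargs):
    # Hypothetical cleanup: remove the task's temporary file, but tolerate it
    # already being gone (e.g. deleted by another worker or by /tmp cleanup).
    path = getattr(sender, "tmp_file_path", None)  # assumed attribute, not the real lookup
    if path:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass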
Update: The issue seems to be fixed. I restarted both pods (the app + worker one and the Redis one) using the command in #317.
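For reference, a typical way to bounce both pods is to delete them and let their controllers recreate them (the exact command is in #317; the pod names below are placeholders taken from the log output above, with the Redis pod name guessed from the broker hostname):

kubectl -n production delete pod goodtables-io-production-xxxxx-yyyy
kubectl -n production delete pod goodtables-io-production-redis-master-0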