datalad / datalad-registry

MIT License
0 stars 2 forks source link

Celery worker dies #173

Closed candleindark closed 1 year ago

candleindark commented 1 year ago

The Celery worker dies whenever it is relatively idle. The logs just before the termination of the worker and the associated broker (the RabbitMQ service) are the following:

Logs of the Celery worker:

[2023-05-17 18:35:46,982: DEBUG/MainProcess] pidbox received method ping() [reply_to:{'exchange': 'reply.celery.pidbox', 'routing_key': '3310d923-b69e-3796-bc93-572f1606d24f'} ticket:6d5b5ef6-646b-4eca-9534-80c37233de9d]
[2023-05-17 18:35:51,238: DEBUG/MainProcess] Channel open
[2023-05-17 18:35:51,240: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', (0, 0), '')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 362, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 235, in on_readable
    reader(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 217, in _read
    drain_events(timeout=0)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
    raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
[2023-05-17 18:35:51,248: DEBUG/MainProcess] | Worker: Closing Hub...
[2023-05-17 18:35:51,248: DEBUG/MainProcess] | Worker: Closing Pool...
[2023-05-17 18:35:51,249: DEBUG/MainProcess] | Worker: Closing Consumer...
[2023-05-17 18:35:51,250: DEBUG/MainProcess] | Worker: Stopping Consumer...
[2023-05-17 18:35:51,250: DEBUG/MainProcess] | Consumer: Closing Connection...
[2023-05-17 18:35:51,250: DEBUG/MainProcess] | Consumer: Closing Events...
[2023-05-17 18:35:51,250: DEBUG/MainProcess] | Consumer: Closing Mingle...
[2023-05-17 18:35:51,250: DEBUG/MainProcess] | Consumer: Closing Gossip...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Closing Tasks...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Closing Control...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Closing Heart...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Closing event loop...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Stopping event loop...
[2023-05-17 18:35:51,251: DEBUG/MainProcess] | Consumer: Stopping Heart...
[2023-05-17 18:35:51,253: DEBUG/MainProcess] | Consumer: Stopping Control...
[2023-05-17 18:35:51,255: DEBUG/MainProcess] Closed channel #3
[2023-05-17 18:35:51,256: DEBUG/MainProcess] | Consumer: Stopping Tasks...
[2023-05-17 18:35:51,256: DEBUG/MainProcess] Canceling task consumer...
[2023-05-17 18:35:51,257: DEBUG/MainProcess] | Consumer: Stopping Gossip...
[2023-05-17 18:35:51,264: DEBUG/MainProcess] Closed channel #2
[2023-05-17 18:35:51,265: DEBUG/MainProcess] | Consumer: Stopping Mingle...
[2023-05-17 18:35:51,265: DEBUG/MainProcess] | Consumer: Stopping Events...
[2023-05-17 18:35:51,265: DEBUG/MainProcess] | Consumer: Stopping Connection...
[2023-05-17 18:35:51,265: DEBUG/MainProcess] | Worker: Stopping Pool...
[2023-05-17 18:36:26,760: DEBUG/ForkPoolWorker-32] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@deb4152d60a1', 'copyright': 'Copyright (c) 2007-2023 VMware, Inc. or its affiliates.', 'information': 'Licensed under the MPL 2.0. Website: https://rabbitmq.com', 'platform': 'Erlang/OTP 25.2.3', 'product': 'RabbitMQ', 'version': '3.11.9'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
[2023-05-17 18:36:26,761: DEBUG/ForkPoolWorker-32] using channel_id: 1
[2023-05-17 18:36:26,762: DEBUG/ForkPoolWorker-32] Channel open
[2023-05-17 18:36:26,764: INFO/ForkPoolWorker-32] Task datalad_registry.tasks.process_dataset_url[e202c374-ee64-48ab-8f51-9eb610f4ab2e] succeeded in 1835.4301066901535s: None
[2023-05-17 18:36:28,071: DEBUG/MainProcess] | Worker: Stopping Hub...
[2023-05-17 18:36:28,072: DEBUG/MainProcess] | Consumer: Shutdown Heart...
[2023-05-17 18:36:28,072: DEBUG/MainProcess] | Consumer: Shutdown Control...
[2023-05-17 18:36:28,072: DEBUG/MainProcess] | Consumer: Shutdown Tasks...
[2023-05-17 18:36:28,073: DEBUG/MainProcess] Canceling task consumer...
[2023-05-17 18:36:28,073: DEBUG/MainProcess] Closing consumer channel...
[2023-05-17 18:36:28,073: DEBUG/MainProcess] | Consumer: Shutdown Gossip...
[2023-05-17 18:36:28,073: DEBUG/MainProcess] | Consumer: Shutdown Events...
[2023-05-17 18:36:28,074: DEBUG/MainProcess] Closed channel #1
[2023-05-17 18:36:28,075: DEBUG/MainProcess] | Consumer: Shutdown Connection...
[2023-05-17 18:36:28,075: DEBUG/MainProcess] Closed channel #1
[2023-05-17 18:36:28,076: WARNING/MainProcess] Traceback (most recent call last):
[2023-05-17 18:36:28,076: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
[2023-05-17 18:36:28,077: WARNING/MainProcess]
[2023-05-17 18:36:28,077: WARNING/MainProcess] self.blueprint.start(self)
[2023-05-17 18:36:28,077: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
[2023-05-17 18:36:28,077: WARNING/MainProcess]
[2023-05-17 18:36:28,078: WARNING/MainProcess] step.start(parent)
[2023-05-17 18:36:28,078: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
[2023-05-17 18:36:28,078: WARNING/MainProcess]
[2023-05-17 18:36:28,078: WARNING/MainProcess] return self.obj.start()
[2023-05-17 18:36:28,078: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
[2023-05-17 18:36:28,079: WARNING/MainProcess]
[2023-05-17 18:36:28,079: WARNING/MainProcess] blueprint.start(self)
[2023-05-17 18:36:28,079: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
[2023-05-17 18:36:28,079: WARNING/MainProcess]
[2023-05-17 18:36:28,079: WARNING/MainProcess] step.start(parent)
[2023-05-17 18:36:28,080: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
[2023-05-17 18:36:28,080: WARNING/MainProcess]
[2023-05-17 18:36:28,080: WARNING/MainProcess] c.loop(*c.loop_args())
[2023-05-17 18:36:28,080: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
[2023-05-17 18:36:28,081: WARNING/MainProcess]
[2023-05-17 18:36:28,081: WARNING/MainProcess] next(loop)
[2023-05-17 18:36:28,081: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 362, in create_loop
[2023-05-17 18:36:28,081: WARNING/MainProcess]
[2023-05-17 18:36:28,081: WARNING/MainProcess] cb(*cbargs)
[2023-05-17 18:36:28,081: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 235, in on_readable
[2023-05-17 18:36:28,082: WARNING/MainProcess]
[2023-05-17 18:36:28,082: WARNING/MainProcess] reader(loop)
[2023-05-17 18:36:28,082: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 217, in _read
[2023-05-17 18:36:28,082: WARNING/MainProcess]
[2023-05-17 18:36:28,082: WARNING/MainProcess] drain_events(timeout=0)
[2023-05-17 18:36:28,083: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
[2023-05-17 18:36:28,083: WARNING/MainProcess]
[2023-05-17 18:36:28,083: WARNING/MainProcess] while not self.blocking_read(timeout):
[2023-05-17 18:36:28,083: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
[2023-05-17 18:36:28,084: WARNING/MainProcess]
[2023-05-17 18:36:28,084: WARNING/MainProcess] return self.on_inbound_frame(frame)
[2023-05-17 18:36:28,084: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
[2023-05-17 18:36:28,084: WARNING/MainProcess]
[2023-05-17 18:36:28,084: WARNING/MainProcess] callback(channel, method_sig, buf, None)
[2023-05-17 18:36:28,084: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
[2023-05-17 18:36:28,085: WARNING/MainProcess]
[2023-05-17 18:36:28,085: WARNING/MainProcess] return self.channels[channel_id].dispatch_method(
[2023-05-17 18:36:28,085: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
[2023-05-17 18:36:28,085: WARNING/MainProcess]
[2023-05-17 18:36:28,086: WARNING/MainProcess] listener(*args)
[2023-05-17 18:36:28,086: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
[2023-05-17 18:36:28,086: WARNING/MainProcess]
[2023-05-17 18:36:28,086: WARNING/MainProcess] raise error_for_code(
[2023-05-17 18:36:28,086: WARNING/MainProcess] amqp.exceptions
[2023-05-17 18:36:28,086: WARNING/MainProcess] .
[2023-05-17 18:36:28,086: WARNING/MainProcess] PreconditionFailed
[2023-05-17 18:36:28,087: WARNING/MainProcess] :
[2023-05-17 18:36:28,087: WARNING/MainProcess] (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
[2023-05-17 18:36:28,087: WARNING/MainProcess]
During handling of the above exception, another exception occurred:
[2023-05-17 18:36:28,087: WARNING/MainProcess] Traceback (most recent call last):
[2023-05-17 18:36:28,087: WARNING/MainProcess]   File "/usr/local/bin/celery", line 8, in <module>
[2023-05-17 18:36:28,087: WARNING/MainProcess]
[2023-05-17 18:36:28,088: WARNING/MainProcess] sys.exit(main())
[2023-05-17 18:36:28,088: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/__main__.py", line 15, in main
[2023-05-17 18:36:28,088: WARNING/MainProcess]
[2023-05-17 18:36:28,088: WARNING/MainProcess] sys.exit(_main())
[2023-05-17 18:36:28,088: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bin/celery.py", line 217, in main
[2023-05-17 18:36:28,089: WARNING/MainProcess]
[2023-05-17 18:36:28,089: WARNING/MainProcess] return celery(auto_envvar_prefix="CELERY")
[2023-05-17 18:36:28,089: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1130, in __call__
[2023-05-17 18:36:28,089: WARNING/MainProcess]
[2023-05-17 18:36:28,090: WARNING/MainProcess] return self.main(*args, **kwargs)
[2023-05-17 18:36:28,090: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1055, in main
[2023-05-17 18:36:28,090: WARNING/MainProcess]
[2023-05-17 18:36:28,090: WARNING/MainProcess] rv = self.invoke(ctx)
[2023-05-17 18:36:28,091: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1657, in invoke
[2023-05-17 18:36:28,091: WARNING/MainProcess]
[2023-05-17 18:36:28,091: WARNING/MainProcess] return _process_result(sub_ctx.command.invoke(sub_ctx))
[2023-05-17 18:36:28,092: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 1404, in invoke
[2023-05-17 18:36:28,092: WARNING/MainProcess]
[2023-05-17 18:36:28,092: WARNING/MainProcess] return ctx.invoke(self.callback, **ctx.params)
[2023-05-17 18:36:28,092: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/core.py", line 760, in invoke
[2023-05-17 18:36:28,093: WARNING/MainProcess]
[2023-05-17 18:36:28,093: WARNING/MainProcess] return __callback(*args, **kwargs)
[2023-05-17 18:36:28,093: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/click/decorators.py", line 26, in new_func
[2023-05-17 18:36:28,093: WARNING/MainProcess]
[2023-05-17 18:36:28,094: WARNING/MainProcess] return f(get_current_context(), *args, **kwargs)
[2023-05-17 18:36:28,094: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bin/base.py", line 134, in caller
[2023-05-17 18:36:28,094: WARNING/MainProcess]
[2023-05-17 18:36:28,094: WARNING/MainProcess] return f(ctx, *args, **kwargs)
[2023-05-17 18:36:28,094: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bin/worker.py", line 351, in worker
[2023-05-17 18:36:28,095: WARNING/MainProcess]
[2023-05-17 18:36:28,095: WARNING/MainProcess] worker.start()
[2023-05-17 18:36:28,095: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 208, in start
[2023-05-17 18:36:28,095: WARNING/MainProcess]
[2023-05-17 18:36:28,095: WARNING/MainProcess] self.stop(exitcode=EX_FAILURE)
[2023-05-17 18:36:28,095: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 251, in stop
[2023-05-17 18:36:28,096: WARNING/MainProcess]
[2023-05-17 18:36:28,096: WARNING/MainProcess] self._shutdown(warm=True)
[2023-05-17 18:36:28,096: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 266, in _shutdown
[2023-05-17 18:36:28,096: WARNING/MainProcess]
[2023-05-17 18:36:28,096: WARNING/MainProcess] self.blueprint.stop(self, terminate=not warm)
[2023-05-17 18:36:28,096: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 174, in stop
[2023-05-17 18:36:28,097: WARNING/MainProcess]
[2023-05-17 18:36:28,097: WARNING/MainProcess] self.on_stopped()
[2023-05-17 18:36:28,097: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 162, in on_stopped
[2023-05-17 18:36:28,097: WARNING/MainProcess]
[2023-05-17 18:36:28,098: WARNING/MainProcess] self.consumer.shutdown()
[2023-05-17 18:36:28,098: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 376, in shutdown
[2023-05-17 18:36:28,098: WARNING/MainProcess]
[2023-05-17 18:36:28,098: WARNING/MainProcess] self.blueprint.shutdown(self)
[2023-05-17 18:36:28,098: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 174, in shutdown
[2023-05-17 18:36:28,098: WARNING/MainProcess]
[2023-05-17 18:36:28,098: WARNING/MainProcess] self.send_all(parent, 'shutdown')
[2023-05-17 18:36:28,098: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 148, in send_all
[2023-05-17 18:36:28,099: WARNING/MainProcess]
[2023-05-17 18:36:28,099: WARNING/MainProcess] fun(parent, *args)
[2023-05-17 18:36:28,099: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/connection.py", line 29, in shutdown
[2023-05-17 18:36:28,099: WARNING/MainProcess]
[2023-05-17 18:36:28,099: WARNING/MainProcess] ignore_errors(connection, connection.close)
[2023-05-17 18:36:28,099: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/common.py", line 316, in ignore_errors
[2023-05-17 18:36:28,100: WARNING/MainProcess]
[2023-05-17 18:36:28,100: WARNING/MainProcess] return fun(*args, **kwargs)
[2023-05-17 18:36:28,100: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/connection.py", line 373, in release
[2023-05-17 18:36:28,100: WARNING/MainProcess]
[2023-05-17 18:36:28,100: WARNING/MainProcess] self._close()
[2023-05-17 18:36:28,100: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/connection.py", line 339, in _close
[2023-05-17 18:36:28,100: WARNING/MainProcess]
[2023-05-17 18:36:28,101: WARNING/MainProcess] self._do_close_self()
[2023-05-17 18:36:28,101: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/connection.py", line 329, in _do_close_self
[2023-05-17 18:36:28,101: WARNING/MainProcess]
[2023-05-17 18:36:28,101: WARNING/MainProcess] self.maybe_close_channel(self._default_channel)
[2023-05-17 18:36:28,101: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/kombu/connection.py", line 321, in maybe_close_channel
[2023-05-17 18:36:28,101: WARNING/MainProcess]
[2023-05-17 18:36:28,101: WARNING/MainProcess] channel.close()
[2023-05-17 18:36:28,102: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 235, in close
[2023-05-17 18:36:28,102: WARNING/MainProcess]
[2023-05-17 18:36:28,102: WARNING/MainProcess] return self.send_method(
[2023-05-17 18:36:28,102: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 79, in send_method
[2023-05-17 18:36:28,102: WARNING/MainProcess]
[2023-05-17 18:36:28,102: WARNING/MainProcess] return self.wait(wait, returns_tuple=returns_tuple)
[2023-05-17 18:36:28,102: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 99, in wait
[2023-05-17 18:36:28,102: WARNING/MainProcess]
[2023-05-17 18:36:28,103: WARNING/MainProcess] self.connection.drain_events(timeout=timeout)
[2023-05-17 18:36:28,103: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
[2023-05-17 18:36:28,103: WARNING/MainProcess]
[2023-05-17 18:36:28,103: WARNING/MainProcess] while not self.blocking_read(timeout):
[2023-05-17 18:36:28,103: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
[2023-05-17 18:36:28,103: WARNING/MainProcess]
[2023-05-17 18:36:28,104: WARNING/MainProcess] return self.on_inbound_frame(frame)
[2023-05-17 18:36:28,104: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
[2023-05-17 18:36:28,104: WARNING/MainProcess]
[2023-05-17 18:36:28,104: WARNING/MainProcess] callback(channel, method_sig, buf, None)
[2023-05-17 18:36:28,104: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
[2023-05-17 18:36:28,104: WARNING/MainProcess]
[2023-05-17 18:36:28,104: WARNING/MainProcess] return self.channels[channel_id].dispatch_method(
[2023-05-17 18:36:28,105: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
[2023-05-17 18:36:28,105: WARNING/MainProcess]
[2023-05-17 18:36:28,105: WARNING/MainProcess] listener(*args)
[2023-05-17 18:36:28,105: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 292, in _on_close
[2023-05-17 18:36:28,105: WARNING/MainProcess]
[2023-05-17 18:36:28,105: WARNING/MainProcess] self._do_revive()
[2023-05-17 18:36:28,105: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 177, in _do_revive
[2023-05-17 18:36:28,106: WARNING/MainProcess]
[2023-05-17 18:36:28,106: WARNING/MainProcess] self.open()
[2023-05-17 18:36:28,106: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 448, in open
[2023-05-17 18:36:28,106: WARNING/MainProcess]
[2023-05-17 18:36:28,106: WARNING/MainProcess] return self.send_method(
[2023-05-17 18:36:28,106: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 79, in send_method
[2023-05-17 18:36:28,106: WARNING/MainProcess]
[2023-05-17 18:36:28,106: WARNING/MainProcess] return self.wait(wait, returns_tuple=returns_tuple)
[2023-05-17 18:36:28,107: WARNING/MainProcess]   File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 99, in wait
[2023-05-17 18:36:28,107: WARNING/MainProcess]
[2023-05-17 18:36:28,107: WARNING/MainProcess] self.connection.drain_events(timeout=timeout)
[2023-05-17 18:36:28,107: WARNING/MainProcess] AttributeError
[2023-05-17 18:36:28,107: WARNING/MainProcess] :
[2023-05-17 18:36:28,107: WARNING/MainProcess] 'NoneType' object has no attribute 'drain_events'
[2023-05-17 18:36:28,107: DEBUG/MainProcess] removing tasks from inqueue until task handler finished

Logs of the broker:

2023-05-17 18:35:46.973451+00:00 [info] <0.14894.8> connection <0.14894.8> (172.18.1.175:35584 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:35:46.979355+00:00 [info] <0.14911.8> accepting AMQP connection <0.14911.8> (172.18.1.175:35588 -> 172.18.1.170:5672)
2023-05-17 18:35:46.980928+00:00 [info] <0.14911.8> connection <0.14911.8> (172.18.1.175:35588 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:35:48.120293+00:00 [warning] <0.14911.8> closing AMQP connection <0.14911.8> (172.18.1.175:35588 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:35:48.120293+00:00 [warning] <0.14911.8> client unexpectedly closed TCP connection
2023-05-17 18:35:48.120590+00:00 [warning] <0.14894.8> closing AMQP connection <0.14894.8> (172.18.1.175:35584 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:35:48.120590+00:00 [warning] <0.14894.8> client unexpectedly closed TCP connection
2023-05-17 18:35:51.235891+00:00 [warning] <0.2043.8> Consumer 1 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-05-17 18:35:51.236635+00:00 [error] <0.2043.8> Channel error on connection <0.2028.8> (172.18.1.175:39038 -> 172.18.1.170:5672, vhost: '/', user: 'user'), channel 1:
2023-05-17 18:35:51.236635+00:00 [error] <0.2043.8> operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-05-17 18:35:54.902641+00:00 [info] <0.14960.8> accepting AMQP connection <0.14960.8> (172.18.1.175:48860 -> 172.18.1.170:5672)
2023-05-17 18:35:54.904113+00:00 [info] <0.14960.8> connection <0.14960.8> (172.18.1.175:48860 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:35:54.911310+00:00 [info] <0.14977.8> accepting AMQP connection <0.14977.8> (172.18.1.175:48876 -> 172.18.1.170:5672)
2023-05-17 18:35:54.913135+00:00 [info] <0.14977.8> connection <0.14977.8> (172.18.1.175:48876 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:35:56.098886+00:00 [warning] <0.14977.8> closing AMQP connection <0.14977.8> (172.18.1.175:48876 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:35:56.098886+00:00 [warning] <0.14977.8> client unexpectedly closed TCP connection
2023-05-17 18:35:56.098887+00:00 [warning] <0.14960.8> closing AMQP connection <0.14960.8> (172.18.1.175:48860 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:35:56.098887+00:00 [warning] <0.14960.8> client unexpectedly closed TCP connection
2023-05-17 18:36:02.838330+00:00 [info] <0.15018.8> accepting AMQP connection <0.15018.8> (172.18.1.175:53684 -> 172.18.1.170:5672)
2023-05-17 18:36:02.840914+00:00 [info] <0.15018.8> connection <0.15018.8> (172.18.1.175:53684 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:02.846895+00:00 [info] <0.15035.8> accepting AMQP connection <0.15035.8> (172.18.1.175:53696 -> 172.18.1.170:5672)
2023-05-17 18:36:02.848482+00:00 [info] <0.15035.8> connection <0.15035.8> (172.18.1.175:53696 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:04.026664+00:00 [warning] <0.15035.8> closing AMQP connection <0.15035.8> (172.18.1.175:53696 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:04.026664+00:00 [warning] <0.15035.8> client unexpectedly closed TCP connection
2023-05-17 18:36:04.027147+00:00 [warning] <0.15018.8> closing AMQP connection <0.15018.8> (172.18.1.175:53684 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:04.027147+00:00 [warning] <0.15018.8> client unexpectedly closed TCP connection
2023-05-17 18:36:10.836705+00:00 [info] <0.15076.8> accepting AMQP connection <0.15076.8> (172.18.1.175:53698 -> 172.18.1.170:5672)
2023-05-17 18:36:10.839514+00:00 [info] <0.15076.8> connection <0.15076.8> (172.18.1.175:53698 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:10.847727+00:00 [info] <0.15093.8> accepting AMQP connection <0.15093.8> (172.18.1.175:53710 -> 172.18.1.170:5672)
2023-05-17 18:36:10.849917+00:00 [info] <0.15093.8> connection <0.15093.8> (172.18.1.175:53710 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:11.988074+00:00 [warning] <0.15093.8> closing AMQP connection <0.15093.8> (172.18.1.175:53710 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:11.988074+00:00 [warning] <0.15093.8> client unexpectedly closed TCP connection
2023-05-17 18:36:11.988529+00:00 [warning] <0.15076.8> closing AMQP connection <0.15076.8> (172.18.1.175:53698 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:11.988529+00:00 [warning] <0.15076.8> client unexpectedly closed TCP connection
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>     supervisor: {<0.15080.8>,rabbit_channel_sup}
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>     errorContext: shutdown_error
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>     reason: noproc
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>     offender: [{pid,<0.15083.8>},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                {id,channel},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                {mfargs,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                    {rabbit_channel,start_link,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                        [1,<0.15076.8>,<0.15081.8>,<0.15076.8>,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         <<"172.18.1.175:53698 -> 172.18.1.170:5672">>,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         rabbit_framing_amqp_0_9_1,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         {user,<<"user">>,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                             [administrator],
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                             [{rabbit_auth_backend_internal,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                                  #Fun<rabbit_auth_backend_internal.2.58838149>}]},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         <<"/">>,
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         [{<<"consumer_cancel_notify">>,bool,true},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                          {<<"connection.blocked">>,bool,true},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                          {<<"authentication_failure_close">>,bool,true}],
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                         <0.15077.8>,<0.15082.8>]}},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                {restart_type,intrinsic},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                {shutdown,70000},
2023-05-17 18:36:11.989349+00:00 [error] <0.15080.8>                {child_type,worker}]
2023-05-17 18:36:17.901192+00:00 [info] <0.15137.8> accepting AMQP connection <0.15137.8> (172.18.1.175:45580 -> 172.18.1.170:5672)
2023-05-17 18:36:17.903644+00:00 [info] <0.15137.8> connection <0.15137.8> (172.18.1.175:45580 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:17.909909+00:00 [info] <0.15154.8> accepting AMQP connection <0.15154.8> (172.18.1.175:45588 -> 172.18.1.170:5672)
2023-05-17 18:36:17.911845+00:00 [info] <0.15154.8> connection <0.15154.8> (172.18.1.175:45588 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:19.091268+00:00 [warning] <0.15154.8> closing AMQP connection <0.15154.8> (172.18.1.175:45588 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:19.091268+00:00 [warning] <0.15154.8> client unexpectedly closed TCP connection
2023-05-17 18:36:19.091579+00:00 [warning] <0.15137.8> closing AMQP connection <0.15137.8> (172.18.1.175:45580 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:19.091579+00:00 [warning] <0.15137.8> client unexpectedly closed TCP connection
2023-05-17 18:36:25.848534+00:00 [info] <0.15195.8> accepting AMQP connection <0.15195.8> (172.18.1.175:54336 -> 172.18.1.170:5672)
2023-05-17 18:36:25.850491+00:00 [info] <0.15195.8> connection <0.15195.8> (172.18.1.175:54336 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:25.855304+00:00 [info] <0.15212.8> accepting AMQP connection <0.15212.8> (172.18.1.175:54352 -> 172.18.1.170:5672)
2023-05-17 18:36:25.856229+00:00 [info] <0.15212.8> connection <0.15212.8> (172.18.1.175:54352 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:26.759106+00:00 [info] <0.15224.8> accepting AMQP connection <0.15224.8> (172.18.1.175:54366 -> 172.18.1.170:5672)
2023-05-17 18:36:26.761123+00:00 [info] <0.15224.8> connection <0.15224.8> (172.18.1.175:54366 -> 172.18.1.170:5672): user 'user' authenticated and granted access to vhost '/'
2023-05-17 18:36:27.032339+00:00 [warning] <0.15212.8> closing AMQP connection <0.15212.8> (172.18.1.175:54352 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:27.032339+00:00 [warning] <0.15212.8> client unexpectedly closed TCP connection
2023-05-17 18:36:27.032649+00:00 [warning] <0.15195.8> closing AMQP connection <0.15195.8> (172.18.1.175:54336 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:27.032649+00:00 [warning] <0.15195.8> client unexpectedly closed TCP connection
2023-05-17 18:36:28.071955+00:00 [warning] <0.15224.8> closing AMQP connection <0.15224.8> (172.18.1.175:54366 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:28.071955+00:00 [warning] <0.15224.8> client unexpectedly closed TCP connection
2023-05-17 18:36:28.072935+00:00 [error] <0.14940.8> Channel error on connection <0.2028.8> (172.18.1.175:39038 -> 172.18.1.170:5672, vhost: '/', user: 'user'), channel 1:
2023-05-17 18:36:28.072935+00:00 [error] <0.14940.8> operation basic.ack caused a channel exception precondition_failed: unknown delivery tag 1
2023-05-17 18:36:28.074976+00:00 [info] <0.2036.8> closing AMQP connection <0.2036.8> (172.18.1.175:39040 -> 172.18.1.170:5672, vhost: '/', user: 'user')
2023-05-17 18:36:28.812006+00:00 [warning] <0.2053.8> closing AMQP connection <0.2053.8> (172.18.1.175:39054 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:28.812006+00:00 [warning] <0.2053.8> client unexpectedly closed TCP connection
2023-05-17 18:36:28.812541+00:00 [warning] <0.2028.8> closing AMQP connection <0.2028.8> (172.18.1.175:39038 -> 172.18.1.170:5672, vhost: '/', user: 'user'):
2023-05-17 18:36:28.812541+00:00 [warning] <0.2028.8> client unexpectedly closed TCP connection

The complete logs of all the services after such a cash of the Celery service is available at /srv/registry/logs/failed_executions/05-17-2023_2 on Typhon.

yarikoptic commented 1 year ago

@candleindark says he never observed it again, let's let it RiP

candleindark commented 1 year ago

This problem is occurring again. Now, this problem tends to occur when the worker is relatively busy.

The following is the logs from the broker and worker at about the time the worker died.

broker logs:

2023-08-26 00:56:20.839903+00:00 [warning] <0.9962.0> closing AMQP connection <0.9962.0> (192.168.129.19:57868 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:20.839903+00:00 [warning] <0.9962.0> client unexpectedly closed TCP connection
2023-08-26 00:56:20.840286+00:00 [warning] <0.9944.0> closing AMQP connection <0.9944.0> (192.168.129.19:57856 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:20.840286+00:00 [warning] <0.9944.0> client unexpectedly closed TCP connection
2023-08-26 00:56:27.887789+00:00 [warning] <0.1059.0> Consumer 3004 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-08-26 00:56:27.888507+00:00 [error] <0.1059.0> Channel error on connection <0.1044.0> (192.168.129.19:58054 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'), channel 1:
2023-08-26 00:56:27.888507+00:00 [error] <0.1059.0> operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-08-26 00:56:32.964640+00:00 [warning] <0.1548.0> closing AMQP connection <0.1548.0> (192.168.129.19:36708 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:32.964640+00:00 [warning] <0.1548.0> client unexpectedly closed TCP connection
2023-08-26 00:56:46.587455+00:00 [warning] <0.1284.0> closing AMQP connection <0.1284.0> (192.168.129.19:45552 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:46.587455+00:00 [warning] <0.1284.0> client unexpectedly closed TCP connection
2023-08-26 00:56:54.280439+00:00 [info] <0.10007.0> accepting AMQP connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672)
2023-08-26 00:56:54.283512+00:00 [info] <0.10007.0> connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:56:54.292429+00:00 [info] <0.10025.0> accepting AMQP connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672)
2023-08-26 00:56:54.294844+00:00 [info] <0.10025.0> connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:56:55.607485+00:00 [warning] <0.10025.0> closing AMQP connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:55.607485+00:00 [warning] <0.10025.0> client unexpectedly closed TCP connection
2023-08-26 00:56:55.608471+00:00 [warning] <0.10007.0> closing AMQP connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:55.608471+00:00 [warning] <0.10007.0> client unexpectedly closed TCP connection
2023-08-26 00:57:05.119475+00:00 [warning] <0.1638.0> closing AMQP connection <0.1638.0> (192.168.129.19:59660 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:05.119475+00:00 [warning] <0.1638.0> client unexpectedly closed TCP connection
2023-08-26 00:57:11.136592+00:00 [warning] <0.1500.0> closing AMQP connection <0.1500.0> (192.168.129.19:45700 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:11.136592+00:00 [warning] <0.1500.0> client unexpectedly closed TCP connection
2023-08-26 00:57:11.617903+00:00 [warning] <0.1452.0> closing AMQP connection <0.1452.0> (192.168.129.19:45656 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:11.617903+00:00 [warning] <0.1452.0> client unexpectedly closed TCP connection
2023-08-26 00:57:17.717436+00:00 [warning] <0.1693.0> closing AMQP connection <0.1693.0> (192.168.129.19:33858 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:17.717436+00:00 [warning] <0.1693.0> client unexpectedly closed TCP connection
2023-08-26 00:57:21.216715+00:00 [warning] <0.1272.0> closing AMQP connection <0.1272.0> (192.168.129.19:45536 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:21.216715+00:00 [warning] <0.1272.0> client unexpectedly closed TCP connection
2023-08-26 00:57:22.669984+00:00 [warning] <0.1512.0> closing AMQP connection <0.1512.0> (192.168.129.19:45704 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:22.669984+00:00 [warning] <0.1512.0> client unexpectedly closed TCP connection
2023-08-26 00:57:22.700691+00:00 [warning] <0.1404.0> closing AMQP connection <0.1404.0> (192.168.129.19:45630 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:22.700691+00:00 [warning] <0.1404.0> client unexpectedly closed TCP connection
2023-08-26 00:57:27.886961+00:00 [info] <0.10083.0> accepting AMQP connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672)
2023-08-26 00:57:27.888942+00:00 [info] <0.10083.0> connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:57:27.895117+00:00 [info] <0.10101.0> accepting AMQP connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672)
2023-08-26 00:57:27.896614+00:00 [info] <0.10101.0> connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:57:29.049247+00:00 [warning] <0.10101.0> closing AMQP connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.049247+00:00 [warning] <0.10101.0> client unexpectedly closed TCP connection
2023-08-26 00:57:29.049591+00:00 [warning] <0.10083.0> closing AMQP connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.049591+00:00 [warning] <0.10083.0> client unexpectedly closed TCP connection
2023-08-26 00:57:29.211600+00:00 [warning] <0.1624.0> closing AMQP connection <0.1624.0> (192.168.129.19:54278 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.211600+00:00 [warning] <0.1624.0> client unexpectedly closed TCP connection
2023-08-26 00:57:40.849034+00:00 [warning] <0.1488.0> closing AMQP connection <0.1488.0> (192.168.129.19:45684 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:40.849034+00:00 [warning] <0.1488.0> client unexpectedly closed TCP connection
2023-08-26 00:57:58.349654+00:00 [warning] <0.1611.0> closing AMQP connection <0.1611.0> (192.168.129.19:54264 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:58.349654+00:00 [warning] <0.1611.0> client unexpectedly closed TCP connection
2023-08-26 00:58:01.793965+00:00 [info] <0.10141.0> accepting AMQP connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672)
2023-08-26 00:58:01.795738+00:00 [info] <0.10141.0> connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:01.801577+00:00 [info] <0.10159.0> accepting AMQP connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672)
2023-08-26 00:58:01.802943+00:00 [info] <0.10159.0> connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:03.031900+00:00 [warning] <0.10159.0> closing AMQP connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:03.031900+00:00 [warning] <0.10159.0> client unexpectedly closed TCP connection
2023-08-26 00:58:03.032255+00:00 [warning] <0.10141.0> closing AMQP connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:03.032255+00:00 [warning] <0.10141.0> client unexpectedly closed TCP connection
2023-08-26 00:58:14.364271+00:00 [warning] <0.1368.0> closing AMQP connection <0.1368.0> (192.168.129.19:45614 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:14.364271+00:00 [warning] <0.1368.0> client unexpectedly closed TCP connection
2023-08-26 00:58:26.951186+00:00 [warning] <0.1248.0> closing AMQP connection <0.1248.0> (192.168.129.19:45512 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:26.951186+00:00 [warning] <0.1248.0> client unexpectedly closed TCP connection
2023-08-26 00:58:28.905313+00:00 [warning] <0.1392.0> closing AMQP connection <0.1392.0> (192.168.129.19:45628 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:28.905313+00:00 [warning] <0.1392.0> client unexpectedly closed TCP connection
2023-08-26 00:58:35.542318+00:00 [info] <0.10201.0> accepting AMQP connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672)
2023-08-26 00:58:35.543753+00:00 [info] <0.10201.0> connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:35.548244+00:00 [info] <0.10219.0> accepting AMQP connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672)
2023-08-26 00:58:35.549251+00:00 [info] <0.10219.0> connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:36.712489+00:00 [warning] <0.10219.0> closing AMQP connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:36.712489+00:00 [warning] <0.10219.0> client unexpectedly closed TCP connection
2023-08-26 00:58:36.712761+00:00 [warning] <0.10201.0> closing AMQP connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:36.712761+00:00 [warning] <0.10201.0> client unexpectedly closed TCP connection
2023-08-26 00:58:39.437658+00:00 [warning] <0.1536.0> closing AMQP connection <0.1536.0> (192.168.129.19:45736 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:39.437658+00:00 [warning] <0.1536.0> client unexpectedly closed TCP connection
2023-08-26 00:58:40.999813+00:00 [warning] <0.1416.0> closing AMQP connection <0.1416.0> (192.168.129.19:45638 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:40.999813+00:00 [warning] <0.1416.0> client unexpectedly closed TCP connection
2023-08-26 00:58:56.519831+00:00 [warning] <0.1296.0> closing AMQP connection <0.1296.0> (192.168.129.19:45560 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:56.519831+00:00 [warning] <0.1296.0> client unexpectedly closed TCP connection
2023-08-26 00:58:56.973822+00:00 [error] <0.1044.0> closing AMQP connection <0.1044.0> (192.168.129.19:58054 -> 192.168.129.15:5672):
2023-08-26 00:58:56.973822+00:00 [error] <0.1044.0> missed heartbeats from client, timeout: 60s
2023-08-26 00:59:00.084892+00:00 [warning] <0.3738.0> closing AMQP connection <0.3738.0> (192.168.129.19:56970 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:00.084892+00:00 [warning] <0.3738.0> client unexpectedly closed TCP connection
2023-08-26 00:59:08.435955+00:00 [info] <0.10262.0> accepting AMQP connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672)
2023-08-26 00:59:08.437242+00:00 [info] <0.10262.0> connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:59:08.441407+00:00 [info] <0.10280.0> accepting AMQP connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672)
2023-08-26 00:59:08.442329+00:00 [info] <0.10280.0> connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:59:09.643408+00:00 [warning] <0.10280.0> closing AMQP connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.643408+00:00 [warning] <0.10280.0> client unexpectedly closed TCP connection
2023-08-26 00:59:09.643629+00:00 [warning] <0.10262.0> closing AMQP connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.643629+00:00 [warning] <0.10262.0> client unexpectedly closed TCP connection
2023-08-26 00:59:09.644291+00:00 [warning] <0.1356.0> closing AMQP connection <0.1356.0> (192.168.129.19:45596 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.644291+00:00 [warning] <0.1356.0> client unexpectedly closed TCP connection
2023-08-26 00:59:10.100418+00:00 [warning] <0.1590.0> closing AMQP connection <0.1590.0> (192.168.129.19:36738 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:10.100418+00:00 [warning] <0.1590.0> client unexpectedly closed TCP connection
2023-08-26 00:59:23.214045+00:00 [warning] <0.2992.0> closing AMQP connection <0.2992.0> (192.168.129.19:54722 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:23.214045+00:00 [warning] <0.2992.0> client unexpectedly closed TCP connection
2023-08-26 00:59:27.715245+00:00 [warning] <0.1371.0> closing AMQP connection <0.1371.0> (192.168.129.19:45606 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:27.715245+00:00 [warning] <0.1371.0> client unexpectedly closed TCP connection
2023-08-26 00:59:30.737415+00:00 [warning] <0.1344.0> closing AMQP connection <0.1344.0> (192.168.129.19:45586 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:30.737415+00:00 [warning] <0.1344.0> client unexpectedly closed TCP connection
2023-08-26 00:59:34.242138+00:00 [warning] <0.1440.0> closing AMQP connection <0.1440.0> (192.168.129.19:45652 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:34.242138+00:00 [warning] <0.1440.0> client unexpectedly closed TCP connection
2023-08-26 00:59:37.772887+00:00 [warning] <0.1320.0> closing AMQP connection <0.1320.0> (192.168.129.19:45568 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:37.772887+00:00 [warning] <0.1320.0> client unexpectedly closed TCP connection

worker log:

[2023-08-26 00:56:14,762: INFO/ForkPoolWorker-6] Resolving deltas:  99% (45243/45699)
[2023-08-26 00:56:14,799: INFO/ForkPoolWorker-6] Resolving deltas: 100% (45699/45699)
[2023-08-26 00:56:14,800: INFO/ForkPoolWorker-6] Resolving deltas: 100% (45699/45699), done.
[2023-08-26 00:56:14,800: INFO/ForkPoolWorker-6] Finished resolving deltas
[INFO] Completed clone attempts for Dataset(/data/cache/43a/4ca/db0b3c451fa973352af819f930)
[2023-08-26 00:56:18,055: INFO/ForkPoolWorker-35] Completed clone attempts for Dataset(/data/cache/43a/4ca/db0b3c451fa973352af819f930)
[INFO] Completed clone attempts for Dataset(/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee)
[2023-08-26 00:56:18,383: INFO/ForkPoolWorker-6] Completed clone attempts for Dataset(/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee)
[INFO] scanning for unlocked files (this may take some time)
[2023-08-26 00:56:20,115: INFO/ForkPoolWorker-6] scanning for unlocked files (this may take some time)
[INFO] scanning for unlocked files (this may take some time)
[2023-08-26 00:56:20,699: INFO/ForkPoolWorker-35] scanning for unlocked files (this may take some time)
[INFO] Remote origin not usable by git-annex; setting annex-ignore
[2023-08-26 00:56:20,716: INFO/ForkPoolWorker-6] Remote origin not usable by git-annex; setting annex-ignore
[INFO] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
|       datalad siblings -d "/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee" enable -s SIBLING
[2023-08-26 00:56:21,184: INFO/ForkPoolWorker-6] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
|       datalad siblings -d "/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee" enable -s SIBLING
[INFO] Remote origin not usable by git-annex; setting annex-ignore
[2023-08-26 00:56:21,322: INFO/ForkPoolWorker-35] Remote origin not usable by git-annex; setting annex-ignore
[INFO] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
|       datalad siblings -d "/data/cache/43a/4ca/db0b3c451fa973352af819f930" enable -s SIBLING
[2023-08-26 00:56:22,006: INFO/ForkPoolWorker-35] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
|       datalad siblings -d "/data/cache/43a/4ca/db0b3c451fa973352af819f930" enable -s SIBLING
[2023-08-26 00:56:27,890: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', (0, 0), '')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
    reader(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
    drain_events(timeout=0)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
    raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
[2023-08-26 00:56:28,824: INFO/ForkPoolWorker-8] Task datalad_registry.tasks.process_dataset_url[e9962268-8bc2-4a1f-9504-60dfa8899df8] succeeded in 386.26602735184133s: None
[2023-08-26 00:56:30,721: INFO/ForkPoolWorker-4] Task datalad_registry.tasks.process_dataset_url[4efc703d-7a25-42ab-b517-2c5ff2ac3813] succeeded in 110.96896563656628s: None
[2023-08-26 00:56:56,276: INFO/ForkPoolWorker-26] Task datalad_registry.tasks.process_dataset_url[58415623-b2e0-4368-8b8e-3c0ea1bcf579] succeeded in 224.67544884234667s: None
[2023-08-26 00:56:59,487: INFO/ForkPoolWorker-13] Task datalad_registry.tasks.process_dataset_url[346749bd-cb3a-4695-b840-d32f488aeb8c] succeeded in 397.3926819656044s: None
[2023-08-26 00:57:06,593: INFO/ForkPoolWorker-28] Task datalad_registry.tasks.process_dataset_url[b1607ca4-64cf-4aae-93d1-9f3c1e21d4d7] succeeded in 295.41920122131705s: None
[2023-08-26 00:57:11,433: INFO/ForkPoolWorker-25] Task datalad_registry.tasks.process_dataset_url[7cec74af-092c-405c-bf1e-8fce7ad0c47a] succeeded in 264.1157172396779s: None
[2023-08-26 00:57:12,185: INFO/ForkPoolWorker-20] Task datalad_registry.tasks.process_dataset_url[4f6f974a-3894-4995-b2fc-56c6f17e0485] succeeded in 342.43002944998443s: None
[2023-08-26 00:57:15,168: INFO/ForkPoolWorker-11] Task datalad_registry.tasks.process_dataset_url[c6362f33-e32e-4254-93a7-49a6a242b6c9] succeeded in 196.7069190684706s: None
[2023-08-26 00:57:15,869: INFO/ForkPoolWorker-24] Task datalad_registry.tasks.process_dataset_url[ddb6b72a-b7bf-4086-ae10-df34bdadabef] succeeded in 419.4049945361912s: None
[2023-08-26 00:57:20,677: INFO/ForkPoolWorker-12] Task datalad_registry.tasks.process_dataset_url[f4c83fa8-8bb1-42ba-aad4-22cd696cff65] succeeded in 230.99079968221486s: None
[2023-08-26 00:57:29,181: INFO/ForkPoolWorker-15] Task datalad_registry.tasks.process_dataset_url[8982cfba-2b97-453b-bd96-34efbe6b4f18] succeeded in 295.66399951837957s: None
[2023-08-26 00:57:47,712: INFO/ForkPoolWorker-3] Task datalad_registry.tasks.process_dataset_url[e7ddfabb-c407-46db-a11e-5d050fca8810] succeeded in 365.551789149642s: None
[2023-08-26 00:58:04,235: INFO/ForkPoolWorker-23] Task datalad_registry.tasks.process_dataset_url[2b875c1d-d0ae-417d-b875-5e1b890873f5] succeeded in 247.2393686696887s: None
[2023-08-26 00:58:22,532: INFO/ForkPoolWorker-2] Task datalad_registry.tasks.process_dataset_url[fb2a2d74-6759-4b4b-9500-9790d843ee23] succeeded in 341.76253108121455s: None
[2023-08-26 00:58:27,671: INFO/ForkPoolWorker-10] Task datalad_registry.tasks.process_dataset_url[6e0ba849-8c7c-41d5-9ea6-39bed8e17475] succeeded in 226.7042004056275s: None
[2023-08-26 00:58:31,602: INFO/ForkPoolWorker-19] Task datalad_registry.tasks.process_dataset_url[1cccddda-6366-4196-ab15-46ec32a53c3f] succeeded in 321.799700550735s: None
[2023-08-26 00:58:32,263: INFO/ForkPoolWorker-29] Task datalad_registry.tasks.process_dataset_url[af3ea88c-d420-4522-82d7-ac14e82c1d31] succeeded in 193.45017509534955s: None
[2023-08-26 00:58:52,898: INFO/ForkPoolWorker-31] Task datalad_registry.tasks.process_dataset_url[b23c91f1-906f-4d49-aa3e-b1d8f2b316eb] succeeded in 198.251332083717s: None
[2023-08-26 00:58:57,360: INFO/ForkPoolWorker-34] Task datalad_registry.tasks.process_dataset_url[180fe4f7-1603-49fe-a023-df8056595421] succeeded in 333.59812715649605s: None
[2023-08-26 00:59:05,521: INFO/ForkPoolWorker-32] Task datalad_registry.tasks.process_dataset_url[02e00f7f-2856-4fe6-a9ed-6f56d9d66161] succeeded in 392.8192468546331s: None
[2023-08-26 00:59:08,981: INFO/ForkPoolWorker-5] Task datalad_registry.tasks.process_dataset_url[85d75533-1f68-444b-b44c-6bf7b7b9cf50] succeeded in 301.2957050167024s: None
[2023-08-26 00:59:16,971: INFO/ForkPoolWorker-33] Task datalad_registry.tasks.process_dataset_url[332d222a-61bf-478a-945f-3b465f15848b] succeeded in 268.5010988805443s: None
[2023-08-26 00:59:25,704: INFO/ForkPoolWorker-6] Task datalad_registry.tasks.process_dataset_url[28a679cf-bd92-4397-bbeb-910d9ec608e6] succeeded in 193.8229241631925s: None
[2023-08-26 00:59:25,883: INFO/ForkPoolWorker-14] Task datalad_registry.tasks.process_dataset_url[cd3aba57-9847-456c-9f6a-003bc2e7f1e9] succeeded in 259.70399592258036s: None
[2023-08-26 00:59:31,700: INFO/ForkPoolWorker-1] Task datalad_registry.tasks.process_dataset_url[ebce3850-2697-472c-876f-8a94cf8348b1] succeeded in 386.87066436745226s: None
[2023-08-26 00:59:34,339: INFO/ForkPoolWorker-7] Task datalad_registry.tasks.process_dataset_url[8917cf33-da8e-4270-95a2-aa1b3d49a5f9] succeeded in 223.30276307836175s: None
[INFO] Finished bids_dataset metadata extraction from /data/cache/f95/03d/d4a03d4e109b9a3a9c15bcd82d
[2023-08-26 01:00:02,184: INFO/ForkPoolWorker-18] Finished bids_dataset metadata extraction from /data/cache/f95/03d/d4a03d4e109b9a3a9c15bcd82d
[2023-08-26 01:00:02,191: INFO/ForkPoolWorker-18] Task datalad_registry.tasks.extract_ds_meta[d23f7901-2d38-419a-a4bd-78446dfcd26e] succeeded in 1437.0802579522133s: 'SUCCEEDED'
[2023-08-26 01:00:21,820: INFO/ForkPoolWorker-22] Task datalad_registry.tasks.process_dataset_url[6fb490af-436e-4b8c-8ed4-51954d03f13a] succeeded in 345.64947314560413s: None
[2023-08-26 01:00:28,264: INFO/ForkPoolWorker-9] Task datalad_registry.tasks.process_dataset_url[2524840f-ecc4-4f0a-948e-15de94befc12] succeeded in 360.8268299549818s: None
[2023-08-26 01:00:39,942: INFO/ForkPoolWorker-21] Task datalad_registry.tasks.process_dataset_url[fe791219-98d1-4730-99c5-6abd00ee9f42] succeeded in 325.8302733749151s: None
[2023-08-26 01:00:49,682: INFO/ForkPoolWorker-35] Task datalad_registry.tasks.process_dataset_url[35fc3e56-5fd5-45ba-9d3d-52b3024ea9fd] succeeded in 280.61846880428493s: None
[2023-08-26 01:01:17,938: INFO/ForkPoolWorker-30] Task datalad_registry.tasks.process_dataset_url[cc19c016-60b1-4a4b-abde-458f10270885] succeeded in 324.5811691340059s: None
[2023-08-26 01:01:19,067: CRITICAL/MainProcess] Couldn't ack 3217, reason:ConnectionResetError(104, 'Connection reset by peer')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
    reader(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
    drain_events(timeout=0)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
    raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
    self.ack(multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
    return self.send_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
    write(buffer_store.view[:offset])
  File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
    self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
[2023-08-26 01:01:19,068: CRITICAL/MainProcess] Couldn't ack 3216, reason:BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
    reader(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
    drain_events(timeout=0)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
    raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
    self.ack(multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
    return self.send_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
    write(buffer_store.view[:offset])
  File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
    self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
[2023-08-26 01:01:19,068: ERROR/MainProcess] Error on stopping Hub: BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
    blueprint.start(self)
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
    c.loop(*c.loop_args())
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
    next(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
    cb(*cbargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
    reader(loop)
  File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
    drain_events(timeout=0)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
    while not self.blocking_read(timeout):
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
    return self.on_inbound_frame(frame)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
    callback(channel, method_sig, buf, None)
  File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
    return self.channels[channel_id].dispatch_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
    listener(*args)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
    raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 148, in send_all
    fun(parent, *args)
  File "/usr/local/lib/python3.9/dist-packages/celery/worker/components.py", line 81, in stop
    w.hub.close()
  File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 274, in close
    item()
  File "/usr/local/lib/python3.9/dist-packages/vine/promises.py", line 160, in __call__
    return self.throw()
  File "/usr/local/lib/python3.9/dist-packages/vine/promises.py", line 157, in __call__
    retval = fun(*final_args, **final_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
    self.ack(multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
    self.channel.basic_ack(self.delivery_tag, multiple=multiple)
  File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
    return self.send_method(
  File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
    conn.frame_writer(1, self.channel_id, sig, args, content)
  File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
    write(buffer_store.view[:offset])
  File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
    self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
yarikoptic commented 1 year ago

Did you Google it?

candleindark commented 1 year ago

Just briefly. Haven't taken a deep look yet.

  1. https://stackoverflow.com/questions/53374144/rabbitmq-ack-timeout
  2. https://stackoverflow.com/questions/69828547/precondition-failed-delivery-acknowledge-timeout-on-celery-rabbitmq-with-geve
candleindark commented 1 year ago

After following the lead on https://stackoverflow.com/questions/69828547/precondition-failed-delivery-acknowledge-timeout-on-celery-rabbitmq-with-geve, I think I have found potential solution: increasing consumer_timeout of the broker to 12 hours.

I have restarted an instance of DataLad-Registry on Typhon and restarted its population. If the worker doesn't crash after 12 hours, we should be good. Otherwise, there may be some tasks getting stuck.

For more information regarding this potential solution, please visit:

  1. https://docs.celeryq.dev/en/stable/userguide/calling.html#eta-and-countdown
  2. https://www.rabbitmq.com/consumers.html#acknowledgement-timeout
candleindark commented 1 year ago

With PR #232, the worker been up for more than 12 hours (worker termination error used to occur about 1 hour after the start of the worker). Let's consider this issue resolved.

Note: Due to the limitation of disk space, the DataLad-Registry instance on Typhon couldn't be populated fully for the test of solution. It was only populated with 1500 datasets.