Closed candleindark closed 1 year ago
@candleindark says he never observed it again, let's let it RiP
This problem is occurring again. Now, this problem tends to occur when the worker is relatively busy.
The following is the logs from the broker and worker at about the time the worker died.
broker logs:
2023-08-26 00:56:20.839903+00:00 [warning] <0.9962.0> closing AMQP connection <0.9962.0> (192.168.129.19:57868 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:20.839903+00:00 [warning] <0.9962.0> client unexpectedly closed TCP connection
2023-08-26 00:56:20.840286+00:00 [warning] <0.9944.0> closing AMQP connection <0.9944.0> (192.168.129.19:57856 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:20.840286+00:00 [warning] <0.9944.0> client unexpectedly closed TCP connection
2023-08-26 00:56:27.887789+00:00 [warning] <0.1059.0> Consumer 3004 on channel 1 has timed out waiting for delivery acknowledgement. Timeout used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-08-26 00:56:27.888507+00:00 [error] <0.1059.0> Channel error on connection <0.1044.0> (192.168.129.19:58054 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'), channel 1:
2023-08-26 00:56:27.888507+00:00 [error] <0.1059.0> operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
2023-08-26 00:56:32.964640+00:00 [warning] <0.1548.0> closing AMQP connection <0.1548.0> (192.168.129.19:36708 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:32.964640+00:00 [warning] <0.1548.0> client unexpectedly closed TCP connection
2023-08-26 00:56:46.587455+00:00 [warning] <0.1284.0> closing AMQP connection <0.1284.0> (192.168.129.19:45552 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:46.587455+00:00 [warning] <0.1284.0> client unexpectedly closed TCP connection
2023-08-26 00:56:54.280439+00:00 [info] <0.10007.0> accepting AMQP connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672)
2023-08-26 00:56:54.283512+00:00 [info] <0.10007.0> connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:56:54.292429+00:00 [info] <0.10025.0> accepting AMQP connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672)
2023-08-26 00:56:54.294844+00:00 [info] <0.10025.0> connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:56:55.607485+00:00 [warning] <0.10025.0> closing AMQP connection <0.10025.0> (192.168.129.19:57140 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:55.607485+00:00 [warning] <0.10025.0> client unexpectedly closed TCP connection
2023-08-26 00:56:55.608471+00:00 [warning] <0.10007.0> closing AMQP connection <0.10007.0> (192.168.129.19:57124 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:56:55.608471+00:00 [warning] <0.10007.0> client unexpectedly closed TCP connection
2023-08-26 00:57:05.119475+00:00 [warning] <0.1638.0> closing AMQP connection <0.1638.0> (192.168.129.19:59660 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:05.119475+00:00 [warning] <0.1638.0> client unexpectedly closed TCP connection
2023-08-26 00:57:11.136592+00:00 [warning] <0.1500.0> closing AMQP connection <0.1500.0> (192.168.129.19:45700 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:11.136592+00:00 [warning] <0.1500.0> client unexpectedly closed TCP connection
2023-08-26 00:57:11.617903+00:00 [warning] <0.1452.0> closing AMQP connection <0.1452.0> (192.168.129.19:45656 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:11.617903+00:00 [warning] <0.1452.0> client unexpectedly closed TCP connection
2023-08-26 00:57:17.717436+00:00 [warning] <0.1693.0> closing AMQP connection <0.1693.0> (192.168.129.19:33858 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:17.717436+00:00 [warning] <0.1693.0> client unexpectedly closed TCP connection
2023-08-26 00:57:21.216715+00:00 [warning] <0.1272.0> closing AMQP connection <0.1272.0> (192.168.129.19:45536 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:21.216715+00:00 [warning] <0.1272.0> client unexpectedly closed TCP connection
2023-08-26 00:57:22.669984+00:00 [warning] <0.1512.0> closing AMQP connection <0.1512.0> (192.168.129.19:45704 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:22.669984+00:00 [warning] <0.1512.0> client unexpectedly closed TCP connection
2023-08-26 00:57:22.700691+00:00 [warning] <0.1404.0> closing AMQP connection <0.1404.0> (192.168.129.19:45630 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:22.700691+00:00 [warning] <0.1404.0> client unexpectedly closed TCP connection
2023-08-26 00:57:27.886961+00:00 [info] <0.10083.0> accepting AMQP connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672)
2023-08-26 00:57:27.888942+00:00 [info] <0.10083.0> connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:57:27.895117+00:00 [info] <0.10101.0> accepting AMQP connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672)
2023-08-26 00:57:27.896614+00:00 [info] <0.10101.0> connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:57:29.049247+00:00 [warning] <0.10101.0> closing AMQP connection <0.10101.0> (192.168.129.19:58282 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.049247+00:00 [warning] <0.10101.0> client unexpectedly closed TCP connection
2023-08-26 00:57:29.049591+00:00 [warning] <0.10083.0> closing AMQP connection <0.10083.0> (192.168.129.19:58278 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.049591+00:00 [warning] <0.10083.0> client unexpectedly closed TCP connection
2023-08-26 00:57:29.211600+00:00 [warning] <0.1624.0> closing AMQP connection <0.1624.0> (192.168.129.19:54278 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:29.211600+00:00 [warning] <0.1624.0> client unexpectedly closed TCP connection
2023-08-26 00:57:40.849034+00:00 [warning] <0.1488.0> closing AMQP connection <0.1488.0> (192.168.129.19:45684 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:40.849034+00:00 [warning] <0.1488.0> client unexpectedly closed TCP connection
2023-08-26 00:57:58.349654+00:00 [warning] <0.1611.0> closing AMQP connection <0.1611.0> (192.168.129.19:54264 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:57:58.349654+00:00 [warning] <0.1611.0> client unexpectedly closed TCP connection
2023-08-26 00:58:01.793965+00:00 [info] <0.10141.0> accepting AMQP connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672)
2023-08-26 00:58:01.795738+00:00 [info] <0.10141.0> connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:01.801577+00:00 [info] <0.10159.0> accepting AMQP connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672)
2023-08-26 00:58:01.802943+00:00 [info] <0.10159.0> connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:03.031900+00:00 [warning] <0.10159.0> closing AMQP connection <0.10159.0> (192.168.129.19:42416 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:03.031900+00:00 [warning] <0.10159.0> client unexpectedly closed TCP connection
2023-08-26 00:58:03.032255+00:00 [warning] <0.10141.0> closing AMQP connection <0.10141.0> (192.168.129.19:42406 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:03.032255+00:00 [warning] <0.10141.0> client unexpectedly closed TCP connection
2023-08-26 00:58:14.364271+00:00 [warning] <0.1368.0> closing AMQP connection <0.1368.0> (192.168.129.19:45614 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:14.364271+00:00 [warning] <0.1368.0> client unexpectedly closed TCP connection
2023-08-26 00:58:26.951186+00:00 [warning] <0.1248.0> closing AMQP connection <0.1248.0> (192.168.129.19:45512 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:26.951186+00:00 [warning] <0.1248.0> client unexpectedly closed TCP connection
2023-08-26 00:58:28.905313+00:00 [warning] <0.1392.0> closing AMQP connection <0.1392.0> (192.168.129.19:45628 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:28.905313+00:00 [warning] <0.1392.0> client unexpectedly closed TCP connection
2023-08-26 00:58:35.542318+00:00 [info] <0.10201.0> accepting AMQP connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672)
2023-08-26 00:58:35.543753+00:00 [info] <0.10201.0> connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:35.548244+00:00 [info] <0.10219.0> accepting AMQP connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672)
2023-08-26 00:58:35.549251+00:00 [info] <0.10219.0> connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:58:36.712489+00:00 [warning] <0.10219.0> closing AMQP connection <0.10219.0> (192.168.129.19:41424 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:36.712489+00:00 [warning] <0.10219.0> client unexpectedly closed TCP connection
2023-08-26 00:58:36.712761+00:00 [warning] <0.10201.0> closing AMQP connection <0.10201.0> (192.168.129.19:41420 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:36.712761+00:00 [warning] <0.10201.0> client unexpectedly closed TCP connection
2023-08-26 00:58:39.437658+00:00 [warning] <0.1536.0> closing AMQP connection <0.1536.0> (192.168.129.19:45736 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:39.437658+00:00 [warning] <0.1536.0> client unexpectedly closed TCP connection
2023-08-26 00:58:40.999813+00:00 [warning] <0.1416.0> closing AMQP connection <0.1416.0> (192.168.129.19:45638 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:40.999813+00:00 [warning] <0.1416.0> client unexpectedly closed TCP connection
2023-08-26 00:58:56.519831+00:00 [warning] <0.1296.0> closing AMQP connection <0.1296.0> (192.168.129.19:45560 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:58:56.519831+00:00 [warning] <0.1296.0> client unexpectedly closed TCP connection
2023-08-26 00:58:56.973822+00:00 [error] <0.1044.0> closing AMQP connection <0.1044.0> (192.168.129.19:58054 -> 192.168.129.15:5672):
2023-08-26 00:58:56.973822+00:00 [error] <0.1044.0> missed heartbeats from client, timeout: 60s
2023-08-26 00:59:00.084892+00:00 [warning] <0.3738.0> closing AMQP connection <0.3738.0> (192.168.129.19:56970 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:00.084892+00:00 [warning] <0.3738.0> client unexpectedly closed TCP connection
2023-08-26 00:59:08.435955+00:00 [info] <0.10262.0> accepting AMQP connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672)
2023-08-26 00:59:08.437242+00:00 [info] <0.10262.0> connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:59:08.441407+00:00 [info] <0.10280.0> accepting AMQP connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672)
2023-08-26 00:59:08.442329+00:00 [info] <0.10280.0> connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672): user 'dlreg' authenticated and granted access to vhost '/'
2023-08-26 00:59:09.643408+00:00 [warning] <0.10280.0> closing AMQP connection <0.10280.0> (192.168.129.19:38412 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.643408+00:00 [warning] <0.10280.0> client unexpectedly closed TCP connection
2023-08-26 00:59:09.643629+00:00 [warning] <0.10262.0> closing AMQP connection <0.10262.0> (192.168.129.19:38396 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.643629+00:00 [warning] <0.10262.0> client unexpectedly closed TCP connection
2023-08-26 00:59:09.644291+00:00 [warning] <0.1356.0> closing AMQP connection <0.1356.0> (192.168.129.19:45596 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:09.644291+00:00 [warning] <0.1356.0> client unexpectedly closed TCP connection
2023-08-26 00:59:10.100418+00:00 [warning] <0.1590.0> closing AMQP connection <0.1590.0> (192.168.129.19:36738 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:10.100418+00:00 [warning] <0.1590.0> client unexpectedly closed TCP connection
2023-08-26 00:59:23.214045+00:00 [warning] <0.2992.0> closing AMQP connection <0.2992.0> (192.168.129.19:54722 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:23.214045+00:00 [warning] <0.2992.0> client unexpectedly closed TCP connection
2023-08-26 00:59:27.715245+00:00 [warning] <0.1371.0> closing AMQP connection <0.1371.0> (192.168.129.19:45606 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:27.715245+00:00 [warning] <0.1371.0> client unexpectedly closed TCP connection
2023-08-26 00:59:30.737415+00:00 [warning] <0.1344.0> closing AMQP connection <0.1344.0> (192.168.129.19:45586 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:30.737415+00:00 [warning] <0.1344.0> client unexpectedly closed TCP connection
2023-08-26 00:59:34.242138+00:00 [warning] <0.1440.0> closing AMQP connection <0.1440.0> (192.168.129.19:45652 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:34.242138+00:00 [warning] <0.1440.0> client unexpectedly closed TCP connection
2023-08-26 00:59:37.772887+00:00 [warning] <0.1320.0> closing AMQP connection <0.1320.0> (192.168.129.19:45568 -> 192.168.129.15:5672, vhost: '/', user: 'dlreg'):
2023-08-26 00:59:37.772887+00:00 [warning] <0.1320.0> client unexpectedly closed TCP connection
worker log:
[2023-08-26 00:56:14,762: INFO/ForkPoolWorker-6] Resolving deltas: 99% (45243/45699)
[2023-08-26 00:56:14,799: INFO/ForkPoolWorker-6] Resolving deltas: 100% (45699/45699)
[2023-08-26 00:56:14,800: INFO/ForkPoolWorker-6] Resolving deltas: 100% (45699/45699), done.
[2023-08-26 00:56:14,800: INFO/ForkPoolWorker-6] Finished resolving deltas
[INFO] Completed clone attempts for Dataset(/data/cache/43a/4ca/db0b3c451fa973352af819f930)
[2023-08-26 00:56:18,055: INFO/ForkPoolWorker-35] Completed clone attempts for Dataset(/data/cache/43a/4ca/db0b3c451fa973352af819f930)
[INFO] Completed clone attempts for Dataset(/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee)
[2023-08-26 00:56:18,383: INFO/ForkPoolWorker-6] Completed clone attempts for Dataset(/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee)
[INFO] scanning for unlocked files (this may take some time)
[2023-08-26 00:56:20,115: INFO/ForkPoolWorker-6] scanning for unlocked files (this may take some time)
[INFO] scanning for unlocked files (this may take some time)
[2023-08-26 00:56:20,699: INFO/ForkPoolWorker-35] scanning for unlocked files (this may take some time)
[INFO] Remote origin not usable by git-annex; setting annex-ignore
[2023-08-26 00:56:20,716: INFO/ForkPoolWorker-6] Remote origin not usable by git-annex; setting annex-ignore
[INFO] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee" enable -s SIBLING
[2023-08-26 00:56:21,184: INFO/ForkPoolWorker-6] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/data/cache/69f/d04/f5ec714cd98b078fdf411c47ee" enable -s SIBLING
[INFO] Remote origin not usable by git-annex; setting annex-ignore
[2023-08-26 00:56:21,322: INFO/ForkPoolWorker-35] Remote origin not usable by git-annex; setting annex-ignore
[INFO] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/data/cache/43a/4ca/db0b3c451fa973352af819f930" enable -s SIBLING
[2023-08-26 00:56:22,006: INFO/ForkPoolWorker-35] access to 2 dataset siblings dandi-dandizarrs-dropbox, dandiapi not auto-enabled, enable with:
| datalad siblings -d "/data/cache/43a/4ca/db0b3c451fa973352af819f930" enable -s SIBLING
[2023-08-26 00:56:27,890: CRITICAL/MainProcess] Unrecoverable error: PreconditionFailed(406, 'PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more', (0, 0), '')
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
reader(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
drain_events(timeout=0)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
callback(channel, method_sig, buf, None)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
[2023-08-26 00:56:28,824: INFO/ForkPoolWorker-8] Task datalad_registry.tasks.process_dataset_url[e9962268-8bc2-4a1f-9504-60dfa8899df8] succeeded in 386.26602735184133s: None
[2023-08-26 00:56:30,721: INFO/ForkPoolWorker-4] Task datalad_registry.tasks.process_dataset_url[4efc703d-7a25-42ab-b517-2c5ff2ac3813] succeeded in 110.96896563656628s: None
[2023-08-26 00:56:56,276: INFO/ForkPoolWorker-26] Task datalad_registry.tasks.process_dataset_url[58415623-b2e0-4368-8b8e-3c0ea1bcf579] succeeded in 224.67544884234667s: None
[2023-08-26 00:56:59,487: INFO/ForkPoolWorker-13] Task datalad_registry.tasks.process_dataset_url[346749bd-cb3a-4695-b840-d32f488aeb8c] succeeded in 397.3926819656044s: None
[2023-08-26 00:57:06,593: INFO/ForkPoolWorker-28] Task datalad_registry.tasks.process_dataset_url[b1607ca4-64cf-4aae-93d1-9f3c1e21d4d7] succeeded in 295.41920122131705s: None
[2023-08-26 00:57:11,433: INFO/ForkPoolWorker-25] Task datalad_registry.tasks.process_dataset_url[7cec74af-092c-405c-bf1e-8fce7ad0c47a] succeeded in 264.1157172396779s: None
[2023-08-26 00:57:12,185: INFO/ForkPoolWorker-20] Task datalad_registry.tasks.process_dataset_url[4f6f974a-3894-4995-b2fc-56c6f17e0485] succeeded in 342.43002944998443s: None
[2023-08-26 00:57:15,168: INFO/ForkPoolWorker-11] Task datalad_registry.tasks.process_dataset_url[c6362f33-e32e-4254-93a7-49a6a242b6c9] succeeded in 196.7069190684706s: None
[2023-08-26 00:57:15,869: INFO/ForkPoolWorker-24] Task datalad_registry.tasks.process_dataset_url[ddb6b72a-b7bf-4086-ae10-df34bdadabef] succeeded in 419.4049945361912s: None
[2023-08-26 00:57:20,677: INFO/ForkPoolWorker-12] Task datalad_registry.tasks.process_dataset_url[f4c83fa8-8bb1-42ba-aad4-22cd696cff65] succeeded in 230.99079968221486s: None
[2023-08-26 00:57:29,181: INFO/ForkPoolWorker-15] Task datalad_registry.tasks.process_dataset_url[8982cfba-2b97-453b-bd96-34efbe6b4f18] succeeded in 295.66399951837957s: None
[2023-08-26 00:57:47,712: INFO/ForkPoolWorker-3] Task datalad_registry.tasks.process_dataset_url[e7ddfabb-c407-46db-a11e-5d050fca8810] succeeded in 365.551789149642s: None
[2023-08-26 00:58:04,235: INFO/ForkPoolWorker-23] Task datalad_registry.tasks.process_dataset_url[2b875c1d-d0ae-417d-b875-5e1b890873f5] succeeded in 247.2393686696887s: None
[2023-08-26 00:58:22,532: INFO/ForkPoolWorker-2] Task datalad_registry.tasks.process_dataset_url[fb2a2d74-6759-4b4b-9500-9790d843ee23] succeeded in 341.76253108121455s: None
[2023-08-26 00:58:27,671: INFO/ForkPoolWorker-10] Task datalad_registry.tasks.process_dataset_url[6e0ba849-8c7c-41d5-9ea6-39bed8e17475] succeeded in 226.7042004056275s: None
[2023-08-26 00:58:31,602: INFO/ForkPoolWorker-19] Task datalad_registry.tasks.process_dataset_url[1cccddda-6366-4196-ab15-46ec32a53c3f] succeeded in 321.799700550735s: None
[2023-08-26 00:58:32,263: INFO/ForkPoolWorker-29] Task datalad_registry.tasks.process_dataset_url[af3ea88c-d420-4522-82d7-ac14e82c1d31] succeeded in 193.45017509534955s: None
[2023-08-26 00:58:52,898: INFO/ForkPoolWorker-31] Task datalad_registry.tasks.process_dataset_url[b23c91f1-906f-4d49-aa3e-b1d8f2b316eb] succeeded in 198.251332083717s: None
[2023-08-26 00:58:57,360: INFO/ForkPoolWorker-34] Task datalad_registry.tasks.process_dataset_url[180fe4f7-1603-49fe-a023-df8056595421] succeeded in 333.59812715649605s: None
[2023-08-26 00:59:05,521: INFO/ForkPoolWorker-32] Task datalad_registry.tasks.process_dataset_url[02e00f7f-2856-4fe6-a9ed-6f56d9d66161] succeeded in 392.8192468546331s: None
[2023-08-26 00:59:08,981: INFO/ForkPoolWorker-5] Task datalad_registry.tasks.process_dataset_url[85d75533-1f68-444b-b44c-6bf7b7b9cf50] succeeded in 301.2957050167024s: None
[2023-08-26 00:59:16,971: INFO/ForkPoolWorker-33] Task datalad_registry.tasks.process_dataset_url[332d222a-61bf-478a-945f-3b465f15848b] succeeded in 268.5010988805443s: None
[2023-08-26 00:59:25,704: INFO/ForkPoolWorker-6] Task datalad_registry.tasks.process_dataset_url[28a679cf-bd92-4397-bbeb-910d9ec608e6] succeeded in 193.8229241631925s: None
[2023-08-26 00:59:25,883: INFO/ForkPoolWorker-14] Task datalad_registry.tasks.process_dataset_url[cd3aba57-9847-456c-9f6a-003bc2e7f1e9] succeeded in 259.70399592258036s: None
[2023-08-26 00:59:31,700: INFO/ForkPoolWorker-1] Task datalad_registry.tasks.process_dataset_url[ebce3850-2697-472c-876f-8a94cf8348b1] succeeded in 386.87066436745226s: None
[2023-08-26 00:59:34,339: INFO/ForkPoolWorker-7] Task datalad_registry.tasks.process_dataset_url[8917cf33-da8e-4270-95a2-aa1b3d49a5f9] succeeded in 223.30276307836175s: None
[INFO] Finished bids_dataset metadata extraction from /data/cache/f95/03d/d4a03d4e109b9a3a9c15bcd82d
[2023-08-26 01:00:02,184: INFO/ForkPoolWorker-18] Finished bids_dataset metadata extraction from /data/cache/f95/03d/d4a03d4e109b9a3a9c15bcd82d
[2023-08-26 01:00:02,191: INFO/ForkPoolWorker-18] Task datalad_registry.tasks.extract_ds_meta[d23f7901-2d38-419a-a4bd-78446dfcd26e] succeeded in 1437.0802579522133s: 'SUCCEEDED'
[2023-08-26 01:00:21,820: INFO/ForkPoolWorker-22] Task datalad_registry.tasks.process_dataset_url[6fb490af-436e-4b8c-8ed4-51954d03f13a] succeeded in 345.64947314560413s: None
[2023-08-26 01:00:28,264: INFO/ForkPoolWorker-9] Task datalad_registry.tasks.process_dataset_url[2524840f-ecc4-4f0a-948e-15de94befc12] succeeded in 360.8268299549818s: None
[2023-08-26 01:00:39,942: INFO/ForkPoolWorker-21] Task datalad_registry.tasks.process_dataset_url[fe791219-98d1-4730-99c5-6abd00ee9f42] succeeded in 325.8302733749151s: None
[2023-08-26 01:00:49,682: INFO/ForkPoolWorker-35] Task datalad_registry.tasks.process_dataset_url[35fc3e56-5fd5-45ba-9d3d-52b3024ea9fd] succeeded in 280.61846880428493s: None
[2023-08-26 01:01:17,938: INFO/ForkPoolWorker-30] Task datalad_registry.tasks.process_dataset_url[cc19c016-60b1-4a4b-abde-458f10270885] succeeded in 324.5811691340059s: None
[2023-08-26 01:01:19,067: CRITICAL/MainProcess] Couldn't ack 3217, reason:ConnectionResetError(104, 'Connection reset by peer')
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
reader(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
drain_events(timeout=0)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
callback(channel, method_sig, buf, None)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
self.ack(multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
self._write(s)
ConnectionResetError: [Errno 104] Connection reset by peer
[2023-08-26 01:01:19,068: CRITICAL/MainProcess] Couldn't ack 3216, reason:BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
reader(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
drain_events(timeout=0)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
callback(channel, method_sig, buf, None)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
self.ack(multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
[2023-08-26 01:01:19,068: ERROR/MainProcess] Error on stopping Hub: BrokenPipeError(32, 'Broken pipe')
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/celery/worker/worker.py", line 203, in start
self.blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 365, in start
return self.obj.start()
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 332, in start
blueprint.start(self)
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 116, in start
step.start(parent)
File "/usr/local/lib/python3.9/dist-packages/celery/worker/consumer/consumer.py", line 628, in start
c.loop(*c.loop_args())
File "/usr/local/lib/python3.9/dist-packages/celery/worker/loops.py", line 97, in asynloop
next(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 373, in create_loop
cb(*cbargs)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 248, in on_readable
reader(loop)
File "/usr/local/lib/python3.9/dist-packages/kombu/transport/base.py", line 230, in _read
drain_events(timeout=0)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 525, in drain_events
while not self.blocking_read(timeout):
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 531, in blocking_read
return self.on_inbound_frame(frame)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 53, in on_frame
callback(channel, method_sig, buf, None)
File "/usr/local/lib/python3.9/dist-packages/amqp/connection.py", line 537, in on_inbound_method
return self.channels[channel_id].dispatch_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 156, in dispatch_method
listener(*args)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 293, in _on_close
raise error_for_code(
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/celery/bootsteps.py", line 148, in send_all
fun(parent, *args)
File "/usr/local/lib/python3.9/dist-packages/celery/worker/components.py", line 81, in stop
w.hub.close()
File "/usr/local/lib/python3.9/dist-packages/kombu/asynchronous/hub.py", line 274, in close
item()
File "/usr/local/lib/python3.9/dist-packages/vine/promises.py", line 160, in __call__
return self.throw()
File "/usr/local/lib/python3.9/dist-packages/vine/promises.py", line 157, in __call__
retval = fun(*final_args, **final_kwargs)
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 131, in ack_log_error
self.ack(multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/kombu/message.py", line 126, in ack
self.channel.basic_ack(self.delivery_tag, multiple=multiple)
File "/usr/local/lib/python3.9/dist-packages/amqp/channel.py", line 1407, in basic_ack
return self.send_method(
File "/usr/local/lib/python3.9/dist-packages/amqp/abstract_channel.py", line 70, in send_method
conn.frame_writer(1, self.channel_id, sig, args, content)
File "/usr/local/lib/python3.9/dist-packages/amqp/method_framing.py", line 186, in write_frame
write(buffer_store.view[:offset])
File "/usr/local/lib/python3.9/dist-packages/amqp/transport.py", line 347, in write
self._write(s)
BrokenPipeError: [Errno 32] Broken pipe
Did you Google it?
After following the lead on https://stackoverflow.com/questions/69828547/precondition-failed-delivery-acknowledge-timeout-on-celery-rabbitmq-with-geve, I think I have found potential solution: increasing consumer_timeout
of the broker to 12 hours.
I have restarted an instance of DataLad-Registry on Typhon and restarted its population. If the worker doesn't crash after 12 hours, we should be good. Otherwise, there may be some tasks getting stuck.
For more information regarding this potential solution, please visit:
With PR #232, the worker been up for more than 12 hours (worker termination error used to occur about 1 hour after the start of the worker). Let's consider this issue resolved.
Note: Due to the limitation of disk space, the DataLad-Registry instance on Typhon couldn't be populated fully for the test of solution. It was only populated with 1500 datasets.
The Celery worker dies whenever it is relatively idle. The logs just before the termination of the worker and the associated broker (the RabbitMQ service) are the following:
Logs of the Celery worker:
Logs of the broker:
The complete logs of all the services after such a cash of the Celery service is available at
/srv/registry/logs/failed_executions/05-17-2023_2
on Typhon.