galaxyproject / pulsar

Distributed job execution application built for Galaxy
https://pulsar.readthedocs.io
Apache License 2.0

Consumer quitting in problematic fashion #328

Closed astalosj closed 1 year ago

astalosj commented 1 year ago

My Pulsar endpoint (v0.15.2), connected to usegalaxy.eu through the mq.galaxyproject.eu RabbitMQ broker, sometimes stops accepting jobs. The jobs stay stuck on the Galaxy server and never appear in the Pulsar logs. After restarting Pulsar, the jobs start to run and finish without problems. The Pulsar endpoint was installed from the vggp-v60-j224-e0d36d08062d-dev image (Rocky Linux 9).

There are errors in the pulsar logs before it stops accepting jobs:

Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:10,927 ERROR [pulsar.client.amqp_exchange][consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Problem consuming queue, consumer quitting in problematic fashion!
Jul 10 07:04:11  pulsar[205221]: Traceback (most recent call last):
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/amqp_exchange.py", line 141, in consume
Jul 10 07:04:11  pulsar[205221]:     connection.drain_events(timeout=self.__timeout)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/kombu/connection.py", line 316, in drain_events
Jul 10 07:04:11  pulsar[205221]:     return self.transport.drain_events(self.connection, **kwargs)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/kombu/transport/pyamqp.py", line 169, in drain_events
Jul 10 07:04:11  pulsar[205221]:     return connection.drain_events(**kwargs)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 525, in drain_events
Jul 10 07:04:11  pulsar[205221]:     while not self.blocking_read(timeout):
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 531, in blocking_read
Jul 10 07:04:11  pulsar[205221]:     return self.on_inbound_frame(frame)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/method_framing.py", line 53, in on_frame
Jul 10 07:04:11  pulsar[205221]:     callback(channel, method_sig, buf, None)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 537, in on_inbound_method
Jul 10 07:04:11  pulsar[205221]:     return self.channels[channel_id].dispatch_method(
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/abstract_channel.py", line 156, in dispatch_method
Jul 10 07:04:11  pulsar[205221]:     listener(*args)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 666, in _on_close
Jul 10 07:04:11  pulsar[205221]:     self._x_close_ok()
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 681, in _x_close_ok
Jul 10 07:04:11  pulsar[205221]:     self.send_method(spec.Connection.CloseOk, callback=self._on_close_ok)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/abstract_channel.py", line 70, in send_method
Jul 10 07:04:11  pulsar[205221]:     conn.frame_writer(1, self.channel_id, sig, args, content)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/method_framing.py", line 186, in write_frame
Jul 10 07:04:11  pulsar[205221]:     write(buffer_store.view[:offset])
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/transport.py", line 347, in write
Jul 10 07:04:11  pulsar[205221]:     self._write(s)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/transport.py", line 595, in _write
Jul 10 07:04:11  pulsar[205221]:     n = write(s)
Jul 10 07:04:11  pulsar[205221]:   File "/usr/lib64/python3.9/ssl.py", line 1119, in write
Jul 10 07:04:11  pulsar[205221]:     return self._sslobj.write(data)
Jul 10 07:04:11  pulsar[205221]: ConnectionResetError: [Errno 104] Connection reset by peer
Jul 10 07:04:11  pulsar[205221]: Exception in thread consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1:
Jul 10 07:04:11  pulsar[205221]: Traceback (most recent call last):
Jul 10 07:04:11  pulsar[205221]:   File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:10,964 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:10,964 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]:     self.run()
Jul 10 07:04:11  pulsar[205221]:   File "/usr/lib64/python3.9/threading.py", line 917, in run
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,040 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]:     self._target(*self._args, **self._kwargs)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/messaging/bind_amqp.py", line 53, in drain
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,046 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,049 WARNI [pulsar.client.amqp_exchange][consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,074 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]:     __drain(name, queue_state, pulsar_exchange, callback)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/messaging/bind_amqp.py", line 101, in __drain
Jul 10 07:04:11  pulsar[205221]:     pulsar_exchange.consume(name, callback=callback, check=queue_state)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/pulsar/client/amqp_exchange.py", line 141, in consume
Jul 10 07:04:11  pulsar[205221]:     connection.drain_events(timeout=self.__timeout)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/kombu/connection.py", line 316, in drain_events
Jul 10 07:04:11  pulsar[205221]:     return self.transport.drain_events(self.connection, **kwargs)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/kombu/transport/pyamqp.py", line 169, in drain_events
Jul 10 07:04:11  pulsar[205221]:     return connection.drain_events(**kwargs)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 525, in drain_events
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,155 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]:     while not self.blocking_read(timeout):
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 531, in blocking_read
Jul 10 07:04:11  pulsar[205221]:     return self.on_inbound_frame(frame)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/method_framing.py", line 53, in on_frame
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,198 WARNI [pulsar.client.amqp_exchange][consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got ConnectionForced, will retry: (0, 0): (320) CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'
Jul 10 07:04:11  pulsar[205221]:     callback(channel, method_sig, buf, None)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 537, in on_inbound_method
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,209 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__status] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:     return self.channels[channel_id].dispatch_method(
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,215 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__status] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,215 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__kill] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/abstract_channel.py", line 156, in dispatch_method
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,216 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__setup] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,217 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_production__status] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:     listener(*args)
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,217 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__kill] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 666, in _on_close
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,217 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_production__kill] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,219 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_production__setup] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:     self._x_close_ok()
Jul 10 07:04:11  pulsar[205221]: 2023-07-10 07:04:11,219 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__setup] AMQP heartbeat thread exiting
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/connection.py", line 681, in _x_close_ok
Jul 10 07:04:11  pulsar[205221]:     self.send_method(spec.Connection.CloseOk, callback=self._on_close_ok)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/abstract_channel.py", line 70, in send_method
Jul 10 07:04:11  pulsar[205221]:     conn.frame_writer(1, self.channel_id, sig, args, content)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/method_framing.py", line 186, in write_frame
Jul 10 07:04:11  pulsar[205221]:     write(buffer_store.view[:offset])
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/transport.py", line 347, in write
Jul 10 07:04:11  pulsar[205221]:     self._write(s)
Jul 10 07:04:11  pulsar[205221]:   File "/opt/pulsar/venv3/lib64/python3.9/site-packages/amqp/transport.py", line 595, in _write
Jul 10 07:04:11  pulsar[205221]:     n = write(s)
Jul 10 07:04:11  pulsar[205221]:   File "/usr/lib64/python3.9/ssl.py", line 1119, in write
Jul 10 07:04:11  pulsar[205221]:     return self._sslobj.write(data)
Jul 10 07:04:11  pulsar[205221]: ConnectionResetError: [Errno 104] Connection reset by peer
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,370 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,371 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,372 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,372 WARNI [pulsar.client.amqp_exchange][consume-kill-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,373 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,373 WARNI [pulsar.client.amqp_exchange][consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,374 WARNI [pulsar.client.amqp_exchange][consume-status-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:18  pulsar[205221]: 2023-07-10 07:04:18,387 WARNI [pulsar.client.amqp_exchange][consume-setup-pyamqp://galaxy_sk01:********@mq.galaxyproject.eu:5671//pulsar/galaxy_sk01?ssl=1] Got OperationalError, will retry: [Errno 111] Connection refused
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,056 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__status] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,082 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__setup] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,085 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_production__kill] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,100 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__kill] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,101 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__kill] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,103 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_production__status] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,104 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_test__status] AMQP heartbeat thread alive
Jul 10 07:04:26  pulsar[205221]: 2023-07-10 07:04:26,122 DEBUG [pulsar.client.amqp_exchange][consume-heartbeat-pulsar_benchmarking__setup] AMQP heartbeat thread alive
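
In the log above, the consume-setup thread exits on an uncaught `ConnectionResetError`, while the other consumers log "will retry". As an illustration only (this is not Pulsar's code, nor the fix that was later released), a consumer loop can treat connection-level errors as recoverable and reconnect with a capped backoff. The names `consume_with_reconnect`, `consume_once`, and `should_run` are hypothetical, and library-specific exceptions such as kombu's `OperationalError` are left out of this sketch.

```python
import time

# Assumption: connection-level failures are worth retrying. ConnectionError is
# the built-in parent of ConnectionResetError and ConnectionRefusedError.
RECOVERABLE_ERRORS = (ConnectionError, TimeoutError)


def consume_with_reconnect(consume_once, should_run, sleep=time.sleep,
                           initial_delay=1.0, max_delay=30.0):
    """Call consume_once() until should_run() is false, retrying on resets.

    Returns the number of recoverable errors that were absorbed.
    """
    errors = 0
    delay = initial_delay
    while should_run():
        try:
            consume_once()
            delay = initial_delay  # a clean pass resets the backoff
        except RECOVERABLE_ERRORS:
            errors += 1
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
    return errors
```

Any exception outside `RECOVERABLE_ERRORS` still propagates, so genuine bugs are not silently retried.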

The errors appeared at (times are in CEST):

May 23 21:35:51
May 27 07:04:13
May 31 07:04:09
Jun 06 13:32:08
Jun 12 13:19:34
Jun 21 17:38:30
Jun 29 07:04:08
Jul 02 07:04:13
Jul 04 07:04:09
Jul 10 07:04:11

Most of the errors (6 of the 10 listed) occurred at 07:04, but there is nothing else unusual in the logs at that time. The pulsar service is restarted by cron daily at 6:11. @mira-miracoli didn't find anything relevant in the mq.galaxyproject.eu logs.
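
To make the clustering explicit, here is a small sketch that tallies the timestamps listed above by hour and minute. The variable names are illustrative only.

```python
from collections import Counter

# Timestamps copied from the list above (CEST).
occurrences = [
    "May 23 21:35:51",
    "May 27 07:04:13",
    "May 31 07:04:09",
    "Jun 06 13:32:08",
    "Jun 12 13:19:34",
    "Jun 21 17:38:30",
    "Jun 29 07:04:08",
    "Jul 02 07:04:13",
    "Jul 04 07:04:09",
    "Jul 10 07:04:11",
]

# Key each entry by hour and minute, e.g. "07:04".
by_minute = Counter(stamp.split()[2][:5] for stamp in occurrences)

for minute, count in by_minute.most_common():
    print(minute, count)
```

Six of the ten entries fall in the 07:04 minute, which would be consistent with something scheduled on the broker side, given the 'shutdown' reason in the `CONNECTION_FORCED` messages; the remaining four are spread across other times of day.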

Python package versions:

amqp==5.1.1
bcrypt==4.0.1
bleach==6.0.0
boltons==23.0.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.0.1
cryptography==39.0.1
docutils==0.19
galaxy-job-metrics==22.1.1
galaxy-objectstore==22.1.1
galaxy-tool-util==22.5.0.dev2
galaxy-util==22.5.0.dev2
idna==3.4
importlib-resources==5.12.0
kombu==5.2.4
lxml==4.9.2
MarkupSafe==2.1.2
packaging==21.3
paramiko==3.0.0
PasteDeploy==3.0.1
psutil==5.9.4
pulsar-app==0.15.2
pycparser==2.21
pycurl==7.45.2
pydantic==1.10.5
pydantic-tes==0.1.5
pylockfile==0.0.3.3
PyNaCl==1.5.0
pyparsing==3.0.9
PyYAML==6.0
repoze.lru==0.7
requests==2.28.2
Routes==2.5.1
six==1.16.0
sortedcontainers==2.4.0
typing_extensions==4.5.0
urllib3==1.26.14
vine==5.0.0
webencodings==0.5.1
WebOb==1.8.7
zipp==3.15.0
zipstream-new==1.1.8

cat-bro commented 1 year ago

We are seeing this too on Galaxy Australia, only since upgrading from 0.14.13 to 0.15.2 two weeks ago.

astalosj commented 1 year ago

Based on a suggestion from @mira-miracoli, I've downgraded pulsar to version 0.15.2.dev1. There were some consumer errors right after rebooting the pulsar central manager (Jul 14), but since then there have been no errors in the logs.

mvdbeek commented 1 year ago

There are no code changes between 0.15.2.dev1 and 0.15.2; the only difference is the changelog update. You can see this in https://github.com/galaxyproject/pulsar/compare/0.15.1...0.15.2

mvdbeek commented 1 year ago

@astalosj thanks for the report and the logs. This should be fixed in 0.15.3 released yesterday.

astalosj commented 1 year ago

Thanks, I've updated pulsar to 0.15.3.

astalosj commented 1 year ago

After updating to 0.15.3, pulsar stopped publishing job results after some time. According to the logs, it sent job status updates to AMQ, but Galaxy did not fetch them (there are 12 messages in the pulsar_production__status_update queue for my vhost). Let me know if this might be related, or if I should file it as a separate issue.

mvdbeek commented 1 year ago

Yes, that looks like a separate issue. If you have logs from the Galaxy side that would be helpful.

mvdbeek commented 1 year ago

@astalosj did you manage to get any logs from Galaxy when this happens?

natefoo commented 1 year ago

@mvdbeek I think we also need to catch TimeoutError?

2023-08-23 15:04:04,686 ERROR [pulsar.client.amqp_exchange][consume-setup-amqp://main_pulsar:********@amqp.galaxyproject.org:5671//main_pulsar?ssl=1] Problem consuming queue, consumer quitting in problematic fashion!
Traceback (most recent call last):
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/pulsar/client/amqp_exchange.py", line 141, in consume
    connection.drain_events(timeout=self.__timeout)
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/kombu/connection.py", line 318, in drain_events
    return self.transport.drain_events(self.connection, **kwargs)
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/kombu/transport/pyamqp.py", line 135, in drain_events
    return connection.drain_events(**kwargs)
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/amqp/connection.py", line 523, in drain_events
    while not self.blocking_read(timeout):
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/amqp/connection.py", line 528, in blocking_read
    frame = self.transport.read_frame()
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/amqp/transport.py", line 299, in read_frame
    frame_header = read(7, True)
  File "/expanse/projects/qstore/pen160/xgalaxy/main/pulsar/venv/lib/python3.9/site-packages/amqp/transport.py", line 573, in _read
    s = recv(n - len(rbuf))  # see note above
  File "/expanse/projects/qstore/pen160/xgalaxy/conda/envs/__python@3.9/lib/python3.9/ssl.py", line 1101, in read
    return self._sslobj.read(len)
TimeoutError: [Errno 110] Connection timed out

mvdbeek commented 1 year ago

We do, that is very weird. Is kombu maybe outdated?

mvdbeek commented 1 year ago

To answer my own question: that only works in Python 3.10+. It will be fixed in https://github.com/galaxyproject/pulsar/pull/337
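
For context on the Python 3.10+ remark: since Python 3.10, `socket.timeout` is an alias of the built-in `TimeoutError`, whereas on 3.9 (the version in the tracebacks above) they are two distinct `OSError` subclasses. Code that only handles `socket.timeout` therefore lets an errno 110 `TimeoutError` escape on 3.9. The sketch below shows a version-independent way to handle both; it assumes this is the relevant difference and is not taken from the linked pull request. `drain_once` is a hypothetical helper, not a Pulsar or kombu function.

```python
import socket

# Covers both exceptions on every Python version. On 3.10+ the two names refer
# to the same class, so listing both is harmless.
TIMEOUT_ERRORS = (socket.timeout, TimeoutError)


def drain_once(drain_events, timeout):
    """Run one drain_events(timeout=...) call (hypothetical helper).

    Returns True if events were drained and False if the call timed out.
    Any other exception propagates to the caller.
    """
    try:
        drain_events(timeout=timeout)
    except TIMEOUT_ERRORS:
        return False
    return True
```

With kombu, the callable passed in would typically be a connection's `drain_events` method, as in the tracebacks above.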