Open mikkonie opened 5 months ago
Dump of error. It seems the iRODS connection crashes when performing a specific ACL operation.
sodar-celeryd-default_1 | 2024-06-25 12:34:52,679 [ERROR] taskflowbackend.api: Error running flow: BatchMoveDataObjectsTask failed: Could not receive server response (Error getting permissions of target "<iRODSDataObject xxx yyy.vcf.gz>")
sodar-celeryd-default_1 | [2024-06-25 12:35:03,530: INFO/MainProcess] Task landingzones.tasks_celery.trigger_zone_move[7821ab7d-1b47-4341-886e-9e3e5080b66d] received
sodar-celeryd-default_1 | [2024-06-25 12:35:04,281: ERROR/ForkPoolWorker-16] Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
sodar-celeryd-default_1 | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] Exception ignored in:
sodar-celeryd-default_1 | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] <function Connection.__del__ at 0x7f91069963a0>
sodar-celeryd-default_1 | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] Traceback (most recent call last):
sodar-celeryd-default_1 | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 109, in __del__
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16] self.disconnect()
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16] File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 322, in disconnect
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16] self.send(disconnect_msg)
sodar-celeryd-default_1 | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16] File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 125, in send
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16]
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] raise NetworkException("Unable to send message")
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] irods.exception
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] .
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] NetworkException
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] :
sodar-celeryd-default_1 | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] Unable to send message
For the record, the crash happens (consistently) here:
try:
target_access = self.irods.acls.get(target=target)
except Exception as ex:
self._raise_irods_exception(
ex,
'Error getting permissions of target "{}"'.format(target),
)
Temporarily fixed by rolling back to v1.1.x. I need to try reproduce this in dev to figure out what is going on.
Personal note: I need to also check the actions preceeding the point of actual crash. It's possible the previous step fails silently and we try to impose ACLs on a file which is not in the expected place or state.
The repo has now been upgraded for iRODS 4.3.3 and python-irodsclient 2.1.0. I will look into whether this still happens when testing the new release in staging.
One thought: could it be possible this is caught by the same redirect issue in python-irodsclient v1.1.9+ that caused problems I detailed in #2007 and #2022?
If so, we may have to wait for the release of v2.1.1 before deploying this release. See #2023
Something I noticed happening in production. A user had
landing_zone_move
flow fail with the following error:I have not experienced this in practice and the tests have not encountered it. Possibly something to do with python-irodsbackend v2.0 update?
The user afterwards modified the zone and received a more regular error. Still, this needs to be looked into ASAP.