bihealth / sodar-server

SODAR: System for Omics Data Access and Retrieval
https://github.com/bihealth/sodar-server
MIT License

Python-irodsclient v2.x raises permission errors in BatchMoveDataObjectsTask #1955

Open mikkonie opened 5 months ago

mikkonie commented 5 months ago

Something I noticed happening in production: a user's landing_zone_move flow failed with the following error:

Error running flow: BatchMoveDataObjectsTask failed: Could not receive server response (Error getting permissions of target "")

I have not encountered this myself and the tests have not caught it. Possibly something to do with the python-irodsclient v2.0 update?

The user afterwards modified the zone and then received a more typical error. Still, this needs to be looked into ASAP.

mikkonie commented 5 months ago

Dump of error. It seems the iRODS connection crashes when performing a specific ACL operation.

sodar-celeryd-default_1  | 2024-06-25 12:34:52,679 [ERROR] taskflowbackend.api: Error running flow: BatchMoveDataObjectsTask failed: Could not receive server response (Error getting permissions of target "<iRODSDataObject xxx yyy.vcf.gz>")
sodar-celeryd-default_1  | [2024-06-25 12:35:03,530: INFO/MainProcess] Task landingzones.tasks_celery.trigger_zone_move[7821ab7d-1b47-4341-886e-9e3e5080b66d] received
sodar-celeryd-default_1  | [2024-06-25 12:35:04,281: ERROR/ForkPoolWorker-16] Unable to send message. Connection to remote host may have closed. Releasing connection from pool.
sodar-celeryd-default_1  | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] Exception ignored in: <function Connection.__del__ at 0x7f91069963a0>
sodar-celeryd-default_1  | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16] Traceback (most recent call last):
sodar-celeryd-default_1  | [2024-06-25 12:35:04,287: WARNING/ForkPoolWorker-16]   File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 109, in __del__
sodar-celeryd-default_1  | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]     self.disconnect()
sodar-celeryd-default_1  | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]   File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 322, in disconnect
sodar-celeryd-default_1  | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]     self.send(disconnect_msg)
sodar-celeryd-default_1  | [2024-06-25 12:35:04,288: WARNING/ForkPoolWorker-16]   File "/usr/local/lib/python3.8/site-packages/irods/connection.py", line 125, in send
sodar-celeryd-default_1  | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16]     raise NetworkException("Unable to send message")
sodar-celeryd-default_1  | [2024-06-25 12:35:04,289: WARNING/ForkPoolWorker-16] irods.exception.NetworkException: Unable to send message
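
Reading the traceback: the NetworkException comes from Connection.__del__ trying to send its disconnect message over a connection that has already gone away, so the real failure is the connection dropping during the ACL lookup itself. A minimal reproduction sketch against a test iRODS server might look like this (host, credentials, zone and object path are assumptions, not values from this deployment):

    from irods.exception import NetworkException
    from irods.session import iRODSSession

    # Assumed test server credentials and object path, for illustration only
    with iRODSSession(
        host='localhost', port=1247, user='rods',
        password='rods', zone='sodarZone',
    ) as session:
        obj = session.data_objects.get('/sodarZone/landing_zone/yyy.vcf.gz')
        try:
            # The same kind of ACL lookup BatchMoveDataObjectsTask performs
            print(session.acls.get(obj))
        except NetworkException as ex:
            # This is the failure mode seen in the celery log above
            print('Connection dropped during ACL lookup: {}'.format(ex))
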
mikkonie commented 5 months ago

For the record, the crash happens (consistently) here:

            try:
                target_access = self.irods.acls.get(target=target)
            except Exception as ex:
                self._raise_irods_exception(
                    ex,
                    'Error getting permissions of target "{}"'.format(target),
                )
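
If this needs to be debugged in place, one option (a sketch only, not a committed fix) would be to catch NetworkException separately in the snippet above, so a dropped connection is reported differently from a genuine permission lookup failure (assuming from irods.exception import NetworkException at module level):

            try:
                target_access = self.irods.acls.get(target=target)
            except NetworkException as ex:
                # Connection dropped mid-operation, as in the log above
                self._raise_irods_exception(
                    ex,
                    'Connection lost while getting permissions of target '
                    '"{}"'.format(target),
                )
            except Exception as ex:
                self._raise_irods_exception(
                    ex,
                    'Error getting permissions of target "{}"'.format(target),
                )
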
mikkonie commented 5 months ago

Temporarily fixed by rolling back to v1.1.x. I need to try to reproduce this in dev to figure out what is going on.

mikkonie commented 4 months ago

Personal note: I also need to check the actions preceding the point of the actual crash. It's possible the previous step fails silently and we try to impose ACLs on a file which is not in the expected place or state.
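
A quick way to verify that theory (a sketch with assumed connection details and an example path, not actual SODAR code) would be to confirm the data object really exists at the expected target path before the ACL call:

    from irods.session import iRODSSession

    # Assumed connection details and target path, for illustration only
    with iRODSSession(
        host='localhost', port=1247, user='rods',
        password='rods', zone='sodarZone',
    ) as session:
        target_path = '/sodarZone/projects/project1/sample_data/yyy.vcf.gz'
        if not session.data_objects.exists(target_path):
            # The preceding step did not put the object where we expect it
            raise Exception('Data object missing after move: ' + target_path)
        obj = session.data_objects.get(target_path)
        print(session.acls.get(obj))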

mikkonie commented 1 month ago

The repo has now been upgraded for iRODS 4.3.3 and python-irodsclient 2.1.0. I will look into whether this still happens when testing the new release in staging.

mikkonie commented 1 month ago

One thought: could this be caused by the same redirect issue in python-irodsclient v1.1.9+ that caused the problems I detailed in #2007 and #2022?

If so, we may have to wait for the v2.1.1 release before deploying. See #2023.