FredHutch / motuz

Motuz - A web-based infrastructure for large-scale data movements between on-premise and cloud
MIT License

handling permissions errors #254

Open dtenenba opened 4 years ago

dtenenba commented 4 years ago

We have a user who tried to copy a directory on rhino to S3. There were two files in the directory that had owner read permission but no group read permission (and this user is in the group but is not the owner). The copy failed but did not give her any feedback; this is what she saw:

[screenshot]

It was only by looking in the logs of the celery container that I could see the permission problem.

I tried to reproduce this problem myself by creating a directory with 3 files in it, one of which was owned by root and only had owner read permissions. When I tried to copy this directory to S3, I did get a permissions error:

[screenshot]

You can't actually see the permissions error in that screenshot, but if I had scrolled to the right you would see "permission denied".

Do you have any idea why this user did not see an error in the copy job dialog?

Let me know if you need more information. Thanks.

dtenenba commented 4 years ago

PS, when she clicks Details, nothing happens.

aicioara commented 4 years ago

Would you be able to paste the celery log here? I am not sure what could have caused that.

"Details" is not a button; it is just the title of the section. Maybe we should color it differently so that it does not look like an anchor.

dtenenba commented 4 years ago

Here is an excerpt of the celery log. It's a legit permission error; the file's permissions are

-rw------- 1 gha ha_g_grp

and the user is not the owner (but is in the ha_g_grp group).

[2020-03-05 15:26:00,579: INFO/MainProcess] Received task: motuz.api.tasks.copy_job[433]
[2020-03-05 15:26:00,596: INFO/ForkPoolWorker-14] RCLONE_CONFIG_DST_TYPE='s3' RCLONE_CONFIG_DST_REGION='us-west-2' RCLONE_CONFIG_DST_ACCESS_KEY_ID='***CDI3' RCLONE_CONFIG_DST_SECRET_ACCESS_KEY='***' sudo -E -u ahoge rclone copyto /fh/scratch/delete90/ha_g/CRPC_cfDNA dst:/fh-pi-ha-g/CRPC_cfDNA --progress --stats 2s
[2020-03-05 15:26:00,837: INFO/ForkPoolWorker-14] No match in
[2020-03-05 15:26:01,339: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-05 16:38:48,800: INFO/ForkPoolWorker-14] No match in * cfDNA_WGS_CellPaper/cf…etrics.sh.o.36935363.5: transferring
[2020-03-05 16:38:49,301: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,631: ERROR/ForkPoolWorker-14] ERROR : Attempt 1/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,841: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,976: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,297: ERROR/ForkPoolWorker-14] ERROR : Attempt 2/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,360: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,513: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:21,786: ERROR/ForkPoolWorker-14] ERROR : Attempt 3/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:21,815: INFO/ForkPoolWorker-14] Copy process exited with exit status 1
[2020-03-06 00:05:22,474: INFO/ForkPoolWorker-14] Task motuz.api.tasks.copy_job[433] succeeded in 31161.893673621118s: {'text': 'GTransferred: 1.476T / 1.476 TBytes, 100%, 49.680 MBytes/s, ETA 0s
      Errors: 3 (retrying may help)
      Checks: 322 / 322, 100%
 Transferred: 161 / 161, 100%
Elapsed time: 8h39m21.1s
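For reference, a minimal sketch (with a placeholder path, not the real file) of how the mode bits above translate into exactly this kind of failure when checked from Python as the non-owner group member:

import os
import stat

path = "/fh/scratch/delete90/ha_g/CRPC_cfDNA/example.bam"  # placeholder path for illustration

st = os.stat(path)                      # stat() works without read permission on the file itself
print(stat.filemode(st.st_mode))        # e.g. '-rw-------', matching the listing above
print(bool(st.st_mode & stat.S_IRGRP))  # False: the group read bit is not set
print(os.access(path, os.R_OK))         # False when run as a group member who is not the owner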

aicioara commented 4 years ago

This can be addressed by making the stderr and stdout of celery jobs permanent (in the database). This is something I have long wanted to do.
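A minimal sketch of what that persistence could look like, assuming a SQLAlchemy model on the API side (the column and helper names below are hypothetical, not Motuz's actual schema):

from sqlalchemy import Column, Integer, String, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class CopyJob(Base):
    __tablename__ = "copy_job"

    id = Column(Integer, primary_key=True)
    status = Column(String(32), default="RUNNING")
    final_stdout = Column(Text)  # captured rclone stdout, written once at completion
    final_stderr = Column(Text)  # captured rclone stderr, written once at completion


def finish_job(session, job_id, stdout, stderr, exit_status):
    # Called at the end of the celery task, so the output outlives the worker process.
    job = session.query(CopyJob).get(job_id)
    job.final_stdout = stdout
    job.final_stderr = stderr
    job.status = "SUCCESS" if exit_status == 0 else "FAILED"
    session.commit()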

bmcgough commented 4 years ago

This is going to be an extremely common issue in directories with mixed file ownership, as only the owner can delete a file.

Ideally, one of two things would happen:

  1. Motuz crawls the requested source tree as the user and verifies the needed permissions (read for a copy; read/write/delete for a move) before the action is attempted (see the sketch after this list).
  2. Motuz builds a file list first and can then report to the user which files failed.
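A rough sketch of the pre-flight check in option 1, assuming it runs as the requesting user (for example under the same sudo -u invocation used for rclone); the function name and return shape are illustrative, not part of Motuz:

import os

def find_unreadable(source_root, require_write=False):
    # Walk the source tree and collect paths the current user cannot access.
    # require_write=True approximates the extra permissions a move needs,
    # since deleting a file requires write access on its parent directory.
    problems = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.access(path, os.R_OK):
                problems.append((path, "not readable"))
        if require_write and not os.access(dirpath, os.W_OK):
            problems.append((dirpath, "directory not writable, contents cannot be deleted"))
    return problems

# Report problems to the user before the transfer starts (or skip those paths).
for path, reason in find_unreadable("/fh/scratch/delete90/ha_g/CRPC_cfDNA"):
    print(f"SKIP {path}: {reason}")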
zackramjan commented 2 years ago

This particular issue is one that has been pointed out by our users a number of times. As users copy large amounts of data from shared on-prem storage to S3, there are inevitably permission problems and the like that result in transfer failures. The vast majority of the time, these are files they don't care about anyway (tmp or buffer files that were never cleaned up, etc.). As noted, once the task is complete and a new session is logged in, all of the celery task's stderr and stdout is lost. All the user sees is "ERROR", and without details they lack confidence that their data has been copied. The workaround was to ask us to manually verify the error by combing through the celery task logs.

We have now added persistence of copy task output to the DB. Upon completion of a copy job, the output and error are stored in the DB. When a user clicks a task, if the task has completed, we deliver the final output and error from the DB; if the task is active, we deliver the stderr and output directly from celery (i.e. the same behavior as before). I will open a pull request with our tweak in case it's of any use to you.
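For reference, a rough sketch of the read path just described, reusing the hypothetical CopyJob model sketched earlier in this thread (the task_id field, status values, and dict keys are illustrative, not Motuz's actual code; the 'text' key mirrors the result dict visible in the celery log above):

def get_job_output(session, celery_app, job_id):
    # Decide whether to serve persisted output (finished job) or live output (active job).
    job = session.query(CopyJob).get(job_id)
    if job.status in ("SUCCESS", "FAILED"):
        # Completed: serve the output stored in the DB at completion, so it
        # survives worker restarts and new login sessions.
        return {"output": job.final_stdout, "error": job.final_stderr}
    # Active: fall back to the live celery result, i.e. the previous behavior.
    result = celery_app.AsyncResult(job.task_id)
    info = result.info if isinstance(result.info, dict) else {}
    return {"output": info.get("text", ""), "error": info.get("error", "")}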