Open dtenenba opened 4 years ago
PS, when she clicks Details, nothing happens.
Would you be able to paste the celery log here? I am not sure what could have caused that.
The Details is not a button; it is just the title of the section. Maybe we should color it differently, so that it does not look like an anchor.
Here is an excerpt of the celery log. It's a legit permission error: the permission of the file is -rw------- 1 gha ha_g_grp and the user is not the owner (but is in the ha_g_grp group).
[2020-03-05 15:26:00,579: INFO/MainProcess] Received task: motuz.api.tasks.copy_job[433]
[2020-03-05 15:26:00,596: INFO/ForkPoolWorker-14] RCLONE_CONFIG_DST_TYPE='s3' RCLONE_CONFIG_DST_REGION='us-west-2' RCLONE_CONFIG_DST_ACCESS_KEY_ID='***CDI3' RCLONE_CONFIG_DST_SECRET_ACCESS_KEY='***' sudo -E -u ahoge rclone copyto /fh/scratch/delete90/ha_g/CRPC_cfDNA dst:/fh-pi-ha-g/CRPC_cfDNA --progress --stats 2s
[2020-03-05 15:26:00,837: INFO/ForkPoolWorker-14] No match in
[2020-03-05 15:26:01,339: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-05 16:38:48,800: INFO/ForkPoolWorker-14] No match in * cfDNA_WGS_CellPaper/cf…etrics.sh.o.36935363.5: transferring
[2020-03-05 16:38:49,301: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,631: ERROR/ForkPoolWorker-14] ERROR : Attempt 1/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,841: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:20,976: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,297: ERROR/ForkPoolWorker-14] ERROR : Attempt 2/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,360: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WES_NatCommsPaper/18Dec15_cfDNA_WES/.FC16207540.markDuplicates.bam.nIriZj: permission denied
[2020-03-06 00:05:21,513: ERROR/ForkPoolWorker-14] ERROR : cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: Failed to copy: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:21,786: ERROR/ForkPoolWorker-14] ERROR : Attempt 3/3 failed with 3 errors and: failed to open source object: open /fh/scratch/delete90/ha_g/CRPC_cfDNA/cfDNA_WGS_CellPaper/cfDNA_WGS/04Nov2016_cfDNA_Prostate_DeepWGS/.4_FC19663990_HFW3TBBXX.4.aligned.duplicates_marked.bam.8T7TxY: permission denied
[2020-03-06 00:05:21,815: INFO/ForkPoolWorker-14] Copy process exited with exit status 1
[2020-03-06 00:05:22,474: INFO/ForkPoolWorker-14] Task motuz.api.tasks.copy_job[433] succeeded in 31161.893673621118s: {'text': 'GTransferred: 1.476T / 1.476 TBytes, 100%, 49.680 MBytes/s, ETA 0s
Errors: 3 (retrying may help)
Checks: 322 / 322, 100%
Transferred: 161 / 161, 100%
Elapsed time: 8h39m21.1s
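To avoid combing through an excerpt like this by hand, here is a minimal sketch that collapses the repeated per-file failures into one entry per file. It assumes the "ERROR : <path>: Failed to copy: <reason>" line shape seen above; the function name summarize_copy_failures is hypothetical and not part of motuz or rclone.

```python
import re
from collections import Counter

# Assumption: per-file failures look like the lines in the excerpt above:
#   "... ERROR : <relative path>: Failed to copy: <reason chain>"
FAILED_COPY = re.compile(r"ERROR : (?P<path>.+?): Failed to copy: (?P<reason>.+)$")


def summarize_copy_failures(log_text):
    """Return {relative_path: (root_cause, occurrences)} for failed files."""
    counts = Counter()
    reasons = {}
    for line in log_text.splitlines():
        match = FAILED_COPY.search(line)
        if match is None:
            continue  # progress lines, "Attempt n/3" summaries, etc.
        path = match.group("path")
        counts[path] += 1
        # The reason is a chain joined by ": "; keep the last element.
        reasons[path] = match.group("reason").rsplit(": ", 1)[-1].strip()
    return {path: (reasons[path], counts[path]) for path in counts}
```

Because each retry logs the same file again, the occurrence count mostly reflects the number of attempts rather than the number of distinct problems.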
This can be addressed by making the stderr and stdout of celery jobs permanent (in the database). This is something that I have long wanted to do.
This is going to be an extremely common issue in mixed directories as only the owner can delete a file.
Ideally, one of two things would happen:
This particular issue is one that has been pointed out by our users a number of times. As users copy large amounts of data from shared on-prem storage to S3, there are inevitably permission problems and the like that result in transfer failures. The vast majority of the time, these are silly files they don't care about anyway (tmp or buffer files that were never cleaned up, etc.). As noted, once the task is complete and the user logs into a new session, all the celery task stderr and stdout are lost. All the user sees is "ERROR", and without details they lack confidence that their data has been copied. The workaround was to ask us to manually verify the error by combing through the celery task logs.
We have now added persistence of copy task output to the DB. Upon the completion of a copy job, the output and error are stored in the DB. When a user clicks a task, if the task has completed, we deliver the final output and error from the DB. If the task is still active, we deliver the stderr and output directly from celery (i.e., the same behavior as before). I will open a pull request with our tweak in case it's of any use to you.
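For anyone following along, here is a rough sketch of the flow described above, not our actual patch. It uses sqlite3 so that it is self-contained; the table copy_job_output, the function names, and the read_live_output callback (standing in for whatever reads output from the running celery task) are all hypothetical.

```python
import sqlite3


def init_schema(conn):
    # Hypothetical table; the real schema may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS copy_job_output ("
        "job_id INTEGER PRIMARY KEY, output TEXT, error TEXT)"
    )


def save_final_output(conn, job_id, output, error):
    """Call once when the copy job finishes, whatever its exit status."""
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT OR REPLACE INTO copy_job_output (job_id, output, error) "
            "VALUES (?, ?, ?)",
            (job_id, output, error),
        )


def get_job_output(conn, job_id, read_live_output):
    """Prefer the persisted copy; fall back to the live task for active jobs."""
    row = conn.execute(
        "SELECT output, error FROM copy_job_output WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is not None:
        return {"output": row[0], "error": row[1], "source": "db"}
    live = read_live_output(job_id)  # assumed to return a dict with these keys
    return {"output": live["output"], "error": live["error"], "source": "live"}
```

Storing the error text even when the job is reported as finished is the important part, since that is exactly what users could not see after logging back in.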
We have a user who tried to copy a directory on rhino to S3. There were two files in the directory that had owner read permissions but no group read permissions (and this user is in the group but is not the owner). The copy failed but did not give her any feedback; this is what she saw:
It was only by looking in the logs of the celery container that I could see the permission problem.
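For reference, the case described here (owner-only read, user in the group but not the owner) fails the classic POSIX read check. A minimal sketch of that check follows; it deliberately ignores ACLs, root, and parent-directory permissions, and can_read is a hypothetical helper, not something motuz provides.

```python
import stat


def can_read(mode, file_uid, file_gid, user_uid, user_gids):
    """Classic POSIX read check: exactly one permission class applies."""
    if user_uid == file_uid:
        return bool(mode & stat.S_IRUSR)  # owner bits
    if file_gid in user_gids:
        return bool(mode & stat.S_IRGRP)  # group bits
    return bool(mode & stat.S_IROTH)      # everyone else
```

With mode 0o600, a user who is only a group member falls into the second branch and is denied, which is consistent with the "permission denied" seen in the celery log.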
I tried to reproduce this problem myself by creating a directory with 3 files in it, one of which was owned by root and only had owner read permissions. When I tried to copy this directory to S3, I did get a permissions error:
You can't actually see the permissions error in that screenshot, but if I had scrolled to the right you would have seen "permission denied".
Do you have any idea why this user did not see an error in the copy job dialog?
Let me know if you need more information. Thanks.