AmpliconSuite / AmpliconSuite-pipeline

A quickstart tool for AmpliconArchitect. Performs all preliminary steps (alignment, CNV calling, seed interval detection) required prior to running AmpliconArchitect. Previously called PrepareAA.

Docker permissions #25

Open MrDotOne opened 2 years ago

MrDotOne commented 2 years ago

Just pulled PAA down the other day and have been running it. My run command is:

/data/PrepareAA/docker/run_paa_docker.py -o /data/output -s Colo -t 16 --bam /data/Data/Colo/cofinal.bam --run_AA --run_AC

However, after 22+ hours it gets to this point and fails miserably:

/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-27 22:49:31.494158

I am unsure where feature_to_graph.txt should be found, and Colo_amplicon_classification_profiles.tsv doesn't seem to be getting generated.

Any assistance would be appreciated.
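One plausible reading of the first traceback: amplicon_classifier.py opens feature_to_graph.txt by a bare relative name, and Python resolves such names against the process's current working directory, not against the -o output prefix. If the working directory inside the container is not writable by the user the container runs as, open() raises PermissionError even when the output directory itself is fine. A minimal sketch of anchoring such a file to the output prefix instead; aux_path is a hypothetical helper, not AmpliconClassifier's actual code:

```python
import os


def aux_path(output_prefix: str, filename: str) -> str:
    """Return a path for `filename` in the same directory as the -o output prefix.

    Hypothetical helper for illustration; not taken from AmpliconClassifier.
    """
    # abspath() also normalizes doubled slashes such as "/home/output//Colo_classification".
    out_dir = os.path.dirname(os.path.abspath(output_prefix))
    return os.path.join(out_dir, filename)


# A bare open("feature_to_graph.txt", "w") depends on os.getcwd();
# this path depends only on the output prefix passed with -o.
print(aux_path("/home/output//Colo_classification/Colo", "feature_to_graph.txt"))
# prints /home/output/Colo_classification/feature_to_graph.txt
```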

jluebeck commented 2 years ago

Hi,

I have updated PrepareAA in 580f923 to handle issues related to permissions of the output directory, and also consolidated a file from AmpliconClassifier that may have been trying to write to a location not necessarily in that same spot. Can you please pull the latest version of the Docker image and try again? You may already have done so, but please also double-check that the location you are hoping to save data to exists and has write permissions for root.

Thanks, Jens
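The check suggested above (the output location exists and is writable) can be scripted before a 20+ hour run is launched. A minimal sketch; check_output_dir is a hypothetical pre-flight helper, not part of PrepareAA:

```python
import os
import tempfile


def check_output_dir(path: str) -> list:
    """Return a list of problems with `path` as an output directory (empty if usable).

    Hypothetical pre-flight helper; not part of PrepareAA.
    """
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]
    # os.access() checks the permission bits against this process's real UID/GID.
    if not os.access(path, os.W_OK | os.X_OK):
        return [f"{path} is not writable by UID {os.getuid()}"]
    # Permission bits do not tell the whole story on every filesystem,
    # so also try to create (and immediately delete) a temporary file.
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:
        return [f"cannot create files in {path}: {exc}"]
    return []
```

For example, check_output_dir("/data/output") on the host, or the same call against /home/output from inside the container; an empty list means the directory looks usable for the current user.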

MrDotOne commented 2 years ago

I made a change to the run file so when you execute it, it looks like this

docker run -u $(id -u $USER):$(id -g $USER) --rm -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v :/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh

So everything should be read and written as the end user running the app.

I will pull down the update(s) and give it a shot. Thank you.

MrDotOne commented 2 years ago

Pulled and running. It will take over 20 hours, but I will let you know. Thank you for your time.

I do find that adding the following to the run script avoids a lot of issues, since the data is then written as the caregiver rather than root:

-u $(id -u $USER):$(id -g $USER)

jluebeck commented 2 years ago

Thank you, this is a good suggestion, I will incorporate it.


MrDotOne commented 2 years ago

Someone on another repo suggested it when I was having issues with the results being written as root, and the person running the pipeline didn't have escalation privileges. I thought I would pass on that nugget.

MrDotOne commented 2 years ago

I am still having issues:

[root:INFO] #TIME 79252.045 Plotting SV View for amplicon7
[root:INFO] #TIME 79318.830 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-28 23:07:27.730295
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-07-29 21:26:25.262009

MrDotOne commented 2 years ago

I will run it as root, which should fix it, but ...

MrDotOne commented 2 years ago

OK, I reran it as root using the run script as provided in the repo, and it seems to have completed successfully. This is good progress. However, both times I have run it with -u set to my UID:GID, it failed. I need to figure out how to get the results written as the caregiver so I don't have to intervene.

MrDotOne commented 2 years ago

Unfortunately, that is not working. The run file works fine for root, but not for a non-escalated account. I keep getting this error when I run as a user with the id options in the run command:

[root:INFO] #TIME 79384.895 Plotting SV View for amplicon7
[root:INFO] #TIME 79452.068 Total Runtime
/home/programs/AmpliconClassifier-main/make_input.sh: line 6: scf.txt: Permission denied
grep: write error: Broken pipe
/home/programs/AmpliconClassifier-main/make_input.sh: line 7: sgf.txt: Permission denied
find: 'standard output': Broken pipe
find: write error
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: sgf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 8: [: : integer expression expected
cat: scf.txt: No such file or directory
/home/programs/AmpliconClassifier-main/make_input.sh: line 12: san.txt: Permission denied
paste: san.txt: No such file or directory
rm: cannot remove 'san.txt': No such file or directory
rm: cannot remove 'scf.txt': No such file or directory
rm: cannot remove 'sgf.txt': No such file or directory
AmpliconClassifier 0.4.9
/home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
reading /home/data_repo/GRCh38/Genes_hg38.gff
read 22998 genes

Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/amplicon_classifier.py", line 667, in <module>
    f2gf = open("feature_to_graph.txt", 'w')
PermissionError: [Errno 13] Permission denied: 'feature_to_graph.txt'
Traceback (most recent call last):
  File "/home/programs/AmpliconClassifier-main/make_results_table.py", line 65, in <module>
    with open(args.input) as input_file, open(args.classification_file) as classification_file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv'
2022-07-31 01:44:32.102144
PrepareAA version 0.1203.1

Matched /home/bam_dir/cofinal.bam to reference genome GRCh38
Running PrepareAA on sample: Colo

Running CNVKit batch
python3 /home/programs/cnvkit.py batch -m wgs -r /home/data_repo/GRCh38/GRCh38_cnvkit_filtered_ref.cnn -p 16 -d /home/output/Colo_cnvkit_output/ /home/bam_dir/cofinal.bam

Running CNVKit segment
python3 /home/programs/cnvkit.py segment /home/output/Colo_cnvkit_output/cofinal.cnr -p 16 -m cbs -o /home/output/Colo_cnvkit_output/cofinal.cns

Cleaning up temporary files
rm /home/output/Colo_cnvkit_output//tmp.bed /home/output/Colo_cnvkit_output//.cnn
gzip /home/output/Colo_cnvkit_output/cofinal.cnr

Running amplified_intervals
python /home/programs/AmpliconArchitect-master/src/amplified_intervals.py --ref GRCh38 --bed /home/output/Colo_cnvkit_output/cofinal_CNV_GAIN.bed --bam /home/bam_dir/cofinal.bam --gain 4.5 --cnsize_min 50000 --out /home/output/Colo_AA_CNV_SEEDS
python /home/programs/AmpliconArchitect-master/src/AmpliconArchitect.py --ref GRCh38 --downsample 10.0 --bed /home/output/Colo_AA_CNV_SEEDS.bed --bam /home/bam_dir/cofinal.bam --runmode FULL --extendmode EXPLORE --insert_sdevs 3.0 --out /home/output//Colo_AA_results//Colo

Running AC
/home/programs/AmpliconClassifier-main/make_input.sh /home/output//Colo_AA_results/ /home/output//Colo_classification/Colo
python3 /home/programs/AmpliconClassifier-main/amplicon_classifier.py -i /home/output//Colo_classification/Colo.input --ref GRCh38 -o /home/output//Colo_classification/Colo --annotate_cycles_file --report_complexity
python3 /home/programs/AmpliconClassifier-main/make_results_table.py -i /home/output//Colo_classification/Colo.input --classification_file /home/output//Colo_classification/Colo_amplicon_classification_profiles.tsv
Completed

2022-08-01 00:05:22.206630

jluebeck commented 2 years ago

Hi,

Thank you for sharing. I have also now done some testing on my end, and it appears that assigning a custom user for the image is non-trivial and that the solution proposed above (adding -u with the host UID:GID) does not quite work as expected. I recommend that users run with the current default settings, generating the files as root; users can then chmod or copy the relevant files afterwards if they need non-root ownership. I do not plan to address non-root ownership of PrepareAA-generated files at this time, but perhaps will in the future if there is a compelling reason.

Jens

MrDotOne commented 2 years ago

Non-root users cannot chown/chgrp files; that is a serious cybersecurity concern.

MrDotOne commented 2 years ago

Is there a way to implement a Python script within the run file to do something similar to this?

(base) [root@lri-uapps-2 data]# cat chown.py
import os

# Hand every subdirectory and file below /data/output to UID 1035688, GID 1001025
path = "/data/output"
for root, dirs, files in os.walk(path):
    for momo in dirs:
        os.chown(os.path.join(root, momo), 1035688, 1001025)
    for momo in files:
        os.chown(os.path.join(root, momo), 1035688, 1001025)

Michael
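The chown.py script above re-owns everything below /data/output but not /data/output itself, and os.chown() follows symbolic links to their targets. A slightly more general sketch, assuming it runs as root (changing ownership to another user requires that) and that the target UID/GID are passed in rather than hard-coded; chown_tree is a hypothetical name:

```python
import os


def chown_tree(path: str, uid: int, gid: int) -> int:
    """Recursively set owner/group on `path` and everything below it.

    Returns the number of entries changed. os.lchown re-owns symbolic links
    themselves rather than the files they point to. Hypothetical helper.
    """
    os.lchown(path, uid, gid)  # the top-level directory itself
    count = 1
    for root, dirs, files in os.walk(path):  # os.walk does not descend into symlinked dirs
        for name in dirs + files:
            os.lchown(os.path.join(root, name), uid, gid)
            count += 1
    return count
```

For example, chown_tree("/data/output", 1035688, 1001025) after the container exits.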

jluebeck commented 2 years ago

Hi Michael,

Without re-assigning user IDs inside the container itself, or alternatively sharing the host machine's /etc/passwd file with the Docker image, there is no way to give the image exactly the same user and group account information as the host machine. The previously proposed solution runs the image as a specific user inside the image, but that user is not mapped to the same user on the host machine. Perhaps one option is instead to have the Docker script recursively chmod all files written by the image into the mounted directory when it finishes, adding global read/write permissions. Would this solution be satisfactory for you? I can test this out in the next couple of days.

Jens
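The recursive chmod proposed above could look like the following sketch, which approximates chmod -R a+rwX: read/write for everyone on every entry, plus execute on directories (and on files already executable by someone) so directories stay traversable. The a+rwX choice and the name add_global_rw are assumptions; symbolic links are skipped because os.chmod() would otherwise change their targets.

```python
import os
import stat

RW_ALL = (stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IWGRP
          | stat.S_IROTH | stat.S_IWOTH)
X_ALL = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def _widen(path: str) -> None:
    """Add a+rw, plus a+x for directories and already-executable files."""
    if os.path.islink(path):
        return  # leave symlinks alone; chmod would follow them
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    extra = RW_ALL
    if stat.S_ISDIR(st.st_mode) or mode & X_ALL:
        extra |= X_ALL
    os.chmod(path, mode | extra)


def add_global_rw(top: str) -> None:
    """Approximate `chmod -R a+rwX top` (hypothetical helper)."""
    _widen(top)
    for root, dirs, files in os.walk(top):
        for name in dirs + files:
            _widen(os.path.join(root, name))
```

Making files world-writable trades some isolation for convenience; chown (as root) keeps permissions tighter when the target UID/GID are known.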

MrDotOne commented 2 years ago

That is a solution I am trying to implement. I tried to use /home/output; however, the result was "No such file or directory".
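/home/output exists only inside the container; on the host, the same files live at the host side of the -v mount, which in the run command earlier in this thread is /data/output. A clean-up step that runs on the host after the container exits therefore needs host paths. A sketch of translating container paths back to host paths from the -v pairs; to_host_path and the MOUNTS table (copied from the docker run command in this thread) are illustrative, not part of PrepareAA:

```python
import posixpath


def to_host_path(container_path: str, mounts: dict) -> str:
    """Translate a container path to a host path using -v HOST:CONTAINER pairs.

    `mounts` maps container mount points to host directories; the longest
    matching mount point wins. Hypothetical helper for illustration.
    """
    path = posixpath.normpath(container_path)
    for container_dir in sorted(mounts, key=lambda d: len(posixpath.normpath(d)), reverse=True):
        base = posixpath.normpath(container_dir)
        if path == base or path.startswith(base + "/"):
            rel = posixpath.relpath(path, base)
            host_dir = mounts[container_dir]
            return host_dir if rel == "." else posixpath.join(host_dir, rel)
    raise ValueError(f"{container_path} is not under any mounted directory")


# Container -> host mounts taken from the docker run command in this thread.
MOUNTS = {
    "/home/output": "/data/output",
    "/home/bam_dir": "/data/Data/Colo",
    "/home/programs/mosek/8/licenses": "/data/mosek/8/licenses",
}
```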

MrDotOne commented 2 years ago

I just pulled fc3b5e8 and will give the --run_as_user option a try; it already looks promising:

docker run --rm -e HOST_UID=$(id -u) -e HOST_GID=$(id -g) -u $(id -u):$(id -g) -e AA_DATA_REPO=/home/data_repo -e argstring="$argstring" -v $AA_DATA_REPO:/home/data_repo -v /data/Data/Colo:/home/bam_dir -v /data/Data/Colo:/home/norm_bam_dir -v /home/bendahm:/home/bed_dir -v /data/output:/home/output -v /data/mosek/8/licenses:/home/programs/mosek/8/licenses jluebeck/prepareaa bash /home/run_paa_script.sh

I will let you know what I find. Thank you for looking into this.
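The command above passes the invoking user's IDs in two ways: -u $(id -u):$(id -g) makes the container process run with that UID:GID, and HOST_UID/HOST_GID expose the same numbers to the scripts inside the image (how run_paa_script.sh uses them is not shown here). A sketch of assembling the same argv from Python with os.getuid() and os.getgid(); build_docker_cmd and its parameters are hypothetical and not the actual contents of run_paa_docker.py:

```python
import os
import shlex


def build_docker_cmd(mounts, argstring, image="jluebeck/prepareaa", uid=None, gid=None):
    """Assemble a `docker run` argv that runs the container as the host user.

    `mounts` is a list of (host_dir, container_dir) pairs, one per -v flag.
    Hypothetical helper mirroring the command above; not PrepareAA's code.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    cmd = [
        "docker", "run", "--rm",
        "-e", f"HOST_UID={uid}",
        "-e", f"HOST_GID={gid}",
        "-u", f"{uid}:{gid}",
        "-e", "AA_DATA_REPO=/home/data_repo",
        "-e", f"argstring={argstring}",
    ]
    for host_dir, container_dir in mounts:
        cmd += ["-v", f"{host_dir}:{container_dir}"]
    cmd += [image, "bash", "/home/run_paa_script.sh"]
    return cmd


if __name__ == "__main__":
    argv = build_docker_cmd([("/data/output", "/home/output")], "-s Colo -t 16")
    print(shlex.join(argv))  # shell-quoted preview of the command
```

Passing the list itself to subprocess.run(argv, check=True) avoids shell-quoting problems with argstring.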

MrDotOne commented 2 years ago

This is perfect:

(base) [root@lri-uapps-2 data]# cd output
(base) [root@lri-uapps-2 output]# ls -la
total 20
drwxrwxrwx  3 bendahm ccdomainusers   113 Aug  5 15:03 .
drwxrwxrwx 19 root    root           4096 Aug  5 15:02 ..
drwxr-xr-x  2 bendahm ccdomainusers   126 Aug  5 15:10 Colo_cnvkit_output
-rw-r--r--  1 bendahm ccdomainusers     0 Aug  5 15:03 Colo_timing_log.txt
-rw-r--r--  1 bendahm ccdomainusers  1931 Aug  5 15:03 docker_home_manifest.log
-rw-r--r--  1 bendahm ccdomainusers 11525 Aug  5 15:10 PAA_stdout.log

jluebeck commented 2 years ago

Glad to hear it is working for you. I am reopening the issue for others who may run into problems despite this fix. Note that this solution works only as long as the Docker daemon is not configured to offset UIDs and GIDs, which is sometimes done to improve the security of the host machine. More information about Docker user namespace remapping is available here: https://docs.oracle.com/cd/E37670_01/E75728/html/ol-docker-userns-remap.html.

Jens
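Why remapping breaks this approach: with Docker's userns-remap enabled, container UIDs and GIDs are shifted into a subordinate range configured in /etc/subuid and /etc/subgid, whose entries have the form name:start:count. Within a single range, container ID n appears on the host as start + n, so files written by "UID 1035688" inside the container would not belong to host UID 1035688. A small sketch of that arithmetic, assuming one contiguous range per user (Docker can also combine several ranges, which this ignores); the helper names and the example entry are illustrative:

```python
def parse_subid_line(line: str):
    """Parse one /etc/subuid or /etc/subgid entry of the form name:start:count."""
    name, start, count = line.strip().split(":")
    return name, int(start), int(count)


def container_to_host_id(container_id: int, start: int, count: int):
    """Host ID that a container UID/GID maps to under a single remap range.

    Returns None when the ID falls outside the range (no mapping exists).
    """
    if 0 <= container_id < count:
        return start + container_id
    return None


# Illustrative entry; real values come from the host's /etc/subuid.
name, start, count = parse_subid_line("dockremap:100000:65536")
print(container_to_host_id(0, start, count))        # 100000: container root
print(container_to_host_id(1000, start, count))     # 101000
print(container_to_host_id(1035688, start, count))  # None: outside the 65536-ID range
```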

MrDotOne commented 2 years ago

Thank you for the fixes and the link; I will check it out. There are a couple of other repos like this that could use this technique. Unfortunately, while we may be doing research here, this is not academia, and we lock things down pretty tightly, sometimes to the point where they are unusable. This was of great benefit. Thank you.