Problem running Dockerized protect / where to find SSE-C key for downloading reference data?

anttikos commented 3 years ago

I'm having trouble running protect and figuring out what exactly is failing, as the output is 4655 lines long. I'm most probably just missing some parameters, but I'm not sure which ones.

Starting at line 1143 of the output, I get errors related to Amazon S3 authentication:

0091ac8c0f62 2021-06-16 13:00:23,448 MainThread WARNING toil.leader: H/2/jobmrdMHX RuntimeError: s3am failed with (boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. ['HmacAuthV1Handler'] Check your credentials) while downloading (S3://protect-data/hg38_references/gencode.v25.pc_transcripts.fa.tar.gz)
0091ac8c0f62 2021-06-16 13:00:23,448 MainThread WARNING toil.leader: H/2/jobmrdMHX ERROR:toil.worker:Exiting the worker because of a failed job on host 0091ac8c0f62

I'm assuming I should use the --sse-key and --sse-key-is-master parameters to set up the SSE-C key in order to download the reference genome files, but I can't find where to get the actual key files. Also, I'm not sure whether this is the main issue or whether there are other issues as well.

Any help is highly appreciated!

Here is the command I'm using to run protect:

docker run \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /data:/data/ \
  quay.io/ucsc_cgl/protect:2.5.6-1.13.0 \
  --sample-name sample_name \
  --tumor-dna /data/dna/tumor_dna_sample.merged_1.fq.gz \
  --tumor-dna2 /data/dna/tumor_dna_sample.merged_2.fq.gz \
  --normal-dna /data/dna/normal_dna_sample.merged_1.fq.gz \
  --normal-dna2 /data/dna/normal_dna_sample.merged_2.fq.gz \
  --tumor-rna /data/rna/tumor_rna_sample_1.fq.gz \
  --tumor-rna2 /data/rna/tumor_rna_sample_2.fq.gz \
  --reference-build hg38 \
  --tumor-type PRAD \
  --work-mount /data/protect \
  2> error_log_2021-06-16.txt

Attached is the full output log

error_log_2021-06-16.txt

adamnovak commented 2 years ago

I don't think this is an SSE (server-side encryption) problem. It sounds like you don't have a .aws credentials directory available to the container. It would usually need to be in the home directory inside the container.
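One way to make host credentials visible inside the container is an extra read-only bind mount. The sketch below wraps the docker run invocation from the original report. It assumes the container process runs as root with /root as its home directory, which this thread does not confirm; `run_protect`, `aws_mount`, `RUN`, and `CONTAINER_HOME` are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: expose the host's ~/.aws directory to the ProTECT container.
# ASSUMPTION: the container process runs as root, so its home is /root.
# Override CONTAINER_HOME if the image uses a different user.
CONTAINER_HOME="${CONTAINER_HOME:-/root}"

# aws_mount HOST_DIR: print the value for docker's -v flag that bind-mounts
# HOST_DIR read-only at $CONTAINER_HOME/.aws.
aws_mount() {
  printf '%s:%s/.aws:ro\n' "$1" "$CONTAINER_HOME"
}

# run_protect ARGS...: same docker run as in the original report, plus the
# credentials mount. Set RUN=echo to print the command instead of running it.
run_protect() {
  ${RUN:-} docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /data:/data/ \
    -v "$(aws_mount "$HOME/.aws")" \
    quay.io/ucsc_cgl/protect:2.5.6-1.13.0 "$@"
}
```

Call `run_protect` with the same ProTECT arguments as in the original command. With `RUN=echo` the function only prints the docker command, which is a cheap way to check the mount before starting a long run.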

Unfortunately, even when I try with my UCSC AWS account, I don't have permission to access that data either:

aws s3 cp s3://protect-data/hg38_references/gencode.v25.pc_transcripts.fa.tar.gz -
download failed: s3://protect-data/hg38_references/gencode.v25.pc_transcripts.fa.tar.gz to - An error occurred (403) when calling the HeadObject operation: Forbidden

So the real problem may be that the permissions on the protect-data bucket have changed; the data may previously have been available for public unauthenticated access and no longer is.

@erichweiler Did you change the permissions on the protect-data bucket when we were concerned about leaking money due to public S3 access?

erichweiler commented 2 years ago

I don't believe so - but let me check the permissions on it. Which account is that bucket in?

-erich

hbeale commented 2 years ago

You need requester-pays. Here are the commands I ran:

aws s3 ls --request-payer requester protect-data/hg38_references/
aws s3 sync --dryrun --request-payer requester s3://protect-data/hg38_references /mnt/neoepitopes/protect_references/
aws s3 sync --request-payer requester s3://protect-data/hg38_references /mnt/neoepitopes/protect_references/

adamnovak commented 2 years ago

@erichweiler Unfortunately I don't know where the bucket is. But @hbeale assures me that the real problem here is that you need to have AWS account credentials available, and you need to use requester-pays mode to download, which is behind the --request-payer flag in aws. I'm not sure if s3am needs any special flag for that; it might accept needing to pay for downloads by default, if you give it AWS credentials to use.
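Given the uncertainty about s3am, one possible workaround is to fetch the references with the aws CLI before running ProTECT. The sketch below first checks that the CLI can resolve credentials, then syncs the hg38_references prefix in requester-pays mode, mirroring the commands posted above. `fetch_references`, `RUN`, and the destination path are hypothetical, and how to point ProTECT at the local copies is not covered in this thread.

```shell
#!/bin/sh
# Sketch only: download the hg38 references in requester-pays mode.
# Bucket and prefix are taken from the thread; the destination is up to you.
SRC="s3://protect-data/hg38_references"

# fetch_references DEST: verify credentials, then sync SRC into DEST.
# Set RUN=echo to print the aws commands instead of running them.
fetch_references() {
  dest="$1"
  # Fails early if the aws CLI cannot find usable credentials.
  ${RUN:-} aws sts get-caller-identity || return 1
  # --request-payer requester acknowledges that the caller pays for transfer.
  ${RUN:-} aws s3 sync --request-payer requester "$SRC" "$dest"
}
```

For example, `RUN=echo; fetch_references /mnt/neoepitopes/protect_references/` prints the two aws commands without contacting S3, so the arguments can be reviewed before any transfer charges are incurred.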

BD2KGenomics / protect, issue #294