broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Mutect2.wdl: "pet-@.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket" #7492

Closed. jkobject closed this issue 3 years ago.

jkobject commented 3 years ago

Hi,

Using GATK Mutect2's WDL file on Terra (version 21 on Agora), I keep getting the same error: "pet-102022583875839491351@broad-firecloud-ccle.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket".

Here is part of the stack trace:

20:59:48.744 INFO Mutect2 - Inflater: IntelInflater
20:59:48.744 INFO Mutect2 - GCS max retries/reopens: 20
20:59:48.744 INFO Mutect2 - Requester pays: enabled. Billed to: broad-firecloud-ccle
20:59:48.744 INFO Mutect2 - Initializing engine
20:59:54.630 INFO FeatureManager - Using codec VCFCodec to read file gs://depmapomicsdata/1000g_pon.hg38.vcf.gz
20:59:55.629 INFO Mutect2 - Shutting down engine
[October 4, 2021 8:59:55 PM GMT] org.broadinstitute.hellbender.tools.walkers.mutect.Mutect2 done. Elapsed time: 0.12 minutes.
Runtime.totalMemory()=876609536
code: 403
message: pet-102022583875839491351@broad-firecloud-ccle.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket.
reason: forbidden
location: null
retryable: false
com.google.cloud.storage.StorageException: pet-102022583875839491351@broad-firecloud-ccle.iam.gserviceaccount.com does not have storage.buckets.get access to the Google Cloud Storage bucket.
at com.google.cloud.storage.spi.v1.HttpStorageRpc.translate(HttpStorageRpc.java:229)
at com.google.cloud.storage.spi.v1.HttpStorageRpc.get(HttpStorageRpc.java:406)
at com.google.cloud.storage.StorageImpl$4.call(StorageImpl.java:217)
...

This happens while it runs the command:

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx15500m \
  -jar /root/gatk.jar Mutect2 -R gs://genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta \
  -I gs://cclebams/hg38_wes/CDS-00rz9N.hg38.bam -tumor BC1_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE --germline-resource gs://gcp-public-data--gnomad/release/3.0/vcf/genomes/gnomad.genomes.r3.0.sites.vcf.bgz \
  -pon gs://gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz \
  -L gs://fc-secure-d2a2d895-a7af-4117-bdc7-652d7d268324/7a157f4a-7d93-4a3e-aaf4-c41833463f5a/Mutect2/3be8ce8e-1075-4063-bc43-6f61e386c3f5/call-SplitIntervals/cacheCopy/glob-0fc990c5ca95eebc97c4c204e3e303e1/0000-scattered.interval_list \
  -O output.vcf.gz --f1r2-tar-gz f1r2.tar.gz --gcs-project-for-requester-pays broad-firecloud-ccle

However, I did grant read access (both the regular and legacy reader roles) to gs://cclebams (this is a requester-pays bucket).
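
For reference, the failing call can be reproduced outside of GATK with the google-cloud-storage Python client. This is a minimal sketch, assuming the same pet service account credentials are active as application default credentials; the bucket name and billing project are taken from the command above. GATK's requester-pays check amounts to a buckets.get on the bucket, which is what reload() issues here:

```python
from google.api_core.exceptions import Forbidden
from google.cloud import storage

# Assumes the pet service account credentials are the active application
# default credentials, as they are inside the Terra/Cromwell worker VM.
client = storage.Client(project="broad-firecloud-ccle")

# user_project is the project billed for requester-pays access, i.e. the
# value passed to --gcs-project-for-requester-pays.
bucket = client.bucket("cclebams", user_project="broad-firecloud-ccle")

try:
    bucket.reload()  # issues buckets.get, the call that is being denied
    print("buckets.get OK; requester_pays =", bucket.requester_pays)
except Forbidden as err:
    print("buckets.get denied:", err)
```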

This was run with the GATK 4.2.2 Docker image.

Best,

lbergelson commented 3 years ago

We've definitely seen this exact issue before (https://github.com/broadinstitute/gatk/issues/6349, https://github.com/broadinstitute/gatk/pull/6594), and it was caused by missing permissions. The existing roles are very confusing. Make sure you actually have the storage.buckets.get permission: it is included in Storage Legacy Bucket Reader, but not in Storage Legacy Object Reader or Storage Object Viewer. Check that your service account definitely has that role for the appropriate bucket.
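
One way to check this directly (a sketch, not part of GATK, using the google-cloud-storage Python client with the bucket and billing project from this issue) is to ask GCS which of the relevant permissions the current credentials actually hold on a bucket via testIamPermissions:

```python
from google.cloud import storage

client = storage.Client(project="broad-firecloud-ccle")
bucket = client.bucket("cclebams", user_project="broad-firecloud-ccle")

# Returns the subset of these permissions that the caller actually has.
# storage.buckets.get comes with Storage Legacy Bucket Reader; it is not
# part of Storage Legacy Object Reader or Storage Object Viewer.
granted = bucket.test_iam_permissions(
    ["storage.buckets.get", "storage.objects.get", "storage.objects.list"]
)
print(granted)
```

If storage.buckets.get is missing from the returned list for a bucket the run touches, this error would be expected.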

jkobject commented 3 years ago

Yes, I know, but I did give it Storage Legacy Bucket Reader access to both buckets. Is it possible the error comes from the gs://gcp-public-data--gnomad or gs://genomics-public-data buckets?

[Screenshots of the bucket IAM permissions, taken 2021-10-05 at 16:05 and 16:07]

lbergelson commented 3 years ago

Interesting, it's definitely possible it's coming from one of the other buckets. I don't think we have fine-grained control over which bucket we attempt to read requester-pays status from, so when requester pays is enabled, that permission may be needed on every bucket involved. It's annoying that the error message doesn't say which reader is performing the access. Is there a longer stack trace available?
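
Since the error does not say which bucket it refers to, one way to narrow it down (a sketch, assuming the same credentials GATK runs with; the bucket names are the ones referenced by the command and log above) is to run the same buckets.get probe against each of them and see which one returns 403:

```python
from google.api_core.exceptions import GoogleAPICallError
from google.cloud import storage

# Buckets referenced by the Mutect2 command and log above.
BUCKETS = [
    "genomics-public-data",
    "cclebams",
    "gcp-public-data--gnomad",
    "gatk-best-practices",
    "fc-secure-d2a2d895-a7af-4117-bdc7-652d7d268324",
    "depmapomicsdata",
]

client = storage.Client(project="broad-firecloud-ccle")
for name in BUCKETS:
    bucket = client.bucket(name, user_project="broad-firecloud-ccle")
    try:
        bucket.reload()  # the same buckets.get call GATK makes
        print(f"{name}: ok (requester_pays={bucket.requester_pays})")
    except GoogleAPICallError as err:
        print(f"{name}: denied ({err.code})")
```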

jkobject commented 3 years ago

I can confirm it was due to gs://gcp-public-data--gnomad not granting the required permission. I had to copy the file into my own workspace bucket.

That seems pretty problematic, as it is the recommended file for running the workflow.

jkobject commented 3 years ago

There is more to the stack trace, but no information about which file/bucket is the problematic one.

jkobject commented 3 years ago

It worked; now I am back to another error I had already seen before: #7494.