DataBiosphere / terra-notebook-utils

Utilities for the Terra notebook environment.
MIT License

TNU DRS operations fail in Terra project-per-workspace (PPW) workspaces (Urgent) #369

Closed mbaumann-broad closed 2 years ago

mbaumann-broad commented 2 years ago

Current Behavior: When TNU is run to copy or download DRS data in a project-per-workspace (PPW) workspace (any workspace created after 9/27/2021), the operation produces the following warning and often fails with the following error:

drs.copy("drs://dg.4503/93f98458-e816-4e56-9bea-013dc6c0ea4b", ".")
2021-11-01 02:53:26::INFO  Enabling requester pays for your workspace. This will only take a few seconds...
2021-11-01 02:53:26::WARNING  Failed to init requester pays for workspace terra-f20dfb56/mbaumann_tmp_test_tnu_v0_8_2 20211031: Expected '204', got '404' for 'https://rawls.dsde-prod.broadinstitute.org/api/workspaces/terra-f20dfb56/mbaumann_tmp_test_tnu_v0_8_2%2020211031/enableRequesterPaysForLinkedServiceAccounts'. You will not be able to access DRS URIs that interact with requester pays buckets.
2021-11-01 02:53:27::ERROR  copy failed: 'gs://nih-nhlbi-biodata-catalyst-tutorial-genome-data/GWAS/1kg-genotypes/gds_maf001/ALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bi_maf001.vcf.bgz.gds' to '/home/jupyter/notebooks/mbaumann_tmp_test_tnu_v0_8_2 20211031/edit/ALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bi_maf001.vcf.bgz.gds'
Traceback (most recent call last):
  File "/home/jupyter/notebooks/packages/terra_notebook_utils/blobstore/copy_client.py", line 69, in _do_copy
    _download(src_blob, dst_blob, indicator_type)
  File "/home/jupyter/notebooks/packages/terra_notebook_utils/blobstore/copy_client.py", line 27, in _download
    with Indicator.get(indicator_type, dst_blob.url, src_blob.size()) as indicator:
  File "/home/jupyter/notebooks/packages/terra_notebook_utils/blobstore/gs.py", line 162, in size
    return self._get_native_blob().size
  File "/home/jupyter/notebooks/packages/terra_notebook_utils/blobstore/gs.py", line 90, in _get_native_blob
    return _get_native_blob(self._gs_bucket, self.key, self.credentials, self.billing_project)
  File "/home/jupyter/notebooks/packages/terra_notebook_utils/blobstore/gs.py", line 48, in _get_native_blob
    blob = bucket.get_blob(key)
  File "/home/jupyter/notebooks/packages/google/cloud/storage/bucket.py", line 1214, in get_blob
    retry=retry,
  File "/home/jupyter/notebooks/packages/google/cloud/storage/_helpers.py", line 225, in reload
    retry=retry,
  File "/home/jupyter/notebooks/packages/google/cloud/storage/_http.py", line 78, in api_request
    return call()
  File "/home/jupyter/notebooks/packages/google/api_core/retry.py", line 291, in retry_wrapped_func
    on_error=on_error,
  File "/home/jupyter/notebooks/packages/google/api_core/retry.py", line 189, in retry_target
    return target()
  File "/home/jupyter/notebooks/packages/google/cloud/_http.py", line 484, in api_request
    raise exceptions.from_http_response(response)
google.api_core.exceptions.Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/nih-nhlbi-biodata-catalyst-tutorial-genome-data/o/GWAS%2F1kg-genotypes%2Fgds_maf001%2FALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.bi_maf001.vcf.bgz.gds?userProject=terra-f20dfb56&projection=noAcl&prettyPrint=false: jcjfvchhjamh0xv8f62efl4pn-989@dcpstage-210518.iam.gserviceaccount.com does not have serviceusage.services.use access to the Google Cloud project.

This happens with both the TNU DRS API and CLI.

The problem exists in TNU v0.8.2 and all prior versions.

Expected Behavior: TNU DRS operations should work in PPW workspaces, just as they do in non-PPW workspaces (workspaces created before 9/27/2021).

Root Cause: The TNU DRS subcommands use the GOOGLE_PROJECT environment variable to determine the "workspace namespace", which is then used to call the Rawls method enableRequesterPaysForLinkedServiceAccounts. The Rawls workspace operations expect the Terra billing project as the workspace namespace. Before PPW, the workspace namespace and the GOOGLE_PROJECT value were the same; in PPW workspaces they differ. In the error output above, terra-f20dfb56 is passed to Rawls as the workspace namespace, whereas in this PPW workspace the Terra billing project name is anvil-stage-demo, and that is the value that should be passed to Rawls.
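
To make the mismatch concrete, here is a minimal sketch of how the Rawls endpoint URL seen in the warning above is assembled from a namespace and a workspace name. The helper rawls_requester_pays_url is hypothetical and is not TNU's actual code; the host, path, and values are copied from the log output and the description above.

```python
from urllib.parse import quote

# Host as it appears in the warning message above.
RAWLS_HOST = "https://rawls.dsde-prod.broadinstitute.org"


def rawls_requester_pays_url(namespace: str, workspace_name: str) -> str:
    """Hypothetical helper: build the enableRequesterPaysForLinkedServiceAccounts URL."""
    return (
        f"{RAWLS_HOST}/api/workspaces/"
        f"{quote(namespace, safe='')}/{quote(workspace_name, safe='')}"
        "/enableRequesterPaysForLinkedServiceAccounts"
    )


workspace = "mbaumann_tmp_test_tnu_v0_8_2 20211031"

# Namespace taken from GOOGLE_PROJECT: this is the URL that returned 404 above.
wrong_url = rawls_requester_pays_url("terra-f20dfb56", workspace)

# Namespace set to the Terra billing project, which is what Rawls expects.
right_url = rawls_requester_pays_url("anvil-stage-demo", workspace)
```

The two URLs differ only in the namespace path segment, which is why the request built from GOOGLE_PROJECT fails to find the workspace.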

Partial Workaround: For TNU CLI use, this can be worked around by running tnu config set-workspace-namespace to set the value to the name of the Terra billing project.

For TNU API use, I haven't yet found a workaround in my tests to date, though I haven't tried all possibilities. A review of the code may reveal a way to work around this from the API as well.

How to Fix: TNU should use the WORKSPACE_NAMESPACE environment variable instead of the GOOGLE_PROJECT environment variable to work properly with PPW workspaces.
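
A minimal sketch of that lookup might look like the following. The function name resolve_workspace_namespace is hypothetical, and the fallback to GOOGLE_PROJECT when WORKSPACE_NAMESPACE is unset is an assumption made here to keep pre-PPW environments working; it is not necessarily how the actual fix is implemented.

```python
import os
from typing import Mapping, Optional


def resolve_workspace_namespace(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: pick the namespace to pass to Rawls.

    Prefers WORKSPACE_NAMESPACE (the Terra billing project). Falling back to
    GOOGLE_PROJECT is an assumption for environments that lack the variable.
    """
    env = os.environ if env is None else env
    return env.get("WORKSPACE_NAMESPACE") or env.get("GOOGLE_PROJECT")


# PPW workspace: the two variables differ, and the billing project wins.
ppw_env = {"WORKSPACE_NAMESPACE": "anvil-stage-demo", "GOOGLE_PROJECT": "terra-f20dfb56"}
ppw_namespace = resolve_workspace_namespace(ppw_env)

# Environment with only GOOGLE_PROJECT set: fall back to it.
legacy_env = {"GOOGLE_PROJECT": "some-billing-project"}
legacy_namespace = resolve_workspace_namespace(legacy_env)
```

Passing the environment as a mapping keeps the function easy to test without modifying os.environ.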

How to Reproduce

  1. Create a new workspace in Terra
  2. Perform a tnu drs copy operation

mbaumann-broad commented 2 years ago

This was fixed in: https://github.com/DataBiosphere/terra-notebook-utils/pull/373
