DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

File downloads are broken on AnVIL deployments #4507

Closed nadove-ucsc closed 1 year ago

nadove-ucsc commented 2 years ago
$ http https://service.anvil.gi.ucsc.edu/fetch/repository/files/90a04bb29d1d0d7ee64e4015f65b281c
HTTP/1.1 500 Internal Server Error
Access-Control-Allow-Headers: Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 990
Content-Type: text/plain
Date: Wed, 28 Sep 2022 23:53:18 GMT
Server: Server
X-Amzn-Trace-Id: Root=1-6334de6b-5ba05da16b5cced338cf046e;Sampled=0
x-amz-apigw-id: ZMew1Ex6oAMFv2w=
x-amzn-RequestId: 98344eb5-d0da-459f-a70d-10531ea00b1b

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1752, in _get_view_function_response
    response = view_function(**function_args)
  File "/var/task/app.py", line 1641, in fetch_repository_files
    body = _repository_files(file_uuid, fetch=True)
  File "/var/task/app.py", line 1679, in _repository_files
    return app.repository_controller.download_file(catalog=catalog,
  File "/var/task/azul/service/repository_controller.py", line 233, in download_file
    download.update(plugin, authentication)
  File "/var/task/azul/plugins/repository/tdr.py", line 216, in update
    access = drs_client.get_object(drs_uri, access_method=AccessMethod.gs)
  File "/var/task/azul/drs.py", line 90, in get_object
    return self._get_object(drs_uri, access_method)
  File "/var/task/azul/drs.py", line 151, in _get_object
    raise DRSError(response)
azul.drs.DRSError: (500, b'{"msg":"Bucket is a requester pays bucket but no user project provided.","status_code":500}')
nadove-ucsc commented 2 years ago

I'm not sure when or how this started, since the integration test (IT) passed when the snapshots were updated, but IT now fails on my personal deployment because of this error.

achave11-ucsc commented 2 years ago

This is also happening on anvilbox, where IT now fails:

2022-09-28 17:00:51,818 INFO    MainThread: Beginning sub-test [repository_files] {'catalog': 'anvil-it'}
2022-09-28 17:00:51,819 INFO    MainThread: GET https://service.anvilbox.anvil.gi.ucsc.edu/index/files?catalog=anvil-it&filters=%7B%22file_format%22%3A+%7B%22is%22%3A+%5B%22fastq%22%2C+%22fastq.gz%22%5D%7D%7D&size=1&order=asc&sort=size ...
2022-09-28 17:00:52,120 INFO    MainThread: ... -> 200
2022-09-28 17:00:52,122 INFO    MainThread: GET https://service.anvilbox.anvil.gi.ucsc.edu/index/files?catalog=anvil-it&filters=%7B%7D&size=1&order=asc&sort=size ...
2022-09-28 17:00:52,375 INFO    MainThread: ... -> 200
2022-09-28 17:00:52,375 INFO    MainThread: GET https://service.anvilbox.anvil.gi.ucsc.edu/fetch/repository/files/30eeccbf52af083e29cde6486897fa2a?catalog=anvil-it&version=2022-06-01T00%3A00%3A00.000000Z ...
2022-09-28 17:00:53,513 INFO    MainThread: ... -> 500
2022-09-28 17:00:53,513 INFO    MainThread: Failed sub-test [repository_files] {'catalog': 'anvil-it'}
melainalegaspi commented 2 years ago

Assignee to discuss this with TDR team.

hannes-ucsc commented 2 years ago

BI is considering removing requester-pays from these buckets.

https://ucsc-gi.slack.com/archives/C03TPJS54DC/p1664482406071929

If they do, we need to remove the workaround added in https://github.com/DataBiosphere/azul/pull/4480 (see referencing commits).

If they don't, we need to sign the URL ourselves, adding the userProject query parameter to it. The value of that parameter should be the GCP project of the Azul deployment, so we'll end up paying for downloads.
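
For illustration, attaching the parameter could look roughly like the sketch below. It only covers the query-string part: in a V4 signed URL the query string is part of what gets signed, so `userProject` would have to be added before signing, and the signing step itself (which needs service account credentials) is omitted. The helper name `with_user_project` and the bucket, object, and project names are made up for this example and are not taken from Azul.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_user_project(url: str, project: str) -> str:
    """Return `url` with a `userProject` query parameter set to `project`."""
    parts = urlsplit(url)
    # Drop any existing userProject so the parameter appears exactly once.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != 'userProject'
    ]
    query.append(('userProject', project))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical bucket, object, and billing project names.
print(with_user_project(
    'https://storage.googleapis.com/example-bucket/example.fastq.gz',
    'example-azul-project'
))
```

Whichever project is named in `userProject` is the one billed for the download, which is why the value matters for cost.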

cc: @theathorn re cost

theathorn commented 2 years ago

Broad will sign URL - update to prod expected in ~1 week.

theathorn commented 2 years ago

From today's standup: Broad will not deploy the signed URL workaround to prod - they need to resolve the requester-pays requirements with the program sponsors. Direct file download is not a blocker for the stakeholder demo (which will instead be based on hand-off to Terra).

theathorn commented 1 year ago

Direct file download from the Files tab is working for me. I'm unsure which workarounds are in place or what the long-term resolution is.

achave11-ucsc commented 1 year ago

@hannes-ucsc: "We tried this during parking lot and it appears to be working just fine."