DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0
7 stars 2 forks source link

Downloads from `/repository/files/{file_id}` time out with 504 & 503 #4384

Closed dsotirho-ucsc closed 2 years ago

dsotirho-ucsc commented 2 years ago

From: https://broadinstitute.zendesk.com/agent/tickets/292435

We are trying to download the released HCA DCP 2.0 datasets via the HCA Data Portal. We have been using the curl command (bash shell) on CentOS Linux 8.6 server.

The cURL command worked fine for most of the downloads except for the following human/mouse projects where it throws an error (see below). Could you please help resolve this issue?

1) FetalLiverHaematopoiesis 2) FetalLungImmune 3) FreshAndFrozenTumours 4) MegakaryocyteDevelopment 5) Single-cellRNA-seqrevealscelltype-specificmolecula 6) humanAdultUreter 7) HumanMouseEntericNervousSystem

Error: curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)

dsotirho-ucsc commented 2 years ago
* About to connect() to service.azul.data.humancellatlas.org port 443 (#20)
*   Trying 13.35.73.129...
* Connected to service.azul.data.humancellatlas.org (13.35.73.129) port 443 (#20)
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=service.azul.data.humancellatlas.org
*   start date: Oct 19 00:00:00 2021 GMT
*   expire date: Nov 16 23:59:59 2022 GMT
*   common name: service.azul.data.humancellatlas.org
*   issuer: CN=Amazon,OU=Server CA 1B,O=Amazon,C=US
> GET /repository/files/3f0bfbb4-c9be-56db-9eb5-b37a6dc306a2?catalog=dcp18&version=2021-02-09T08%3A32%3A25.000000Z HTTP/1.1
> User-Agent: curl/7.29.0
> Host: service.azul.data.humancellatlas.org
> Accept: */*
> Accept-Encoding: deflate, gzip
> 
* The requested URL returned error: 504 Gateway Timeout
* Closing connection 20
@timestamp @message
2022-08-10 19:39:07.935 {"accountId": "542754589326", "apiId": "i4ew2rtktg", "authorizer_claims_property": "-", "authorizer_error": "-", "authorizer_principalId": "-", "authorizer_property": "-", "awsEndpointRequestId": "-", "awsEndpointRequestId2": "-", "customDomain_basePathMatched": "(none)", "dataProcessed": "-", "domainName": "service.azul.data.humancellatlas.org", "domainPrefix": "service", "error_message": "Endpoint request timed out", "error_messageString": " "Endpoint request timed out"", "error_responseType": "INTEGRATION_TIMEOUT", "extendedRequestId": "WqZmZFhDoAMFn6Q=", "httpMethod": "GET", "identity_accountId": "-", "identity_caller": "-", "identity_cognitoAuthenticationProvider": "-", "identity_cognitoAuthenticationType": "-", "identity_cognitoIdentityId": "-", "identity_cognitoIdentityPoolId": "-", "identity_principalOrgId": "-", "identity_clientCert_clientCertPem": "-", "identity_clientCert_subjectDN": "-", "identity_clientCert_issuerDN": "-", "identity_clientCert_serialNumber": "-", "identity_clientCert_validity_notBefore": "-", "identity_clientCert_validity_notAfter": "-", "identity_sourceIp": "170.223.207.78", "identity_user": "-", "identity_userAgent": "curl/7.29.0", "identity_userArn": "-", "integration_error": "Execution failed due to a timeout error", "integration_integrationStatus": "-", "integration_latency": "29000", "integration_requestId": "-", "integration_status": "-", "integrationErrorMessage": "-", "integrationLatency": "29000", "integrationStatus": "-", "path": "/repository/files/3f0bfbb4-c9be-56db-9eb5-b37a6dc306a2", "protocol": "HTTP/1.1", "requestId": "ed2d3145-865c-448e-bce1-cb85d30cc7ac", "requestTime": "10/Aug/2022:19:39:07 +0000", "requestTimeEpoch": "1660160347935", "responseLatency": "29026", "responseLength": "41", "routeKey": "-", "stage": "prod", "status": "504"}
2022-08-10 19:39:07.987 START RequestId: 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 Version: $LATEST
2022-08-10 19:39:07.991 [INFO] 2022-08-10T19:39:07.988Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.chalice Received GET request for '/repository/files/3f0bfbb4-c9be-56db-9eb5-b37a6dc306a2', with query {"catalog": "dcp18", "version": "2021-02-09T08:32:25.000000Z"} and headers {"accept": "/", "accept-encoding": "deflate, gzip", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "40831", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "curl/7.29.0", "via": "1.1 91ba452fa0dd14b0102b6441c9a2d2d4.cloudfront.net (CloudFront)", "x-amz-cf-id": "v2X3c_QL1RCizff0JMUQksyw_MH27wE01ftqge3E4JkGugga-RiJuA==", "x-amzn-trace-id": "Root=1-62f4095b-03f83dff4b960e8a042fdb1e", "x-forwarded-for": "170.223.207.78, 70.132.21.138", "x-forwarded-port": "443", "x-forwarded-proto": "https"}.
2022-08-10 19:39:07.991 [INFO] 2022-08-10T19:39:07.989Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.chalice Did not authenticate request.
2022-08-10 19:39:08.014 [DEBUG] 2022-08-10T19:39:08.013Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=0&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 19:39:11.390 [DEBUG] 2022-08-10T19:39:11.390Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request(…) -> b'{"total":452,"filteredTotal":452,"items":[{"id":"3ea18ed4-17fd-4209-b3b9-dfcb3f2f05a9","name":"hca_prod_df88f39f01a84b5b92f43177d6c0f242__20211004_20211004","description":"Create snapshot hca_prod_df88f39f01a84b5b92f43177d6c0f242__20211004_20211004","cr...'
2022-08-10 19:39:11.392 [DEBUG] 2022-08-10T19:39:11.392Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=200&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 19:39:16.187 [DEBUG] 2022-08-10T19:39:16.187Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request(…) -> b'{"total":452,"filteredTotal":452,"items":[{"id":"6d39533c-1f7f-494e-9cee-40ed2f801e19","name":"hca_prod_d012d4768f8c4ff389d6ebbe22c1b5c1__20220119_dcp2_20220121_dcp12","description":"Create snapshot hca_prod_d012d4768f8c4ff389d6ebbe22c1b5c1__20220119_dc...'
2022-08-10 19:39:16.189 [DEBUG] 2022-08-10T19:39:16.189Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=400&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 19:39:19.878 [DEBUG] 2022-08-10T19:39:19.878Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request(…) -> b'{"total":452,"filteredTotal":452,"items":[{"id":"5f006aa6-bae4-4241-a334-4f0e0b731c35","name":"hca_prod_34da2c5f801148afa7fdad2f56ec10f4__20220606_dcp2_20220804_dcp19","description":"Create snapshot hca_prod_34da2c5f801148afa7fdad2f56ec10f4__20220606_dc...'
2022-08-10 19:39:19.879 [DEBUG] 2022-08-10T19:39:19.878Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=452&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 19:39:23.177 [DEBUG] 2022-08-10T19:39:23.177Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.terra _request(…) -> b'{"total":452,"filteredTotal":452,"items":[],"roleMap":{}}'
2022-08-10 19:39:23.236 [INFO] 2022-08-10T19:39:23.236Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 elasticsearch POST https://search-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp18_files_aggregate/_search [status:200 request:0.039s]
2022-08-10 19:39:23.239 [INFO] 2022-08-10T19:39:23.239Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.drs GET https://data.terra.bio/ga4gh/drs/v1/objects/v1_7ce06c67-dfce-43c2-94a0-7c9c2f6add40_e4b73397-7ff9-40c9-8201-fbf01386041e ...
2022-08-10 19:39:38.685 [INFO] 2022-08-10T19:39:38.685Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.drs -> 200 b'{"id":"v1_7ce06c67-dfce-43c2-94a0-7c9c2f6add40_e4b73397-7ff9-40c9-8201-fbf01386041e","name":"951a8ffc-69c5-4f7c-8ee8-39b76171f630.loom","self_uri":"drs://data.terra.bio/v1_7ce06c67-dfce-43c2-94a0-7c9c2f6add40_e4b73397-7ff9-40c9-8201-fbf01386041e","size":1735761928,"created_time":"2022-01-19T01:31:19.119Z","updated_time":"2022-01-19T01:31:19.119Z","version":"0","mime_type":null,"checksums":[{"checksum":"cee09154","type":"crc32c"},{"checksum":"c53f9394e4cd07292a9bde824f8fb3f2","type":"md5"}],"access_methods":[{"type":"gs","access_url":{"url":"gs://datarepo-37829d28-bucket/cb486d5a-2e22-4266-827f-771732e1faa6/e4b73397-7ff9-40c9-8201-fbf01386041e/951a8ffc-69c5-4f7c-8ee8-39b76171f630.loom","headers":null},"access_id":"gcp-us-east4","region":"us-east4","authorizations":{"supported_types":["BearerAuth"],"passport_auth_issuers":null,"bearer_auth_issuers":null}},{"type":"https","access_url":{"url":"https://www.googleapis.com/storage/v1/b/datarepo-37829d28-bucket/o/cb486d5a-2e22-4266-827f-771732e1faa6%2Fe4b73397-7ff9-40c9-8201-fbf01386041e%2F951a8ffc-69c5-4f7c-8ee8-39b76171f630.loom?alt=media","headers":["Authorization: Bearer ya29.c.b0AXv0zTO-RSIlgNpVAixV57lgzBeZ10cqO42TrsRsTHwXbJrURc5lQ6ZJ1hjZL_oBZFZaZI8lUbwrm1W162uIaX8JEGC1PzWByF3aYpmi2ZHncr70uLAi2XpAz41xWO21iz5C2qfD5biUfv11Jb_31JjYd24rd8P7D4gFFcZ_9x2X2e65lJannCTMqOCHxG215dDq8_3U4l9iiI0B41FZg0wlwjC3gOJl......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................."]},"access_id":null,"region":"us-east4","authorizations":{"supported_types":["BearerAuth"],"passport_auth_issuers":null,"bearer_auth_issuers":null}}],"contents":null,"description":null,"aliases":["/hca_prod_f2fe82f044544d84b416a885f3121e59__20220119_dcp2/v1/3f0bfbb4-c9be-56db-9eb5-b37a6dc306a2/cee09154/951a8ffc-69c5-4f7c-8ee8-39b76171f630.loom"]}' HTTPHeaderDict({'Date': 'Wed, 10 Aug 2022 19:39:38 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'e1nmoAPw', 'Content-Type': 'application/json', 'Content-Length': '2508', 'Vary': 'Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2022-08-10 19:39:38.686 [INFO] 2022-08-10T19:39:38.686Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 azul.drs GET https://data.terra.bio/ga4gh/drs/v1/objects/v1_7ce06c67-dfce-43c2-94a0-7c9c2f6add40_e4b73397-7ff9-40c9-8201-fbf01386041e/access/gcp-us-east4 ...
2022-08-10 19:39:39.027 END RequestId: 506df9d2-1d09-4159-a4ca-60bc5b53f1a6
2022-08-10 19:39:39.027 REPORT RequestId: 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 Duration: 31037.91 ms Billed Duration: 31000 ms Memory Size: 2048 MB Max Memory Used: 206 MB
2022-08-10 19:39:39.027 2022-08-10T19:39:39.026Z 506df9d2-1d09-4159-a4ca-60bc5b53f1a6 Task timed out after 31.04 seconds
dsotirho-ucsc commented 2 years ago
* About to connect() to service.azul.data.humancellatlas.org port 443 (#24)
*   Trying 13.35.73.25...
* Connected to service.azul.data.humancellatlas.org (13.35.73.25) port 443 (#24)
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate:
*   subject: CN=service.azul.data.humancellatlas.org
*   start date: Oct 19 00:00:00 2021 GMT
*   expire date: Nov 16 23:59:59 2022 GMT
*   common name: service.azul.data.humancellatlas.org
*   issuer: CN=Amazon,OU=Server CA 1B,O=Amazon,C=US
> GET /repository/files/d5a6d596-22a1-4fc3-8859-a87913284a9a?catalog=dcp18&version=2020-08-13T10%3A24%3A32.747000Z HTTP/1.1
> User-Agent: curl/7.29.0
> Host: service.azul.data.humancellatlas.org
> Accept: */*
> Accept-Encoding: deflate, gzip
> 
* The requested URL returned error: 503 Service Unavailable
* Closing connection 24
@timestamp @message
2022-08-10 20:43:19.833 {"accountId": "542754589326", "apiId": "i4ew2rtktg", "authorizer_claims_property": "-", "authorizer_error": "-", "authorizer_principalId": "-", "authorizer_property": "-", "awsEndpointRequestId": "e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5", "awsEndpointRequestId2": "-", "customDomain_basePathMatched": "(none)", "dataProcessed": "-", "domainName": "service.azul.data.humancellatlas.org", "domainPrefix": "service", "error_message": "-", "error_messageString": "-", "error_responseType": "-", "extendedRequestId": "WqjAQH04IAMFvwQ=", "httpMethod": "GET", "identity_accountId": "-", "identity_caller": "-", "identity_cognitoAuthenticationProvider": "-", "identity_cognitoAuthenticationType": "-", "identity_cognitoIdentityId": "-", "identity_cognitoIdentityPoolId": "-", "identity_principalOrgId": "-", "identity_clientCert_clientCertPem": "-", "identity_clientCert_subjectDN": "-", "identity_clientCert_issuerDN": "-", "identity_clientCert_serialNumber": "-", "identity_clientCert_validity_notBefore": "-", "identity_clientCert_validity_notAfter": "-", "identity_sourceIp": "170.223.207.78", "identity_user": "-", "identity_userAgent": "curl/7.29.0", "identity_userArn": "-", "integration_error": "-", "integration_integrationStatus": "200", "integration_latency": "19790", "integration_requestId": "e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5", "integration_status": "503", "integrationErrorMessage": "-", "integrationLatency": "19790", "integrationStatus": "200", "path": "/repository/files/d5a6d596-22a1-4fc3-8859-a87913284a9a", "protocol": "HTTP/1.1", "requestId": "99d8b6df-f54d-46ab-9275-265924bb06be", "requestTime": "10/Aug/2022:20:43:19 +0000", "requestTimeEpoch": "1660164199833", "responseLatency": "19801", "responseLength": "186", "routeKey": "-", "stage": "prod", "status": "503"}
2022-08-10 20:43:19.856 START RequestId: e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 Version: $LATEST
2022-08-10 20:43:19.858 [INFO] 2022-08-10T20:43:19.857Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.chalice Received GET request for '/repository/files/d5a6d596-22a1-4fc3-8859-a87913284a9a', with query {"catalog": "dcp18", "version": "2020-08-13T10:24:32.747000Z"} and headers {"accept": "/", "accept-encoding": "deflate, gzip", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "40831", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "curl/7.29.0", "via": "1.1 d6251c5e474087984c15b7cf5a45887e.cloudfront.net (CloudFront)", "x-amz-cf-id": "iewVF_oitcr6xtNgoUFGgEDrTYsFNVNoUagLZOZ0q0EXqUsHUusvEw==", "x-amzn-trace-id": "Root=1-62f41867-3af1c5b4524e54531f8e049a", "x-forwarded-for": "170.223.207.78, 70.132.21.162", "x-forwarded-port": "443", "x-forwarded-proto": "https"}.
2022-08-10 20:43:19.858 [INFO] 2022-08-10T20:43:19.858Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.chalice Did not authenticate request.
2022-08-10 20:43:19.878 [DEBUG] 2022-08-10T20:43:19.877Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=0&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 20:43:24.583 [DEBUG] 2022-08-10T20:43:24.583Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.terra _request(…) -> b'{"total":452,"filteredTotal":452,"items":[{"id":"3ea18ed4-17fd-4209-b3b9-dfcb3f2f05a9","name":"hca_prod_df88f39f01a84b5b92f43177d6c0f242__20211004_20211004","description":"Create snapshot hca_prod_df88f39f01a84b5b92f43177d6c0f242__20211004_20211004","cr...'
2022-08-10 20:43:24.585 [DEBUG] 2022-08-10T20:43:24.585Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.terra _request('GET', https://data.terra.bio/api/repository/v1/snapshots?offset=200&limit=200&sort=created_date&direction=asc, headers=None, timeout=5.0, body=None)
2022-08-10 20:43:29.592 [WARNING] 2022-08-10T20:43:29.592Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 urllib3.connectionpool Retrying (Retry(total=None, connect=2, read=1, redirect=0, status=0)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=5.0)")': /api/repository/v1/snapshots?offset=200&limit=200&sort=created_date&direction=asc
2022-08-10 20:43:34.613 [WARNING] 2022-08-10T20:43:34.613Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 urllib3.connectionpool Retrying (Retry(total=None, connect=2, read=0, redirect=0, status=0)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=5.0)")': /api/repository/v1/snapshots?offset=200&limit=200&sort=created_date&direction=asc
2022-08-10 20:43:39.632 [DEBUG] 2022-08-10T20:43:39.632Z e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 azul.chalice Returning 503 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key"}. See next line for the first 1024 characters of the body. {"Code": "ServiceUnavailableError", "Message": "No response from https://data.terra.bio/api/repository/v1/snapshots?offset=200&limit=200&sort=created_date&direction=asc within 5.0 seconds"}
2022-08-10 20:43:39.633 END RequestId: e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5
2022-08-10 20:43:39.633 REPORT RequestId: e4b3bf7c-5ea4-4f0a-ae29-8924e92ffce5 Duration: 19775.95 ms Billed Duration: 19776 ms Memory Size: 2048 MB Max Memory Used: 174 MB
dsotirho-ucsc commented 2 years ago

We should consider adding --retry to the curl config.

--retry <num>
    If  a transient error is returned when curl tries to perform a transfer, it will retry this
    number of times before giving up. Setting the number to 0 makes curl do no retries (which is
    the default). Transient error means either: a timeout, an FTP 4xx response code or an HTTP 
    408 or 5xx response code.
hannes-ucsc commented 2 years ago

Great idea. Let's seek the help of the OP of the Zendesk ticket in trying the addition of --retry 3 to both curl invocations in the command line for one of the projects. To make this a meaningful test, the user should pick a project for which a significant number of files must still be downloaded. If there is no such project, we would need to ask the user to start the download from an empty directory. The logging additions to the command line should also be present.

Be sure that your Zendesk response contains the following sentence:

We have identified a slowly responding backend service (TDR) to be the cause of occasional timeouts when downloading individual files.

dsotirho-ucsc commented 2 years ago

We decided to use a retry count of 2 (so a download of a file would be attempted a total of 3 times if an error occurred) , and a retry-delay of 10 seconds.

hannes-ucsc commented 2 years ago

For demo, obtain a curl manifest from dev or prod. Augment the manifest with a URL that causes a 5xx error and further down, another URL that causes a 404. For the first URL and extra bonus points, write a server that returns 500 or 200 based on chance, say 50-50. Show that the 500 URL is eventually downloaded, while the 404 URL is not retried and causes an immediate exit with a non-zero exit status.