DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0
7 stars 2 forks source link

Gateway timeouts cause IT to fail #3766

Closed amarjandu closed 2 years ago

amarjandu commented 2 years ago

This appears to be a transient error, the next IT run passed.

https://gitlab.dev.singlecell.gi.ucsc.edu/ucsc/azul/-/jobs/37365

2022-01-20 03:20:57,253 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/index/files?catalog=dcp2-it&size=100&filters=%7B%7D ...
2022-01-20 03:21:26,304 INFO    MainThread: ... -> 504
2022-01-20 03:21:26,304 INFO    MainThread: Failed sub-test [catalog_complete] {'catalog': 'dcp2-it'}
2022-01-20 03:21:26,304 INFO    MainThread: Beginning sub-test [managed_access] {}
2022-01-20 03:21:26,305 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/repository/sources?catalog=dcp2-it ...
2022-01-20 03:21:55,317 INFO    MainThread: ... -> 504
2022-01-20 03:21:55,318 INFO    MainThread: Failed sub-test [managed_access] {}
2022-01-20 03:21:55,318 INFO    MainThread: Beginning sub-test [manifest] {'catalog': 'dcp2-it', 'format': None, 'attempts': 1}
2022-01-20 03:21:55,319 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it ...
2022-01-20 03:22:24,356 INFO    MainThread: ... -> 504
2022-01-20 03:22:24,356 INFO    MainThread: Failed sub-test [manifest] {'catalog': 'dcp2-it', 'format': None, 'attempts': 1}
2022-01-20 03:22:24,356 INFO    MainThread: Beginning sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'compact', 'attempts': 1}
2022-01-20 03:22:24,357 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it&format=compact ...
2022-01-20 03:22:53,366 INFO    MainThread: ... -> 504
2022-01-20 03:22:53,367 INFO    MainThread: Failed sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'compact', 'attempts': 1}
2022-01-20 03:22:53,367 INFO    MainThread: Beginning sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'terra.bdbag', 'attempts': 1}
2022-01-20 03:22:53,367 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it&format=terra.bdbag ...
2022-01-20 03:23:22,376 INFO    MainThread: ... -> 504
2022-01-20 03:23:22,377 INFO    MainThread: Failed sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'terra.bdbag', 'attempts': 1}
2022-01-20 03:23:22,377 INFO    MainThread: Beginning sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'terra.pfb', 'attempts': 1}
2022-01-20 03:23:22,377 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it&format=terra.pfb ...
2022-01-20 03:23:51,386 INFO    MainThread: ... -> 504
2022-01-20 03:23:51,386 INFO    MainThread: Failed sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'terra.pfb', 'attempts': 1}
2022-01-20 03:23:51,386 INFO    MainThread: Beginning sub-test [manifest] {'catalog': 'dcp2-it', 'format': 'curl', 'attempts': 1}
2022-01-20 03:23:51,387 INFO    MainThread: GET https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it&format=curl ...
2022-01-20 03:24:09,825 INFO    MainThread: Redirecting https://service.dev.singlecell.gi.ucsc.edu/manifest/files?catalog=dcp2-it&format=curl ->
hannes-ucsc commented 2 years ago

Spike to correlate with API Gateway and Lambda logs.

achave11-ucsc commented 2 years ago

The following is the set of request observed in the APIGateway logs and the lambdas for where the referenced IT test was running.

@timestamp @message
2022-01-20 03:21:50.926 START RequestId: 7fb18845-fec2-447b-8115-ee1f59912534 Version: $LATEST
2022-01-20 03:21:50.929 [INFO] 2022-01-20T03:21:50.929Z 7fb18845-fec2-447b-8115-ee1f59912534 Received HEAD request for '/index/projects', with query {"size": "1"} and headers {"accept": "*/*", "accept-encoding": "gzip, deflate", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-country": "US", "host": "service.dev.singlecell.gi.ucsc.edu", "user-agent": "python-requests/2.26.0", "via": "1.1 5195de19cbc5ce842ac6538e9a6850ca.cloudfront.net (CloudFront)", "x-amz-cf-id": "geGnjtaAO2evxzKGwbuKRfxuiM0NpA5AikxXrQFMMGdGNoKQF5YUUw==", "x-amzn-trace-id": "Root=1-61e8d54e-24da128a779b9a5a15a6208f", "x-forwarded-for": "3.235.197.39, 70.132.33.151", "x-forwarded-port": "443", "x-forwarded-proto": "https"}.
2022-01-20 03:21:50.929 [INFO] 2022-01-20T03:21:50.929Z 7fb18845-fec2-447b-8115-ee1f59912534 Did not authenticate request.
2022-01-20 03:21:50.959 START RequestId: 0155897c-9093-4738-bfa3-4bc1804d84d0 Version: $LATEST
2022-01-20 03:21:50.964 [INFO] 2022-01-20T03:21:50.964Z 0155897c-9093-4738-bfa3-4bc1804d84d0 Received HEAD request for '/index/samples', with query {"size": "1"} and headers {"accept": "*/*", "accept-encoding": "gzip, deflate", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-country": "US", "host": "service.dev.singlecell.gi.ucsc.edu", "user-agent": "python-requests/2.26.0", "via": "1.1 6b8cdd1ce925ccd88cc918dd35811d06.cloudfront.net (CloudFront)", "x-amz-cf-id": "DQUkqpWPy1AjJiF0bc_4jl74RTMeinDHBu_bANAJVY3rAtQiA-qz1Q==", "x-amzn-trace-id": "Root=1-61e8d54e-31e72f9c2572054d2770b5eb", "x-forwarded-for": "3.235.197.39, 70.132.33.82", "x-forwarded-port": "443", "x-forwarded-proto": "https"}.
2022-01-20 03:21:50.964 [INFO] 2022-01-20T03:21:50.964Z 0155897c-9093-4738-bfa3-4bc1804d84d0 Did not authenticate request.
2022-01-20 03:21:51.370 [INFO] 2022-01-20T03:21:51.370Z 7fb18845-fec2-447b-8115-ee1f59912534 Found credentials in environment variables.
2022-01-20 03:21:51.413 [INFO] 2022-01-20T03:21:51.413Z 0155897c-9093-4738-bfa3-4bc1804d84d0 Found credentials in environment variables.
2022-01-20 03:21:51.440 [DEBUG] 2022-01-20T03:21:51.440Z 7fb18845-fec2-447b-8115-ee1f59912534 _request('GET', 'https://jade.datarepo-dev.broadinstitute.org/api/repository/v1/snapshots', fields={'offset': 0, 'limit': 200}, headers=None, body=None)
2022-01-20 03:21:51.493 [DEBUG] 2022-01-20T03:21:51.492Z 0155897c-9093-4738-bfa3-4bc1804d84d0 _request('GET', 'https://jade.datarepo-dev.broadinstitute.org/api/repository/v1/snapshots', fields={'offset': 0, 'limit': 200}, headers=None, body=None)
2022-01-20 03:21:55.352 (390bec47-4683-4568-b2d0-5864e1495042) Extended Request Id: MOZFDExnIAMFrdA=
2022-01-20 03:21:55.352 {"requestId": "390bec47-4683-4568-b2d0-5864e1495042", "ip": "52.20.215.168", "caller": "-", "user": "-", "requestTime": "20/Jan/2022:03:21:55 +0000", "httpMethod": "GET", "resourcePath": "/manifest/files", "status": "504", "protocol": "HTTP/1.1", "responseLength": "41"}
2022-01-20 03:21:55.353 (390bec47-4683-4568-b2d0-5864e1495042) Verifying Usage Plan for request: 390bec47-4683-4568-b2d0-5864e1495042. API Key: API Stage: ann5yskrli/dev
2022-01-20 03:21:55.354 (390bec47-4683-4568-b2d0-5864e1495042) API Key authorized because method 'GET /manifest/files' does not require API Key. Request will not contribute to throttle or quota limits
2022-01-20 03:21:55.354 (390bec47-4683-4568-b2d0-5864e1495042) Usage Plan check succeeded for API Key and API Stage ann5yskrli/dev
2022-01-20 03:21:55.354 (390bec47-4683-4568-b2d0-5864e1495042) Starting execution for request: 390bec47-4683-4568-b2d0-5864e1495042
2022-01-20 03:21:55.354 (390bec47-4683-4568-b2d0-5864e1495042) HTTP Method: GET, Resource Path: /manifest/files
2022-01-20 03:22:21.959 END RequestId: 7fb18845-fec2-447b-8115-ee1f59912534
2022-01-20 03:22:21.959 REPORT RequestId: 7fb18845-fec2-447b-8115-ee1f59912534 Duration: 31031.53 ms Billed Duration: 31000 ms Memory Size: 2048 MB Max Memory Used: 103 MB
2022-01-20 03:22:21.959 2022-01-20T03:22:21.959Z 7fb18845-fec2-447b-8115-ee1f59912534 Task timed out after 31.03 seconds

2022-01-20 03:22:21.995 END RequestId: 0155897c-9093-4738-bfa3-4bc1804d84d0
2022-01-20 03:22:21.995 REPORT RequestId: 0155897c-9093-4738-bfa3-4bc1804d84d0 Duration: 31031.50 ms Billed Duration: 31000 ms Memory Size: 2048 MB Max Memory Used: 103 MB
2022-01-20 03:22:21.995 2022-01-20T03:22:21.995Z 0155897c-9093-4738-bfa3-4bc1804d84d0 Task timed out after 31.03 seconds

2022-01-20 03:22:24.355 (390bec47-4683-4568-b2d0-5864e1495042) Execution failed due to a timeout error
2022-01-20 03:22:24.355 (390bec47-4683-4568-b2d0-5864e1495042) Method completed with status: 504
melainalegaspi commented 2 years ago

@hannes-ucsc : "Looks good. Since TDR is the culprit, #3910 will workaround that."