DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0
7 stars 2 forks source link

Transient IT failue for [catalog_complete] sub-test in GitLab prod #6335

Closed achave11-ucsc closed 3 months ago

achave11-ucsc commented 5 months ago

For short reproduction, see https://github.com/DataBiosphere/azul/issues/6335#issuecomment-2190150632. EDIT: @achave11-ucsc


https://gitlab.azul.data.humancellatlas.org/ucsc/azul/-/jobs/59809/raw

With sub-test run logs …

2024-06-13 02:24:48,805    INFO MainThread azul.oauth2: Making GET request to 'https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp38-it&size=100&filters=%7B%7D'
2024-06-13 02:24:48,805   DEBUG MainThread azul.oauth2: … without request body
2024-06-13 02:25:45,950    INFO MainThread test.integration_test: Failed sub-test [catalog_complete] {'catalog': 'dcp38-it'}

  test_indexing (integration_test.IndexingIntegrationTest.test_indexing) [catalog_complete] (catalog='dcp38-it') ... ERROR

… and test error logs …

======================================================================
ERROR: test_indexing (integration_test.IndexingIntegrationTest.test_indexing) [catalog_complete] (catalog='dcp38-it')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/builds/ucsc/azul/test/integration_test.py", line 447, in subTest
    yield
  File "/builds/ucsc/azul/test/integration_test.py", line 1348, in _assert_catalog_complete
    indexed_fqids = self._get_indexed_bundles(catalog)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 1303, in _get_indexed_bundles
    hits = self._get_entities(catalog, 'bundles', filters)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 1401, in _get_entities
    body = self._get_url_json(GET, url)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 872, in _get_url_json
    return json.loads(self._get_url_content(method, url))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 876, in _get_url_content
    response = self._get_url(method, url)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 893, in _get_url
    response = self._get_url_unchecked(method, url, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/test/integration_test.py", line 921, in _get_url_unchecked
    response = http.request(method=method,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/request.py", line 77, in request
    return self.request_encode_url(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/request.py", line 99, in request_encode_url
    return self.urlopen(method, url, **extra_kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/google/auth/transport/urllib3.py", line 385, in urlopen
    response = self.http.urlopen(
               ^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/src/azul/http.py", line 42, in urlopen
    return self._inner.urlopen(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/src/azul/http.py", line 68, in urlopen
    response = super().urlopen(method, url, body=body, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builds/ucsc/azul/src/azul/http.py", line 42, in urlopen
    return self._inner.urlopen(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 894, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 894, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 894, in urlopen
    return self.urlopen(
           ^^^^^^^^^^^^^
  [Previous line repeated 2 more times]
  File "/build/.venv/lib/python3.11/site-packages/urllib3/connectionpool.py", line 884, in urlopen
    retries = retries.increment(method, url, response=response, _pool=self)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/build/.venv/lib/python3.11/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='service.azul.data.humancellatlas.org', port=443): Max retries exceeded with url: /index/bundles?catalog=dcp38-it&size=100&filters=%7B%7D (Caused by ResponseError('too many 502 error responses'))
----------------------------------------------------------------------
Ran 17 tests in 2070.452s
FAILED (errors=1, skipped=4)
make: *** [Makefile:247: integration_test] Error 1
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1

These are the relevant service logs, note that the requestId filter was determined by matching the endpoint and catalog filters of the failed requests. CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-06-13T02:24:54.000Z
end-time: 2024-06-13T02:25:54.000Z
query-string:

  fields @timestamp, @message
| filter @message like /Error|Returning|Received/
| filter @message like /ce13565b|d497a0d6|b3863175|02f7fac7|87e2ca03/
| sort @timestamp asc

@timestamp @message
2024-06-13 02:24:58.899 [INFO] 2024-06-13T02:24:58.899Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "fkJYgG1WxMI-4q0xo-Rhf6e_4td1sFUf8hCNSpzkHw_O8LxgfrvSzw==", "x-amzn-trace-id": "Root=1-666a587a-09380c726fbd06086f53f4b9", "x-forwarded-for": "52.3.190.111, 18.68.30.164", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:25:07.910 [DEBUG] 2024-06-13T02:25:07.910Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:08.439 [INFO] 2024-06-13T02:25:08.439Z d497a0d6-0586-4bcb-80ff-3c5da318be50 azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "iRDkRMKnq4fccQtgJQy-5EtrPwVeKloHnImpfecP428_aRy9Fn76ow==", "x-amzn-trace-id": "Root=1-666a5884-7a43d64877a93ea52cdfb9a3", "x-forwarded-for": "52.3.190.111, 18.68.30.172", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:25:08.978 RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Error: Runtime exited with error: exit status 1 Runtime.ExitError
2024-06-13 02:25:17.151 [DEBUG] 2024-06-13T02:25:17.151Z d497a0d6-0586-4bcb-80ff-3c5da318be50 azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:17.610 [INFO] 2024-06-13T02:25:17.610Z b3863175-6c96-43b8-b239-f0513ec76346 azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "9Rjx-bn0c0XlA33VEuWRZUcbeUPYEFYO_cPA0LIb6uof8mtvbjFsyQ==", "x-amzn-trace-id": "Root=1-666a588d-7a250ab90fc46dcc7046d7b2", "x-forwarded-for": "52.3.190.111, 18.68.30.142", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:25:18.076 RequestId: d497a0d6-0586-4bcb-80ff-3c5da318be50 Error: Runtime exited with error: exit status 1 Runtime.ExitError
2024-06-13 02:25:26.800 [DEBUG] 2024-06-13T02:25:26.800Z b3863175-6c96-43b8-b239-f0513ec76346 azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:27.256 [INFO] 2024-06-13T02:25:27.256Z 02f7fac7-c91f-4b19-9f5b-4271304bb865 azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "T_qCl_bEXP7uFiKr9MZQJ0YvnqnJLOOSCvBIk0azd_jFUvSc6aLm9A==", "x-amzn-trace-id": "Root=1-666a5897-7f1027dc7a285cb0584807fe", "x-forwarded-for": "52.3.190.111, 18.68.30.148", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:25:27.725 RequestId: b3863175-6c96-43b8-b239-f0513ec76346 Error: Runtime exited with error: exit status 1 Runtime.ExitError
2024-06-13 02:25:36.052 [DEBUG] 2024-06-13T02:25:36.051Z 02f7fac7-c91f-4b19-9f5b-4271304bb865 azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:36.504 [INFO] 2024-06-13T02:25:36.504Z 87e2ca03-47ac-48cd-9fc5-a86232d68137 azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "Jfzq0eZKR_T548wxigjG9l95SaT1tGdk5c8g_lkAMTnSDHaoGVJoCQ==", "x-amzn-trace-id": "Root=1-666a58a0-4165a86c1a592b76186403ee", "x-forwarded-for": "52.3.190.111, 18.68.30.149", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:25:36.952 RequestId: 02f7fac7-c91f-4b19-9f5b-4271304bb865 Error: Runtime exited with error: exit status 1 Runtime.ExitError
2024-06-13 02:25:45.515 [DEBUG] 2024-06-13T02:25:45.515Z 87e2ca03-47ac-48cd-9fc5-a86232d68137 azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:46.415 RequestId: 87e2ca03-47ac-48cd-9fc5-a86232d68137 Error: Runtime exited with error: exit status 1 Runtime.ExitError

achave11-ucsc commented 5 months ago

Assignee to provide exhaustive logs for the first lambda invocation.

achave11-ucsc commented 5 months ago

The following are the exhaustive logs for the Lambda invocation matching a @requestId like "ce13565b", but using the corresponding @logsStream as a filter and narrowed down the time range to capture all the event logs for the invocation, including the logs without a requestId.

CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-06-13T02:24:57.490Z
end-time: 2024-06-13T02:25:10.026Z
query-string:

  fields @timestamp, @message
# | filter @requestId like /ce13565b/
| filter @logStream = "2024/06/13/[$LATEST]c6291b4f54884e80a82b8f363ecc7733"
| sort @timestamp asc
| limit 1000

@timestamp @message
2024-06-13 02:24:58.898 START RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Version: $LATEST
2024-06-13 02:24:58.899 [INFO] 2024-06-13T02:24:58.899Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "fkJYgG1WxMI-4q0xo-Rhf6e_4td1sFUf8hCNSpzkHw_O8LxgfrvSzw==", "x-amzn-trace-id": "Root=1-666a587a-09380c726fbd06086f53f4b9", "x-forwarded-for": "52.3.190.111, 18.68.30.164", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}.
2024-06-13 02:24:58.899 [INFO] 2024-06-13T02:24:58.899Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Authenticated request as OAuth2(access_token='ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op')
2024-06-13 02:24:58.925 [INFO] 2024-06-13T02:24:58.925Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: Making POST request to https://dynamodb.us-east-1.amazonaws.com/
2024-06-13 02:24:58.926 [INFO] 2024-06-13T02:24:58.926Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: … with request body b'{"TableName": "azul-sources_cache_by_auth-prod", "Key": {"identity": {"S": "dcp38-it:ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0...'
2024-06-13 02:24:58.932 [INFO] 2024-06-13T02:24:58.932Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: Got 200 response
2024-06-13 02:24:58.932 [INFO] 2024-06-13T02:24:58.932Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: … with response body b'{"Item":{"expiration":{"N":"1718245549"},"sources":{"S":"[\"24776f45-5584-4456-87d3-71bcea97cd54\", \"2efd52e7-0096-4a0c-9b0a...'
2024-06-13 02:24:58.977 [INFO] 2024-06-13T02:24:58.977Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch Making POST request to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp38-it_bundles_aggregate/_search
2024-06-13 02:24:58.977 [INFO] 2024-06-13T02:24:58.977Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch … with request body b'{"post_filter":{"bool":{"must":[{"constant_score":{"filter":{"terms":{"sources.id.keyword":["004ee0a4-7672-4922-9ec2-9f1c4e1f...'
2024-06-13 02:24:59.439 [INFO] 2024-06-13T02:24:59.439Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch Got 200 response after 0.435s from POST to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp38-it_bundles_aggregate/_search
2024-06-13 02:24:59.439 [INFO] 2024-06-13T02:24:59.439Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch … with response body '{"took":180,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":69,"relation…'
2024-06-13 02:25:07.910 [DEBUG] 2024-06-13T02:25:07.910Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c
2024-06-13 02:25:08.413 [ERROR] [1718245508413] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413.
2024-06-13 02:25:08.414 Traceback (most recent call last):
2024-06-13 02:25:08.414 File "/var/runtime/bootstrap.py", line 63, in
2024-06-13 02:25:08.414 main()
2024-06-13 02:25:08.414 File "/var/runtime/bootstrap.py", line 60, in main
2024-06-13 02:25:08.414 awslambdaricmain.main([os.environ["LAMBDA_TASK_ROOT"], os.environ["_HANDLER"]])
2024-06-13 02:25:08.414 File "/var/lang/lib/python3.11/site-packages/awslambdaric/main.py", line 21, in main
2024-06-13 02:25:08.414 bootstrap.run(app_root, handler, lambda_runtime_api_addr)
2024-06-13 02:25:08.414 File "/var/lang/lib/python3.11/site-packages/awslambdaric/bootstrap.py", line 499, in run
2024-06-13 02:25:08.414 handle_event_request(
2024-06-13 02:25:08.414 File "/var/lang/lib/python3.11/site-packages/awslambdaric/bootstrap.py", line 223, in handle_event_request
2024-06-13 02:25:08.414 lambda_runtime_client.post_invocation_result(
2024-06-13 02:25:08.414 File "/var/lang/lib/python3.11/site-packages/awslambdaric/lambda_runtime_client.py", line 109, in post_invocation_result
2024-06-13 02:25:08.415 runtime_client.post_invocation_result(
2024-06-13 02:25:08.415 RuntimeError: Failed to post invocation response
2024-06-13 02:25:08.978 RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Error: Runtime exited with error: exit status 1 Runtime.ExitError
2024-06-13 02:25:08.978 END RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde
2024-06-13 02:25:08.978 REPORT RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Duration: 10079.62 ms Billed Duration: 10080 ms Memory Size: 2048 MB Max Memory Used: 390 MB
2024-06-13 02:25:09.012 INIT_START Runtime Version: python:3.11.v37 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:7922becca3524b6503b11e59f33405b21aca377582c11bd54243092ba71b57d5

achave11-ucsc commented 5 months ago

Assignee to open support ticket with AWS Support.

achave11-ucsc commented 5 months ago

Support ticket was created, https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=171839380500948&language=en.

achave11-ucsc commented 5 months ago

Spike to attempt to reproduce by pinning the IT seed. If it does reproduce that way it would indicate that it is provably the size or some other property of the response.

hannes-ucsc commented 5 months ago

The API Gateway logs might also contain clues. Before any attempts to reproduce, the assignee should look at those as part of the current spike.

achave11-ucsc commented 4 months ago

As you suspected, @hannes-ucsc, the response content size was too large and it's evident from the API Gateway execution logs (third line from the bottom up). This API Gateway execution logs are the ones associated with the requesteId ce13565b-391f-4c81-b19c-32ea8d532dde, for which I provided Service Lambda execution logs in a previous coment.

CloudWatch Logs Insights
region: us-east-1
log-group-names: API-Gateway-Execution-Logs_ccp122p730/prod
start-time: 2024-06-13T02:24:57.490Z
end-time: 2024-06-13T02:25:09.516Z
query-string:

  fields @timestamp, @message
| filter @message like "9802348c"
# | filter @logStream = "2024/06/13/[$LATEST]c6291b4f54884e80a82b8f363ecc7733"
| sort @timestamp asc
| limit 1000

@timestamp @message
2024-06-13 02:24:58.878 (9802348c-0341-4aca-805e-a57b83260220) Extended Request Id: ZSLDPGgrIAMESjQ=
2024-06-13 02:24:58.885 (9802348c-0341-4aca-805e-a57b83260220) Verifying Usage Plan for request: 9802348c-0341-4aca-805e-a57b83260220. API Key: API Stage: ccp122p730/prod
2024-06-13 02:24:58.886 (9802348c-0341-4aca-805e-a57b83260220) API Key authorized because method 'GET /index/{entity_type}' does not require API Key. Request will not contribute to throttle or quota limits
2024-06-13 02:24:58.886 (9802348c-0341-4aca-805e-a57b83260220) Usage Plan check succeeded for API Key and API Stage ccp122p730/prod
2024-06-13 02:24:58.886 (9802348c-0341-4aca-805e-a57b83260220) Starting execution for request: 9802348c-0341-4aca-805e-a57b83260220
2024-06-13 02:24:58.886 (9802348c-0341-4aca-805e-a57b83260220) HTTP Method: GET, Resource Path: /index/bundles
2024-06-13 02:25:08.417 (9802348c-0341-4aca-805e-a57b83260220) Lambda execution failed with status 200 due to customer function error: Response payload size exceeded maximum allowed payload size (6291556 bytes).. Lambda request id: ce13565b-391f-4c81-b19c-32ea8d532dde
2024-06-13 02:25:08.417 (9802348c-0341-4aca-805e-a57b83260220) Method completed with status: 502
2024-06-13 02:25:08.417 (9802348c-0341-4aca-805e-a57b83260220) AWS Integration Endpoint RequestId : ce13565b-391f-4c81-b19c-32ea8d532dde

achave11-ucsc commented 4 months ago

Assignee to close support ticket with Amazon and spike to reproduce and potentially propose a solution (what makes the response so big and how can we make it smaller).

achave11-ucsc commented 4 months ago

Self-note: Add reproduction with filtered request, and short term solution make IT request page size of 10 for the failing entity_type.

@hannes-ucsc: "Long term solution might be to dynamically shrink oversize responses until they pass the API Gateway limit. Note that compression plays into this #6360. Another trivial solution would be to constrain the number of files in each bundle hit. Lastly there are 67 bundles in this project and at least the first two are identical, it seems the project is malformed and fixing it would also resolve this issue without any changes to Azul."

achave11-ucsc commented 4 months ago

To reproduce the oversized response service-error execution for prod run the following curl command:

curl -X 'GET' \
  'https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%20%20%22sourceSpec%22%3A%20%7B%0A%20%20%20%20%22is%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%22tdr%3Adatarepo-664a24cb%3Asnapshot%2Fhca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25%3A%2F0%22%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D&size=30' \
  -H 'accept: application/json'
achave11-ucsc commented 4 months ago

Short term solution, reduce the page size to 10 for bundles requests in the IT:

diff --git a/test/integration_test.py b/test/integration_test.py
index 53769095..cad8c1bc 100644
--- a/test/integration_test.py
+++ b/test/integration_test.py
@@ -1391,7 +1391,11 @@ class IndexingIntegrationTest(IntegrationTestCase, AlwaysTearDownTestCase):
                       filters: Optional[JSON] = None
                       ) -> MutableJSONs:
         entities = []
-        size = 100
+        # A project (7c75f07c) may list all of its files in each of its bundles
+        # which can cause an oversize response if the page size is sufficiently
+        # large. In the interim this type of failure is prevented by using a
+        # smaller page size for `bundles`.
+        size = 10 if entity_type == 'bundles' else 100
         params = dict(catalog=catalog,
                       size=str(size),
                       filters=json.dumps(filters if filters else {}))
hannes-ucsc commented 4 months ago

This seems to be fairly rare.

A note on the dynamic shrinking of the hits: it would require adjusting the pagination part of the response, in particular the next link and the searchAfter parameter embedded in it. I believe the pagination part is determined before the ES hits are transformed to the response hits, so the response size is not yet known when the pagination is determined.

Spike again to get the links.json files for two nearly identical bundles in that project so that we can go back to the wranglers and complain.

achave11-ucsc commented 4 months ago

Attaching the requested links.json files for two nearly identical bundles, the following were obtained from BigQuery in the platform-hca-prod Google Cloud project using the following BQ query:

SELECT * FROM `datarepo-664a24cb.hca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25.links` WHERE links_id IN ('be8dd2ba-817c-48c6-b1fb-6503221b3c21', 'b48217eb-1bba-463f-b27e-bb7c8f8dde21')

bundle-b48217eb-1bba-463f-b27e-bb7c8f8dde21.links.json bundle-be8dd2ba-817c-48c6-b1fb-6503221b3c21.links.json

Also, this very project is the subject of https://github.com/ebi-ait/hca-ebi-wrangler-central/issues/314, we may want to consider this when pinging the appropriate wranglers.

dsotirho-ucsc commented 4 months ago

Assignee to consider next steps.

hannes-ucsc commented 4 months ago
image
hannes-ucsc commented 4 months ago

The above screenshot is a condensed view of one of the two example bundles aka subgraphs. Both bundles have the same shape (each ⟹ is a process).

((donor ⟹ specimen ⟹ cell suspension ⟹ four sequence files) + 402 other sequence files) ⟹ analysis_file

While the first three processes have the same shape, the specific instances of the involved entities are different. The last process is identical in both bundles (same input and output instances).

There are 60+ bundles like this in the project so the degree of subgraph overlap is significant. The 402 other sequence files that are used as inputs to the last process are not part of the bundle so our subgraph stitching algorithm must have stitched on the bundles that produced them, otherwise the undefined inputs would have prevented successful indexing of that bundle.

I'd like to hear from the wranglers if this is intentional and why.

I'd also like to understand why the last process references library prep and sequencing protocols even though it has only sequence files as inputs, so sequencing and library prep are already done and should only be associated with the respective processes earlier on.

Assignee to determine the complete project ID and whether subgraph stitching was indeed used during indexing.

achave11-ucsc commented 4 months ago

Project ID: 7c75f07c-608d-4c4a-a1b7-b13d11c0ad31

Subgraph stitching was indeed used as part of the most recent reindex in prod. Note that there are 61 other pairs of similar logs for each of the remaining, successfully indexed, bundle contributions of this project.

CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-indexer-prod-contribute
start-time: 2024-06-21T10:03:10.568Z
end-time: 2024-06-21T15:47:46.121Z
query-string:

  fields @timestamp, @log, @message, notification.bundle_fqid.uuid as successful_bundle_indexed
| filter @message like /Stitched \d+. bundle|Worker successfully handled message/
| parse @message 'Stitched * bundle(s' as num_of_stitched_bundles
| parse @message 'are * dangling' as dangling_inputs
| parse @requestId '*-' as rId
| filter @requestId like /b4a0b540|5bd4f57a|62495c08|e9661314|4a0634d9|bfaae510|17055822|6ccd78de|baaf8458|07f709ed|521ddce3|8f705e2f|4fbc1090|65080a65|27d7ba55|136d2d0f|141dbcda|3a56eb1f|77ce3609|3e8012a9|20b808d7|8dc5056b|3da774f0|4eb7dc94|cd2bcb6c|a4fa3e4d|c74f3573|53bf45ad|6d3f8a14|6a484158|df0eaaa8|5c83663f|f37ab796|36460c0c|b3a8b8c2|049c0a40|7d0cd38d|8137fb10|6c0cd38a|61c4264e|4d3b3de6|8ee782d9|06a525d4|966fc433|85824392|6e89e187|9e71cbda|ac1eaedb|3a43eec7|158da1d5|845ef51d|87b0e1df|394c34c7|0e79946d|c085f6cc|0aabefb2|ad90ba3b|e593c6d7|411353fc|57649419|b83c181d|5865cc9d|eb7b2867|3c08955a|d1debfdf|f14436d6/
| display successful_bundle_indexed, @requestId, num_of_stitched_bundles
| limit 10
| sort rId asc, @timestamp desc

successful_bundle_indexed @requestId num_of_stitched_bundles
5557be15-acc4-457e-b6fa-45902f9ca9a8 0aabefb2-a78f-5508-aeea-b15dc3a472d3
0aabefb2-a78f-5508-aeea-b15dc3a472d3 65
be69a78e-5861-46c6-b7e5-f379b48fdcd8 0e79946d-ae64-5340-9e24-ca82ca280c19
0e79946d-ae64-5340-9e24-ca82ca280c19 65
89bd6bff-a5fb-41f3-a987-b65740c46b93 3a43eec7-c93a-543a-acd4-b028af7b6b87
3a43eec7-c93a-543a-acd4-b028af7b6b87 65
0527a1b6-ec32-4b21-bfaf-1be6eaadbdf3 3a56eb1f-9dd7-5ed8-a65d-b5f004e5d11b
3a56eb1f-9dd7-5ed8-a65d-b5f004e5d11b 65
1a80dde0-c441-4c10-ab40-e8914de3fbe8 3c08955a-51f3-57de-8b12-cbb531c91d3b
3c08955a-51f3-57de-8b12-cbb531c91d3b 65

The requestIds were obtained from the logs matching the strings Worker successfully handled message and the project snapshot name, hca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25.

StitchedSubgraphs
achave11-ucsc commented 4 months ago

The previous logs from indexer-prod-contribute are from a targeted reindex of the dcp39 catalog, the provided reproduction originally used the dcp38 catalog, but the problem is still reproducible in dcp39, so the reproduction was updated.

dsotirho-ucsc commented 4 months ago

Assignee to consider next steps.

hannes-ucsc commented 4 months ago

Started Slack thread: https://humancellatlas.slack.com/archives/C01360XN04S/p1720466181645819

achave11-ucsc commented 4 months ago

@hannes-ucsc: "On Slack wranglers expressed openness to fixing the sub-graph structure, but in a way that seemed to not correctly reflect the provenance of the analysis file (skipping sequence files). I've raised that concern but have not heard back. Because of the admission that the snapshot is defective or could be improved we'll simply remove the snapshot. Assignee to file PR for that."