Closed achave11-ucsc closed 3 months ago
Assignee to provide exhaustive logs for the first lambda invocation.
The following are the exhaustive logs for the Lambda invocation matching a @requestId like "ce13565b"
, but using the corresponding @logsStream
as a filter and narrowed down the time range to capture all the event logs for the invocation, including the logs without a requestId.
CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-06-13T02:24:57.490Z
end-time: 2024-06-13T02:25:10.026Z
query-string:
fields @timestamp, @message
# | filter @requestId like /ce13565b/
| filter @logStream = "2024/06/13/[$LATEST]c6291b4f54884e80a82b8f363ecc7733"
| sort @timestamp asc
| limit 1000
@timestamp | @message |
---|---|
2024-06-13 02:24:58.898 | START RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Version: $LATEST |
2024-06-13 02:24:58.899 | [INFO] 2024-06-13T02:24:58.899Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Received GET request for '/index/bundles', with {"query": {"catalog": "dcp38-it", "filters": "{}", "size": "100"}, "headers": {"accept-encoding": "identity", "authorization": "Bearer ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op", "cloudfront-forwarded-proto": "https", "cloudfront-is-desktop-viewer": "true", "cloudfront-is-mobile-viewer": "false", "cloudfront-is-smarttv-viewer": "false", "cloudfront-is-tablet-viewer": "false", "cloudfront-viewer-asn": "14618", "cloudfront-viewer-country": "US", "host": "service.azul.data.humancellatlas.org", "user-agent": "python-urllib3/1.26.18", "via": "1.1 52ac015dacdf6aed9db953cf96e66dda.cloudfront.net (CloudFront)", "x-allowed-locations": "0x0", "x-amz-cf-id": "fkJYgG1WxMI-4q0xo-Rhf6e_4td1sFUf8hCNSpzkHw_O8LxgfrvSzw==", "x-amzn-trace-id": "Root=1-666a587a-09380c726fbd06086f53f4b9", "x-forwarded-for": "52.3.190.111, 18.68.30.164", "x-forwarded-port": "443", "x-forwarded-proto": "https", "x-goog-api-client": "cred-type/sa"}}. |
2024-06-13 02:24:58.899 | [INFO] 2024-06-13T02:24:58.899Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Authenticated request as OAuth2(access_token='ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0oYhgEbUO97lBpJ2YsV42zh5bupRoqCsiKLga5R9B-3oBrAUEXnSRGXy667ZabzX5NI_JxhjkF3RKNg1T53BjIW6BhFkAJlcow47eY_3-FX6iVHlcfGwnzneRIgeqmWCIz6fvurkfuoNKOGIeazFNditxgdaEGZgKM4857j6W1e-v-JSSxTzc2Y0SsPj7btjB-I8vNTjYAbtcokRsLuQeU6FVNnxQk7jpHh0kqE7g-ZBuNhc64jPxsj8mnA0WP43vW3Of83Vcoirg7s0KbwJE-DKtmr0hSWog2ElgKuFLZzl0JvqULUaIAe4odK7AHLctpGHP_4skCNMQe3UiwK92qb19eJT7UnRcN392AeqyXhW3zJubUqYfjWJ9kqo-i8Yz0koU0I5_40o3ZxniRq2Sju2mW79rvOtI3JklYUFvQn55a8oQjeBmX3_Bp5nVQrjob4dSggYIm80jcWURefyZpnUp-VJZk-0f18BF13uZ9epMmxpVJmOaa2tViu3JzI3BmqQ1hpmopvUs1Ry6cIgj5jnOtVhxJ8-OoUxpfqgsxpo_Bj-2riYJJtq3f2t5vftV33f-ewhJW7dQqe4llOSmMB-3moJ1y5sY367ceOyRqjIQnwXb8wypSxmXoW2pszx3R2Xz6u6k31b93IU3I1per-m16hWW9tIJVy0Fei0ykm01_B_vcZkBykYte7-3V0huk7gk_qZYWupbmMmQn1qhkmeQ-QB1Yp3oQBwxe_x0leJd3m1BMuiv6ciX0YUx0lv_25Mv5Z8d66lteidgi2Ow9Mx_lz5eB5rz_QO3_1FomkXik9n6s_0s_t2nn8oXqXj_Fkj8JO9Mjo8ug9F3fiMehxvuSilMkofWfqaWg73s_iJWUXJpSw7na-5QnbbquUV2bZadztF4tpRq0qx8Xjb8WVsW2owz-Jtru6Z5rd3kdQk_FM-9mVSUYqx---9ySkWhjOcn01FguXhxhyM8j4O2p3op') |
2024-06-13 02:24:58.925 | [INFO] 2024-06-13T02:24:58.925Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: Making POST request to https://dynamodb.us-east-1.amazonaws.com/ |
2024-06-13 02:24:58.926 | [INFO] 2024-06-13T02:24:58.926Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: … with request body b'{"TableName": "azul-sources_cache_by_auth-prod", "Key": {"identity": {"S": "dcp38-it:ya29.c.c0AY_VpZhepX0ho0AfhP37qSRjugqz0d0...' |
2024-06-13 02:24:58.932 | [INFO] 2024-06-13T02:24:58.932Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: Got 200 response |
2024-06-13 02:24:58.932 | [INFO] 2024-06-13T02:24:58.932Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.boto3 dynamodb.GetItem: … with response body b'{"Item":{"expiration":{"N":"1718245549"},"sources":{"S":"[\"24776f45-5584-4456-87d3-71bcea97cd54\", \"2efd52e7-0096-4a0c-9b0a...' |
2024-06-13 02:24:58.977 | [INFO] 2024-06-13T02:24:58.977Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch Making POST request to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp38-it_bundles_aggregate/_search |
2024-06-13 02:24:58.977 | [INFO] 2024-06-13T02:24:58.977Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch … with request body b'{"post_filter":{"bool":{"must":[{"constant_score":{"filter":{"terms":{"sources.id.keyword":["004ee0a4-7672-4922-9ec2-9f1c4e1f...' |
2024-06-13 02:24:59.439 | [INFO] 2024-06-13T02:24:59.439Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch Got 200 response after 0.435s from POST to https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp38-it_bundles_aggregate/_search |
2024-06-13 02:24:59.439 | [INFO] 2024-06-13T02:24:59.439Z ce13565b-391f-4c81-b19c-32ea8d532dde elasticsearch … with response body '{"took":180,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":69,"relation…' |
2024-06-13 02:25:07.910 | [DEBUG] 2024-06-13T02:25:07.910Z ce13565b-391f-4c81-b19c-32ea8d532dde azul.chalice Returning 200 response with headers {"Access-Control-Allow-Origin": "*", "Access-Control-Allow-Headers": "Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key", "Strict-Transport-Security": "max-age=31536000; includeSubDomains", "X-Content-Type-Options": "nosniff", "X-Frame-Options": "DENY", "Cache-Control": "no-store"}. See next line for the first 1024 characters of the body. {"pagination": {"count": 69, "total": 69, "size": 100, "next": null, "previous": null, "pages": 1, "sort": "bundleVersion", "order": "desc"}, "termFacets": {"organ": {"terms": [{"term": "blood", "count": 67}, {"term": "breast", "count": 66}, {"term": "immune system", "count": 66}, {"term": "adipose tissue", "count": 1}, {"term": "bladder organ", "count": 1}, {"term": "bone marrow", "count": 1}, {"term": "eye", "count": 1}, {"term": "heart", "count": 1}, {"term": "kidney", "count": 1}, {"term": "large intestine", "count": 1}, {"term": "liver", "count": 1}, {"term": "lung", "count": 1}, {"term": "lymph node", "count": 1}, {"term": "mammary gland", "count": 1}, {"term": "muscle organ", "count": 1}, {"term": "pancreas", "count": 1}, {"term": "prostate gland", "count": 1}, {"term": "saliva-secreting gland", "count": 1}, {"term": "skin of body", "count": 1}, {"term": "small intestine", "count": 1}, {"term": "spleen", "count": 1}, {"term": "thymus", "count": 1}, {"term": "tongue", "count": 1}, {"term": "trachea", "c |
2024-06-13 02:25:08.413 | [ERROR] [1718245508413] LAMBDA_RUNTIME Failed to post handler success response. Http response code: 413. |
2024-06-13 02:25:08.414 | Traceback (most recent call last): |
2024-06-13 02:25:08.414 | File "/var/runtime/bootstrap.py", line 63, in |
2024-06-13 02:25:08.414 | main() |
2024-06-13 02:25:08.414 | File "/var/runtime/bootstrap.py", line 60, in main |
2024-06-13 02:25:08.414 | awslambdaricmain.main([os.environ["LAMBDA_TASK_ROOT"], os.environ["_HANDLER"]]) |
2024-06-13 02:25:08.414 | File "/var/lang/lib/python3.11/site-packages/awslambdaric/main.py", line 21, in main |
2024-06-13 02:25:08.414 | bootstrap.run(app_root, handler, lambda_runtime_api_addr) |
2024-06-13 02:25:08.414 | File "/var/lang/lib/python3.11/site-packages/awslambdaric/bootstrap.py", line 499, in run |
2024-06-13 02:25:08.414 | handle_event_request( |
2024-06-13 02:25:08.414 | File "/var/lang/lib/python3.11/site-packages/awslambdaric/bootstrap.py", line 223, in handle_event_request |
2024-06-13 02:25:08.414 | lambda_runtime_client.post_invocation_result( |
2024-06-13 02:25:08.414 | File "/var/lang/lib/python3.11/site-packages/awslambdaric/lambda_runtime_client.py", line 109, in post_invocation_result |
2024-06-13 02:25:08.415 | runtime_client.post_invocation_result( |
2024-06-13 02:25:08.415 | RuntimeError: Failed to post invocation response |
2024-06-13 02:25:08.978 | RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Error: Runtime exited with error: exit status 1 Runtime.ExitError |
2024-06-13 02:25:08.978 | END RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde |
2024-06-13 02:25:08.978 | REPORT RequestId: ce13565b-391f-4c81-b19c-32ea8d532dde Duration: 10079.62 ms Billed Duration: 10080 ms Memory Size: 2048 MB Max Memory Used: 390 MB |
2024-06-13 02:25:09.012 | INIT_START Runtime Version: python:3.11.v37 Runtime Version ARN: arn:aws:lambda:us-east-1::runtime:7922becca3524b6503b11e59f33405b21aca377582c11bd54243092ba71b57d5 |
Assignee to open support ticket with AWS Support.
Support ticket was created, https://support.console.aws.amazon.com/support/home?region=us-east-1#/case/?displayId=171839380500948&language=en.
Spike to attempt to reproduce by pinning the IT seed. If it does reproduce that way it would indicate that it is provably the size or some other property of the response.
The API Gateway logs might also contain clues. Before any attempts to reproduce, the assignee should look at those as part of the current spike.
As you suspected, @hannes-ucsc, the response content size was too large and it's evident from the API Gateway execution logs (third line from the bottom up). This API Gateway execution logs are the ones associated with the requesteId ce13565b-391f-4c81-b19c-32ea8d532dde
, for which I provided Service Lambda execution logs in a previous coment.
CloudWatch Logs Insights
region: us-east-1
log-group-names: API-Gateway-Execution-Logs_ccp122p730/prod
start-time: 2024-06-13T02:24:57.490Z
end-time: 2024-06-13T02:25:09.516Z
query-string:
fields @timestamp, @message
| filter @message like "9802348c"
# | filter @logStream = "2024/06/13/[$LATEST]c6291b4f54884e80a82b8f363ecc7733"
| sort @timestamp asc
| limit 1000
@timestamp | @message |
---|---|
2024-06-13 02:24:58.878 | (9802348c-0341-4aca-805e-a57b83260220) Extended Request Id: ZSLDPGgrIAMESjQ= |
2024-06-13 02:24:58.885 | (9802348c-0341-4aca-805e-a57b83260220) Verifying Usage Plan for request: 9802348c-0341-4aca-805e-a57b83260220. API Key: API Stage: ccp122p730/prod |
2024-06-13 02:24:58.886 | (9802348c-0341-4aca-805e-a57b83260220) API Key authorized because method 'GET /index/{entity_type}' does not require API Key. Request will not contribute to throttle or quota limits |
2024-06-13 02:24:58.886 | (9802348c-0341-4aca-805e-a57b83260220) Usage Plan check succeeded for API Key and API Stage ccp122p730/prod |
2024-06-13 02:24:58.886 | (9802348c-0341-4aca-805e-a57b83260220) Starting execution for request: 9802348c-0341-4aca-805e-a57b83260220 |
2024-06-13 02:24:58.886 | (9802348c-0341-4aca-805e-a57b83260220) HTTP Method: GET, Resource Path: /index/bundles |
2024-06-13 02:25:08.417 | (9802348c-0341-4aca-805e-a57b83260220) Lambda execution failed with status 200 due to customer function error: Response payload size exceeded maximum allowed payload size (6291556 bytes).. Lambda request id: ce13565b-391f-4c81-b19c-32ea8d532dde |
2024-06-13 02:25:08.417 | (9802348c-0341-4aca-805e-a57b83260220) Method completed with status: 502 |
2024-06-13 02:25:08.417 | (9802348c-0341-4aca-805e-a57b83260220) AWS Integration Endpoint RequestId : ce13565b-391f-4c81-b19c-32ea8d532dde |
Assignee to close support ticket with Amazon and spike to reproduce and potentially propose a solution (what makes the response so big and how can we make it smaller).
Self-note: Add reproduction with filtered request, and short term solution make IT request page size of 10 for the failing entity_type.
@hannes-ucsc: "Long term solution might be to dynamically shrink oversize responses until they pass the API Gateway limit. Note that compression plays into this #6360. Another trivial solution would be to constrain the number of files in each bundle hit. Lastly there are 67 bundles in this project and at least the first two are identical, it seems the project is malformed and fixing it would also resolve this issue without any changes to Azul."
To reproduce the oversized response service-error execution for prod
run the following curl
command:
curl -X 'GET' \
'https://service.azul.data.humancellatlas.org/index/bundles?catalog=dcp39&filters=%7B%0A%20%20%22sourceSpec%22%3A%20%7B%0A%20%20%20%20%22is%22%3A%20%5B%0A%20%20%20%20%20%20%20%20%22tdr%3Adatarepo-664a24cb%3Asnapshot%2Fhca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25%3A%2F0%22%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D&size=30' \
-H 'accept: application/json'
Short term solution, reduce the page size to 10 for bundles
requests in the IT:
diff --git a/test/integration_test.py b/test/integration_test.py
index 53769095..cad8c1bc 100644
--- a/test/integration_test.py
+++ b/test/integration_test.py
@@ -1391,7 +1391,11 @@ class IndexingIntegrationTest(IntegrationTestCase, AlwaysTearDownTestCase):
filters: Optional[JSON] = None
) -> MutableJSONs:
entities = []
- size = 100
+ # A project (7c75f07c) may list all of its files in each of its bundles
+ # which can cause an oversize response if the page size is sufficiently
+ # large. In the interim this type of failure is prevented by using a
+ # smaller page size for `bundles`.
+ size = 10 if entity_type == 'bundles' else 100
params = dict(catalog=catalog,
size=str(size),
filters=json.dumps(filters if filters else {}))
This seems to be fairly rare.
A note on the dynamic shrinking of the hits: it would require adjusting the pagination part of the response, in particular the next link and the searchAfter
parameter embedded in it. I believe the pagination part is determined before the ES hits are transformed to the response hits, so the response size is not yet known when the pagination is determined.
Spike again to get the links.json
files for two nearly identical bundles in that project so that we can go back to the wranglers and complain.
Attaching the requested links.json
files for two nearly identical bundles, the following were obtained from BigQuery in the platform-hca-prod
Google Cloud project using the following BQ query:
SELECT * FROM `datarepo-664a24cb.hca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25.links` WHERE links_id IN ('be8dd2ba-817c-48c6-b1fb-6503221b3c21', 'b48217eb-1bba-463f-b27e-bb7c8f8dde21')
bundle-b48217eb-1bba-463f-b27e-bb7c8f8dde21.links.json bundle-be8dd2ba-817c-48c6-b1fb-6503221b3c21.links.json
Also, this very project is the subject of https://github.com/ebi-ait/hca-ebi-wrangler-central/issues/314, we may want to consider this when pinging the appropriate wranglers.
Assignee to consider next steps.
The above screenshot is a condensed view of one of the two example bundles aka subgraphs. Both bundles have the same shape (each ⟹ is a process).
((donor ⟹ specimen ⟹ cell suspension ⟹ four sequence files) + 402 other sequence files) ⟹ analysis_file
While the first three processes have the same shape, the specific instances of the involved entities are different. The last process is identical in both bundles (same input and output instances).
There are 60+ bundles like this in the project so the degree of subgraph overlap is significant. The 402 other sequence files that are used as inputs to the last process are not part of the bundle so our subgraph stitching algorithm must have stitched on the bundles that produced them, otherwise the undefined inputs would have prevented successful indexing of that bundle.
I'd like to hear from the wranglers if this is intentional and why.
I'd also like to understand why the last process references library prep and sequencing protocols even though it has only sequence files as inputs, so sequencing and library prep are already done and should only be associated with the respective processes earlier on.
Assignee to determine the complete project ID and whether subgraph stitching was indeed used during indexing.
Project ID: 7c75f07c-608d-4c4a-a1b7-b13d11c0ad31
Subgraph stitching was indeed used as part of the most recent reindex in prod
. Note that there are 61 other pairs of similar logs for each of the remaining, successfully indexed, bundle contributions of this project.
CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-indexer-prod-contribute
start-time: 2024-06-21T10:03:10.568Z
end-time: 2024-06-21T15:47:46.121Z
query-string:
fields @timestamp, @log, @message, notification.bundle_fqid.uuid as successful_bundle_indexed
| filter @message like /Stitched \d+. bundle|Worker successfully handled message/
| parse @message 'Stitched * bundle(s' as num_of_stitched_bundles
| parse @message 'are * dangling' as dangling_inputs
| parse @requestId '*-' as rId
| filter @requestId like /b4a0b540|5bd4f57a|62495c08|e9661314|4a0634d9|bfaae510|17055822|6ccd78de|baaf8458|07f709ed|521ddce3|8f705e2f|4fbc1090|65080a65|27d7ba55|136d2d0f|141dbcda|3a56eb1f|77ce3609|3e8012a9|20b808d7|8dc5056b|3da774f0|4eb7dc94|cd2bcb6c|a4fa3e4d|c74f3573|53bf45ad|6d3f8a14|6a484158|df0eaaa8|5c83663f|f37ab796|36460c0c|b3a8b8c2|049c0a40|7d0cd38d|8137fb10|6c0cd38a|61c4264e|4d3b3de6|8ee782d9|06a525d4|966fc433|85824392|6e89e187|9e71cbda|ac1eaedb|3a43eec7|158da1d5|845ef51d|87b0e1df|394c34c7|0e79946d|c085f6cc|0aabefb2|ad90ba3b|e593c6d7|411353fc|57649419|b83c181d|5865cc9d|eb7b2867|3c08955a|d1debfdf|f14436d6/
| display successful_bundle_indexed, @requestId, num_of_stitched_bundles
| limit 10
| sort rId asc, @timestamp desc
successful_bundle_indexed | @requestId | num_of_stitched_bundles |
---|---|---|
5557be15-acc4-457e-b6fa-45902f9ca9a8 | 0aabefb2-a78f-5508-aeea-b15dc3a472d3 | |
0aabefb2-a78f-5508-aeea-b15dc3a472d3 | 65 | |
be69a78e-5861-46c6-b7e5-f379b48fdcd8 | 0e79946d-ae64-5340-9e24-ca82ca280c19 | |
0e79946d-ae64-5340-9e24-ca82ca280c19 | 65 | |
89bd6bff-a5fb-41f3-a987-b65740c46b93 | 3a43eec7-c93a-543a-acd4-b028af7b6b87 | |
3a43eec7-c93a-543a-acd4-b028af7b6b87 | 65 | |
0527a1b6-ec32-4b21-bfaf-1be6eaadbdf3 | 3a56eb1f-9dd7-5ed8-a65d-b5f004e5d11b | |
3a56eb1f-9dd7-5ed8-a65d-b5f004e5d11b | 65 | |
1a80dde0-c441-4c10-ab40-e8914de3fbe8 | 3c08955a-51f3-57de-8b12-cbb531c91d3b | |
3c08955a-51f3-57de-8b12-cbb531c91d3b | 65 |
The requestIds were obtained from the logs matching the strings Worker successfully handled message
and the project snapshot name, hca_prod_7c75f07c608d4c4aa1b7b13d11c0ad31__20220117_dcp2_20230314_dcp25
.
The previous logs from indexer-prod-contribute
are from a targeted reindex of the dcp39
catalog, the provided reproduction originally used the dcp38
catalog, but the problem is still reproducible in dcp39
, so the reproduction was updated.
Assignee to consider next steps.
Started Slack thread: https://humancellatlas.slack.com/archives/C01360XN04S/p1720466181645819
@hannes-ucsc: "On Slack wranglers expressed openness to fixing the sub-graph structure, but in a way that seemed to not correctly reflect the provenance of the analysis file (skipping sequence files). I've raised that concern but have not heard back. Because of the admission that the snapshot is defective or could be improved we'll simply remove the snapshot. Assignee to file PR for that."
For short reproduction, see https://github.com/DataBiosphere/azul/issues/6335#issuecomment-2190150632. EDIT: @achave11-ucsc
https://gitlab.azul.data.humancellatlas.org/ucsc/azul/-/jobs/59809/raw
With sub-test run logs …
… and test error logs …
These are the relevant service logs, note that the requestId filter was determined by matching the endpoint and catalog filters of the failed requests. CloudWatch Logs Insights
region: us-east-1
log-group-names: /aws/lambda/azul-service-prod
start-time: 2024-06-13T02:24:54.000Z
end-time: 2024-06-13T02:25:54.000Z
query-string: