DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0
7 stars 2 forks source link

Manifest content hash computation times out during reindex #5834

Closed achave11-ucsc closed 8 months ago

achave11-ucsc commented 10 months ago

… causing noisy service execution error alarms.

A full reindex was required by Promotion 2023-12-20 · Issue #5814

Service execution logs:

[
    {
        "@timestamp": "2023-12-22 06:43:53.386",
        "@message": "START RequestId: 8e8947ec-dd5f-4fd9-8038-3d2a80432665 Version: $LATEST\n"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.387",
        "@message": "[INFO]\t2023-12-22T06:43:53.387Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.chalice\tReceived GET request for '/fetch/manifest/files', with {\"query\": {\"catalog\": \"dcp33\", \"filters\": \"{}\"}, \"headers\": {\"accept\": \"application/json, text/xml, application/xml, */*\", \"accept-encoding\": \"deflate, gzip\", \"cloudfront-forwarded-proto\": \"https\", \"cloudfront-is-desktop-viewer\": \"true\", \"cloudfront-is-mobile-viewer\": \"false\", \"cloudfront-is-smarttv-viewer\": \"false\", \"cloudfront-is-tablet-viewer\": \"false\", \"cloudfront-viewer-asn\": \"4134\", \"cloudfront-viewer-country\": \"CN\", \"host\": \"[service.azul.data.humancellatlas.org](http://service.azul.data.humancellatlas.org/)\", \"user-agent\": \"libcurl/8.4.0 r-curl/5.2.0 httr/1.4.7\", \"via\": \"2.0 [edffe6978db53d114a80cda421e0b6b8.cloudfront.net](http://edffe6978db53d114a80cda421e0b6b8.cloudfront.net/) (CloudFront)\", \"x-amz-cf-id\": \"Shit6y0CepIX5fmEKl1l3RgHPvTfmqpKutsWd-WCC1WknBhvsjmfNQ==\", \"x-amzn-trace-id\": \"Root=1-65853029-7a050aec3ea1f95a5bf9cff0\", \"x-forwarded-for\": \"222.208.207.140, 130.176.223.77\", \"x-forwarded-port\": \"443\", \"x-forwarded-proto\": \"https\"}}."
    },
    {
        "@timestamp": "2023-12-22 06:43:53.387",
        "@message": "[INFO]\t2023-12-22T06:43:53.387Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.chalice\tDid not authenticate request."
    },
    {
        "@timestamp": "2023-12-22 06:43:53.395",
        "@message": "[INFO]\t2023-12-22T06:43:53.395Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.boto3\tdynamodb.GetItem:\tMaking POST request to https://dynamodb.us-east-1.amazonaws.com/"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.395",
        "@message": "[INFO]\t2023-12-22T06:43:53.395Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.boto3\tdynamodb.GetItem:\t… with request body b'{\"TableName\": \"azul-sources_cache_by_auth-prod\", \"Key\": {\"identity\": {\"S\": \"dcp33:\"}}, \"ProjectionExpression\": \"sources,expir...'"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.400",
        "@message": "[INFO]\t2023-12-22T06:43:53.400Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.boto3\tdynamodb.GetItem:\tGot 200 response"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.401",
        "@message": "[INFO]\t2023-12-22T06:43:53.400Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.boto3\tdynamodb.GetItem:\t… with response body b'{\"Item\":{\"expiration\":{\"N\":\"1703227462\"},\"sources\":{\"S\":\"[\\\\\"4edbacc8-3695-4363-87ee-9f4e8c571f10\\\\\", \\\\\"f4a377db-4fb0-4f56-b3ab...'"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.401",
        "@message": "[DEBUG]\t2023-12-22T06:43:53.401Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\tazul.service.manifest_service\tComputing content hash for manifest using filters Filters(explicit={}, source_ids={'8db98219-8a1b-4dfb-99f4-f8e8a69cdb50', … 'f49bbbcf-f081-496f-8a21-51a245bc91e0', 'f2c40da8-66a6-4671-9161-4aac96dd3b62', 'a1f6c669-a26d-4d3a-a90b-119019ef1f43', '1d28b559-e71f-49c7-a94d-4caa5d7892c1'}) ..."
    },
    {
        "@timestamp": "2023-12-22 06:43:53.404",
        "@message": "[INFO]\t2023-12-22T06:43:53.404Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\telasticsearch\tMaking POST request to [https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com:443/azul_v2_prod_dcp33_files_aggregate/_search](https://vpc-azul-index-prod-mjluewlvxikvhillefxotw2tbm.us-east-1.es.amazonaws.com/azul_v2_prod_dcp33_files_aggregate/_search)"
    },
    {
        "@timestamp": "2023-12-22 06:43:53.404",
        "@message": "[INFO]\t2023-12-22T06:43:53.404Z\t8e8947ec-dd5f-4fd9-8038-3d2a80432665\telasticsearch\t… with request body b'{\"query\":{\"bool\":{\"must\":[{\"constant_score\":{\"filter\":{\"terms\":{\"sources.id.keyword\":[\"8db98219-8a1b-4dfb-99f4-f8e8a69cdb50\",...'"
    },
    {
        "@timestamp": "2023-12-22 06:44:24.426",
        "@message": "2023-12-22T06:44:24.426Z 8e8947ec-dd5f-4fd9-8038-3d2a80432665 Task timed out after 31.04 seconds\n\n"
    },
    {
        "@timestamp": "2023-12-22 06:44:24.426",
        "@message": "END RequestId: 8e8947ec-dd5f-4fd9-8038-3d2a80432665\n"
    },
    {
        "@timestamp": "2023-12-22 06:44:24.426",
        "@message": "REPORT RequestId: 8e8947ec-dd5f-4fd9-8038-3d2a80432665\tDuration: 31039.72 ms\tBilled Duration: 31000 ms\tMemory Size: 2048 MB\tMax Memory Used: 208 MB\t\n"
    }
]

For the same period in which this execution timed out, the following logs can be retrieved from the indexer contributions lambda:

[
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp33_files",
        "lambda": "azul-indexer-prod-contribute",
        "count": "100"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp34_files",
        "lambda": "azul-indexer-prod-contribute",
        "count": "75"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp1_files",
        "lambda": "azul-indexer-prod-contribute",
        "count": "64"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp33_bundles",
        "lambda": "azul-indexer-prod-contribute",
        "count": "45"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp33_projects",
        "lambda": "azul-indexer-prod-contribute",
        "count": "45"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp33_samples",
        "lambda": "azul-indexer-prod-contribute",
        "count": "45"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp33_cell_suspensions",
        "lambda": "azul-indexer-prod-contribute",
        "count": "44"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp34_bundles",
        "lambda": "azul-indexer-prod-contribute",
        "count": "40"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp34_cell_suspensions",
        "lambda": "azul-indexer-prod-contribute",
        "count": "40"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp34_projects",
        "lambda": "azul-indexer-prod-contribute",
        "count": "40"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp34_samples",
        "lambda": "azul-indexer-prod-contribute",
        "count": "40"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp1_cell_suspensions",
        "lambda": "azul-indexer-prod-contribute",
        "count": "23"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp1_samples",
        "lambda": "azul-indexer-prod-contribute",
        "count": "23"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp1_projects",
        "lambda": "azul-indexer-prod-contribute",
        "count": "23"
    },
    {
        "request_type": "\tMaking PUT request ",
        "index": "azul_v2_prod_dcp1_bundles",
        "lambda": "azul-indexer-prod-contribute",
        "count": "23"
    },
    {
        "request_type": "",
        "index": "",
        "lambda": "azul-service-prod",
        "timeout_after": "",
        "count": "22"
    },
    {
        "request_type": "\tGot 201 response after 0.018s from PUT ",
        "index": "azul_v2_prod_dcp34_files",
        "lambda": "azul-indexer-prod-contribute",
        "count": "4"
    },
    {
        "request_type": "",
        "index": "",
        "lambda": "azul-service-prod",
        "timeout_after": " Task timed out after 31.04 s",
        "count": "1"
    },
    {
        "request_type": "\tGot 201 response after 0.019s from PUT ",
        "index": "azul_v2_prod_dcp34_projects",
        "lambda": "azul-indexer-prod-contribute",
        "count": "1"
    },
    {
        "request_type": "\tGot 201 response after 0.039s from PUT ",
        "index": "azul_v2_prod_dcp34_files",
        "lambda": "azul-indexer-prod-contribute",
        "count": "1"
    },
    {
        "request_type": "\tGot 201 response after 0.030s from PUT ",
        "index": "azul_v2_prod_dcp34_bundles",
        "lambda": "azul-indexer-prod-contribute",
        "count": "1"
    },
    {
        "request_type": "\tGot 201 response after 0.031s from PUT ",
        "index": "azul_v2_prod_dcp34_samples",
        "lambda": "azul-indexer-prod-contribute",
        "count": "1"
    },
    {
        "request_type": "\tGot 201 response after 0.038s from PUT ",
        "index": "azul_v2_prod_dcp34_cell_suspensions",
        "lambda": "azul-indexer-prod-contribute",
        "count": "1"
    }
]
achave11-ucsc commented 10 months ago

Assignee to consider next steps.

achave11-ucsc commented 9 months ago

A different flavor of this issue might have occurred. Read on.

hannes-ucsc commented 8 months ago

Duplicates #5528 which had a less informative title but has a resolution and is older.