Open achave11-ucsc opened 5 months ago
Re-running the command hours after the second attempt actually succeeded:
❯ python scripts/reindex.py --deindex --catalogs anvil6 --sources 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'
2024-05-24 14:00:17,352 DEBUG MainThread __main__: Source glob 'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3' matched sources ['tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'] in catalog 'anvil6'
2024-05-24 14:00:17,358 INFO MainThread botocore.credentials: Found credentials in shared credentials file: ~/.aws/credentials
2024-05-24 14:00:17,394 INFO MainThread azul.deployment: Allocated new Boto3 client for 'secretsmanager' with ID 4379854800
2024-05-24 14:00:18,191 INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2'
2024-05-24 14:00:18,192 DEBUG MainThread azul.terra: … without request body
2024-05-24 14:00:22,369 INFO MainThread azul.terra: Got 200 response after 4.177s from GET to https://data.terra.bio/api/repository/v1/snapshots?filter=ANVIL_T2T_CHRY_20240301_ANV5_202403040508&limit=2
2024-05-24 14:00:22,369 DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 21:00:22 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'pV5Mb5bB', 'Content-Type': 'application/json', 'Content-Length': '891', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 14:00:22,370 DEBUG MainThread azul.terra: … with response body b'{"total":1737,"filteredTotal":1,"items":[{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_20...'
2024-05-24 14:00:22,371 INFO MainThread azul.terra: Making GET request to 'https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f'
2024-05-24 14:00:22,371 DEBUG MainThread azul.terra: … without request body
2024-05-24 14:00:42,374 WARNING MainThread urllib3.connectionpool: Retrying (_LimitedRetry(total=None, connect=2, read=2, redirect=0, status=2)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=20)")': /api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:02,554 WARNING MainThread urllib3.connectionpool: Retrying (_LimitedRetry(total=None, connect=2, read=1, redirect=0, status=2)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='data.terra.bio', port=443): Read timed out. (read timeout=20)")': /api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:20,613 INFO MainThread azul.terra: Got 200 response after 58.242s from GET to https://data.terra.bio/api/repository/v1/snapshots/f4accfc6-d9e4-49b1-a590-6a580b4d305f
2024-05-24 14:01:20,613 DEBUG MainThread azul.terra: … with response headers HTTPHeaderDict({'Date': 'Fri, 24 May 2024 21:01:19 GMT', 'Server': 'Apache', 'X-Frame-Options': 'SAMEORIGIN', 'Access-Control-Allow-Headers': 'DNT,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range,Authorization,Accept,Referer,X-App-Id,Origin', 'Access-Control-Allow-Methods': 'GET,POST,DELETE,PUT,PATCH,OPTIONS,HEAD', 'X-Content-Type-Options': 'nosniff', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'Cache-Control': 'no-cache,no-store,must-revalidate', 'X-Request-ID': 'a8q1P7JK', 'Content-Type': 'application/json', 'Content-Length': '37926', 'Vary': 'Accept-Encoding,Origin', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
2024-05-24 14:01:20,614 DEBUG MainThread azul.terra: … with response body b'{"id":"f4accfc6-d9e4-49b1-a590-6a580b4d305f","name":"ANVIL_T2T_CHRY_20240301_ANV5_202403040508","description":"Full view snap...'
2024-05-24 14:01:20,632 INFO MainThread azul.deployment: Allocated new Boto3 client for 'es' with ID 4381189520
2024-05-24 14:01:21,197 DEBUG MainThread azul.es: Creating ES client [vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443]
2024-05-24 14:01:21,205 INFO MainThread azul.deployment: Allocated new Boto3 client for 'sts' with ID 4381725392
2024-05-24 14:01:21,215 INFO MainThread botocore.credentials: Found credentials in environment variables.
2024-05-24 14:01:21,215 INFO MainThread azul.azulclient: Deindexing sources {'tdr:datarepo-e5b16a5a:snapshot/ANVIL_T2T_CHRY_20240301_ANV5_202403040508:/3'} from catalog 'anvil6'
2024-05-24 14:01:21,215 DEBUG MainThread azul.azulclient: Using query: {'query': {'bool': {'should': [{'terms': {'sources.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}, {'terms': {'source.id.keyword': ['f4accfc6-d9e4-49b1-a590-6a580b4d305f']}}]}}}
2024-05-24 14:01:21,216 INFO MainThread elasticsearch: Making POST request to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 14:01:21,216 INFO MainThread elasticsearch: … with request body b'{"query":{"bool":{"should":[{"terms":{"sources.id.keyword":["f4accfc6-d9e4-49b1-a590-6a580b4d305f"]}},{"terms":{"source.id.ke...'
2024-05-24 14:01:21,782 INFO MainThread elasticsearch: Got 200 response after 0.566s from POST to https://vpc-azul-index-anvilprod-ggipah4skn2ftt47u4xgvydzqm.us-east-1.es.amazonaws.com:443/azul_v2_anvilprod_anvil6_activities,azul_v2_anvilprod_anvil6_activities_aggregate,azul_v2_anvilprod_anvil6_biosamples,azul_v2_anvilprod_anvil6_biosamples_aggregate,azul_v2_anvilprod_anvil6_bundles,azul_v2_anvilprod_anvil6_bundles_aggregate,azul_v2_anvilprod_anvil6_datasets,azul_v2_anvilprod_anvil6_datasets_aggregate,azul_v2_anvilprod_anvil6_diagnoses,azul_v2_anvilprod_anvil6_diagnoses_aggregate,azul_v2_anvilprod_anvil6_donors,azul_v2_anvilprod_anvil6_donors_aggregate,azul_v2_anvilprod_anvil6_files,azul_v2_anvilprod_anvil6_files_aggregate,azul_v2_anvilprod_anvil6_replica/_delete_by_query?slices=auto
2024-05-24 14:01:21,782 INFO MainThread elasticsearch: … with response body '{"took":42,"timed_out":false,"total":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},…'
@hannes-ucsc: "The solution is most likely to partition the deletion requests so that no request takes longer than 30 seconds, which is a safe margin away from the client timeout of one minute. There may be other solutions. Assignee to consider those. At the moment, the work-around is to retry until the request returns a 200."
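One possible reading of "partition the deletion requests" is to send one delete-by-query request per index (or small group of indices) instead of a single request spanning all of them. The sketch below illustrates only that idea. The names `partition`, `deindex_partitioned`, and the `delete` callable are hypothetical and not part of Azul; `delete` stands in for whatever actually issues the request. Whether per-index requests stay under 30 seconds has not been verified, since a single large index could still dominate the run time.

```python
from typing import Callable, Iterator, Sequence

# Hypothetical signature: takes a list of index names and a query body,
# issues ONE delete-by-query request and returns the parsed response.
DeleteFn = Callable[[Sequence[str], dict], dict]


def partition(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError('size must be positive')
    for start in range(0, len(items), size):
        yield items[start:start + size]


def deindex_partitioned(delete: DeleteFn,
                        indices: Sequence[str],
                        query: dict,
                        indices_per_request: int = 1) -> int:
    """
    Issue one request per chunk of indices instead of one request for
    all indices, and return the total number of deleted documents.
    """
    deleted = 0
    for chunk in partition(indices, indices_per_request):
        response = delete(chunk, query)
        # 'deleted' is the count field seen in the response body logged above
        deleted += response.get('deleted', 0)
    return deleted


if __name__ == '__main__':
    # Stand-in for the real request, so the sketch runs without a cluster
    def fake_delete(chunk: Sequence[str], query: dict) -> dict:
        print('deleting from', list(chunk))
        return {'deleted': 0}

    query = {'query': {'bool': {'should': []}}}
    deindex_partitioned(fake_delete, ['files', 'bundles', 'donors'], query)
```

If partitioning by index turns out to be too coarse, the same loop could iterate over sub-queries instead of indices, but that would need a partitioning key that this issue does not identify.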
Deleting snapshot T2T_CHRY (largest at the moment with 309,979 sub-graphs) in anvilprod took longer than 60s to execute, causing an Elasticsearch client timeout. Running …
… outputted:
Retrying this command shortly after the first run returned 409 responses for each of the indices in Elasticsearch: