DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Manifest request fails if filter params are rearranged #6417

Closed dsotirho-ucsc closed 10 hours ago

dsotirho-ucsc commented 1 month ago

Steps:

Reproduction:

# Filters v1: {"organ":{"is":["pancreas"]},"fileFormat":{"is":["fastq.gz"]}}
# Filters v2: {"fileFormat":{"is":["fastq.gz"]},"organ":{"is":["pancreas"]}}

#################################################
# Request first manifest (filters v1)

$ curl -X PUT 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files?catalog=dcp3&filters=%7B%22organ%22%3A%7B%22is%22%3A%5B%22pancreas%22%5D%7D%2C%22fileFormat%22%3A%7B%22is%22%3A%5B%22fastq.gz%22%5D%7D%7D&format=compact'
{"Status":301,"Location":"https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files/k8QgpmQyhhgRDUCISaeILk2U9Nd0uFN4cLZUwJdXcGScEKAAAQ==","Retry-After":1}

$ curl 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files/k8QgpmQyhhgRDUCISaeILk2U9Nd0uFN4cLZUwJdXcGScEKAAAQ=='
{"Status":302,"Location":"https://s3.amazonaws.com/edu-ucsc-gi-platform-hca-dev-storage-dev.us-east-1/manifests/bb138210-16e4-5665-8543-ffb84f5e94ef.ad14b32b-dea0-555e-b745-8ebe5bb0958d.tsv?response-content-disposition=…

#################################################
# Delete manifest from s3 bucket

$ aws s3api list-objects-v2 --bucket edu-ucsc-gi-platform-hca-dev-storage-dev.us-east-1 --prefix 'manifests/bb138210'
{
    "Contents": [
        {
            "Key": "manifests/bb138210-16e4-5665-8543-ffb84f5e94ef.ad14b32b-dea0-555e-b745-8ebe5bb0958d.tsv",
            "LastModified": "2024-07-24T22:19:11.000Z",
            "ETag": "\"3fc2bb2dcf9938909bd77d0010e53acd-1\"",
            "Size": 17491206,
            "StorageClass": "STANDARD"
        }
    ],
    "RequestCharged": null
}

$ aws s3api delete-object --bucket 'edu-ucsc-gi-platform-hca-dev-storage-dev.us-east-1' --key 'manifests/bb138210-16e4-5665-8543-ffb84f5e94ef.ad14b32b-dea0-555e-b745-8ebe5bb0958d.tsv'

$ aws s3api list-objects-v2 --bucket edu-ucsc-gi-platform-hca-dev-storage-dev.us-east-1 --prefix 'manifests/bb138210'
{
    "RequestCharged": null
}

#################################################
# Request second manifest (filters v2)

$ curl -X PUT 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files?catalog=dcp3&filters=%7B%22fileFormat%22%3A%7B%22is%22%3A%5B%22fastq.gz%22%5D%7D%2C%22organ%22%3A%7B%22is%22%3A%5B%22pancreas%22%5D%7D%7D&format=compact'
Traceback (most recent call last):
  File "/var/task/azul/service/manifest_controller.py", line 139, in get_manifest_async
    manifest = self.service.get_cached_manifest(format=format,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_service.py", line 652, in get_cached_manifest
    return self._get_cached_manifest(generator_cls, manifest_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_service.py", line 693, in _get_cached_manifest
    raise CachedManifestNotFound(manifest_key)
azul.service.manifest_service.CachedManifestNotFound: ManifestKey(catalog='dcp3', format=<ManifestFormat.compact: 'compact'>, manifest_hash=UUID('bb138210-16e4-5665-8543-ffb84f5e94ef'), source_hash=UUID('ad14b32b-dea0-555e-b745-8ebe5bb0958d'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/azul/service/async_manifest_service.py", line 131, in start_generation
    execution = self._sfn.start_execution(stateMachineArn=self.machine_arn,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ExecutionAlreadyExists: An error occurred (ExecutionAlreadyExists) when calling the StartExecution operation: Execution Already Exists: 'arn:aws:states:us-east-1:122796619775:execution:azul-manifest-dev:pmQyhhgRDUCISaeILk2U9Nd0uFN4cLZUwJdXcGScEKA'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1917, in _get_view_function_response
    response = view_function(**function_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/app.py", line 1593, in fetch_file_manifest
    return _file_manifest(fetch=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/app.py", line 1619, in _file_manifest
    return app.manifest_controller.get_manifest_async(query_params=query_params,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_controller.py", line 156, in get_manifest_async
    token = self.async_service.start_generation(execution_key, input)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/async_manifest_service.py", line 145, in start_generation
    raise InvalidGeneration(token)
azul.service.async_manifest_service.InvalidGeneration: Token(execution_id=b'\xa6d2\x86\x18\x11\r@\x88I\xa7\x88.M\x94\xf4\xd7t\xb8Sxp\xb6T\xc0\x97Wpd\x9c\x10\xa0', request_index=0, retry_after=1)
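
A note on what the output above shows. The second request reports the same `manifest_hash` (`bb138210-…`) as the file generated for filters v1, and the execution name in the `ExecutionAlreadyExists` error is contained in the token returned by the first request. So the manifest key and execution name appear not to depend on the order of the filter keys. AWS documents that `StartExecution` with an existing name returns `ExecutionAlreadyExists` when the input differs or the execution is closed. Whether the serialized filters in the execution input are what differs here is an assumption, not something confirmed in this ticket. The minimal sketch below only demonstrates that the two filter strings are equal as parsed JSON, differ as raw strings, and become identical once serialized canonically. The helper `canonical` is hypothetical and not part of Azul.

```python
import json

# The two filter strings from the reproduction above.
FILTERS_V1 = '{"organ":{"is":["pancreas"]},"fileFormat":{"is":["fastq.gz"]}}'
FILTERS_V2 = '{"fileFormat":{"is":["fastq.gz"]},"organ":{"is":["pancreas"]}}'


def canonical(filters_json: str) -> str:
    """
    Hypothetical helper (not part of Azul): re-serialize the filters with
    sorted keys and fixed separators so that key order no longer matters.
    """
    return json.dumps(json.loads(filters_json),
                      sort_keys=True,
                      separators=(',', ':'))


# Parsed dictionaries compare equal regardless of key order ...
same_when_parsed = json.loads(FILTERS_V1) == json.loads(FILTERS_V2)

# ... but the raw strings, as sent in the two requests, do not.
same_as_strings = FILTERS_V1 == FILTERS_V2

# A canonical serialization yields one string for both orderings.
same_when_canonical = canonical(FILTERS_V1) == canonical(FILTERS_V2)

print(same_when_parsed, same_as_strings, same_when_canonical)
```

If the comparison that leads to `InvalidGeneration` were done on a canonical form like this, or on the parsed filters, the two requests would be indistinguishable. Again, this is a guess at the mechanism, not a description of the actual code.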

Alternate Reproduction:

The same error occurs if the second manifest request is made before the first request has finished generating the manifest.

# Filters v1: {"organ":{"is":["brain"]},"fileFormat":{"is":["fastq.gz"]}}
# Filters v2: {"fileFormat":{"is":["fastq.gz"]},"organ":{"is":["brain"]}}

#################################################
# Request first manifest (filters v1)

$ date; curl -X PUT 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files?catalog=dcp3&filters=%7B%22organ%22%3A%7B%22is%22%3A%5B%22brain%22%5D%7D%2C%22fileFormat%22%3A%7B%22is%22%3A%5B%22fastq.gz%22%5D%7D%7D&format=compact'
Wed Jul 24 15:01:35 PDT 2024
{"Status":301,"Location":"https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files/k8QgVX2n3DBC78s00Rh3UCDaM91SW3XSgChKIm4-oZSxrzQAAQ==","Retry-After":1}

#################################################
# (Quickly) request second manifest (filters v2)

$ date; curl -X PUT 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files?catalog=dcp3&filters=%7B%22fileFormat%22%3A%7B%22is%22%3A%5B%22fastq.gz%22%5D%7D%2C%22organ%22%3A%7B%22is%22%3A%5B%22brain%22%5D%7D%7D&format=compact'
Wed Jul 24 15:01:37 PDT 2024
Traceback (most recent call last):
  File "/var/task/azul/service/manifest_controller.py", line 139, in get_manifest_async
    manifest = self.service.get_cached_manifest(format=format,
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_service.py", line 652, in get_cached_manifest
    return self._get_cached_manifest(generator_cls, manifest_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_service.py", line 693, in _get_cached_manifest
    raise CachedManifestNotFound(manifest_key)
azul.service.manifest_service.CachedManifestNotFound: ManifestKey(catalog='dcp3', format=<ManifestFormat.compact: 'compact'>, manifest_hash=UUID('6d6e8bf9-a1b0-53fd-821c-2b47aa370a97'), source_hash=UUID('ad14b32b-dea0-555e-b745-8ebe5bb0958d'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/azul/service/async_manifest_service.py", line 131, in start_generation
    execution = self._sfn.start_execution(stateMachineArn=self.machine_arn,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/botocore/client.py", line 535, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/python/botocore/client.py", line 980, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.ExecutionAlreadyExists: An error occurred (ExecutionAlreadyExists) when calling the StartExecution operation: Execution Already Exists: 'arn:aws:states:us-east-1:122796619775:execution:azul-manifest-dev:VX2n3DBC78s00Rh3UCDaM91SW3XSgChKIm4-oZSxrzQ'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1917, in _get_view_function_response
    response = view_function(**function_args)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/app.py", line 1593, in fetch_file_manifest
    return _file_manifest(fetch=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/app.py", line 1619, in _file_manifest
    return app.manifest_controller.get_manifest_async(query_params=query_params,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/manifest_controller.py", line 156, in get_manifest_async
    token = self.async_service.start_generation(execution_key, input)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/task/azul/service/async_manifest_service.py", line 145, in start_generation
    raise InvalidGeneration(token)
azul.service.async_manifest_service.InvalidGeneration: Token(execution_id=b'U}\xa7\xdc0B\xef\xcb4\xd1\x18wP \xda3\xddR[u\xd2\x80(J"n>\xa1\x94\xb1\xaf4', request_index=0, retry_after=1)
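
Because the alternate reproduction depends on the second request arriving while the first manifest is still being generated, it may be easier to script than to type. The following is a rough sketch using only the Python standard library. It builds the same two URLs as the `curl` commands above and sends them back to back. The helper names (`manifest_url`, `put`, `send_both`) are made up for illustration, and whether the second request actually hits the window depends on timing.

```python
import json
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint of the dev deployment, as used in the curl commands above.
BASE_URL = 'https://service.dev.singlecell.gi.ucsc.edu/fetch/manifest/files'

# Same filters, different key order (dicts preserve insertion order).
FILTERS_V1 = {'organ': {'is': ['brain']}, 'fileFormat': {'is': ['fastq.gz']}}
FILTERS_V2 = {'fileFormat': {'is': ['fastq.gz']}, 'organ': {'is': ['brain']}}


def manifest_url(filters: dict) -> str:
    """Build the manifest request URL, keeping the key order of `filters`."""
    params = {
        'catalog': 'dcp3',
        'filters': json.dumps(filters, separators=(',', ':')),
        'format': 'compact',
    }
    return BASE_URL + '?' + urlencode(params)


def put(url: str) -> tuple[int, str]:
    """Send a PUT request and return the HTTP status and response body."""
    try:
        with urlopen(Request(url, method='PUT')) as response:
            return response.status, response.read().decode()
    except HTTPError as e:
        # Error responses carry a body, too (e.g. the traceback shown above).
        return e.code, e.read().decode()


def send_both() -> list[tuple[int, str]]:
    """Request the manifest with filters v1, then immediately with v2."""
    return [put(manifest_url(f)) for f in (FILTERS_V1, FILTERS_V2)]
```

Calling `send_both()` issues both requests. Nothing is sent when the module is merely imported.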

achave11-ucsc commented 1 month ago

Assignee to add reproduction that's not sensitive to timing by removing the cached-manifest in between the two requests.

achave11-ucsc commented 1 month ago

Assignee to consider next steps.

hannes-ucsc commented 1 month ago

Assignee to ensure that both repros are present. Currently, the original time-sensitive repro is missing, which does not comply with the instructions above, which read:

Assignee to add reproduction that's not sensitive to timing by removing the cached-manifest in between the two requests.

Note the word "add".

dsotirho-ucsc commented 1 month ago

Added reproductions using dev deployment to ticket description.

hannes-ucsc commented 1 week ago

For demo, attempt to reproduce.