DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

PFB manifest fails with multiple inner entities of the same type #3157

Closed jessebrennan closed 3 years ago

jessebrennan commented 3 years ago

The error from CloudWatch logs:

[ERROR] AssertionError: ({'entity_id': '5b9e94f1-8e19-5c52-a633-4eba6100ebaa', 'contents': {'sample_specimens': [{'has_input_biomaterial': [''], '_source': ['specimen_from_organism'], 'document_id': ['46cbd6a3-1ba4-4f57-b27d-4e2b918b0d4c', '85a0036b-fb11-40b5-b805-0af94bceee23', 'dfe889d8-86c0-4f50-adde-55bb58bec1ea', 'ef3a770b-6bd5-44d3-8638-f2747d7412f1'], 'biomaterial_id': ['PP001', 'PP011', 'PP013', 'PP017'], 'disease': [''], 'organ': ['blood', 'hematopoietic system', 'lung', 'mediastinal lymph node'], 'organ_part': ['Left lateral basal bronchopulmonary segment', 'bone marrow', 'mediastinal lymph node', 'venous blood'], 'storage_method': ['fresh'], 'preservation_method': ['fresh'], '_type': ['specimen']}], 'samples': [{'document_id': ['46cbd6a3-1ba4-4f57-b27d-4e2b918b0d4c', '85a0036b-fb11-40b5-b805-0af94bceee23', 'dfe889d8-86c0-4f50-adde-55bb58bec1ea', 'ef3a770b-6bd5-44d3-8638-f2747d7412f1'], 'biomaterial_id': ['PP001', 'PP011', 'PP013', 'PP017'], 'entity_type': ['specimens'], 'organ': ['blood', 'hematopoietic system', 'lung', 'mediastinal lymph node'], 'organ_part': ['Left lateral basal bronchopulmonary segment', 'bone marrow', 'mediastinal lymph node', 'venous blood'], 'model_organ': [''], 'model_organ_part': [''], 'effective_organ': ['blood', 'hematopoietic system', 'lung', 'mediastinal lymph node']}], 'sequencing_inputs': [{'document_id': ['10510764-ed13-4405-bd85-578abdc9ada2', '78b08aea-c194-4aa1-9e2e-56df9649bef1', 'b72a54dc-69d7-41c7-a951-2695a7dcf76e', 'fa6a84ba-f9de-4302-8b72-e940658e888d'], 'biomaterial_id': ['PP001_suspension', 'PP011_suspension', 'PP013_suspension', 'PP017_suspension'], 'sequencing_input_type': ['cell_suspension']}], 'specimens': [{'has_input_biomaterial': [''], '_source': ['specimen_from_organism'], 'document_id': ['46cbd6a3-1ba4-4f57-b27d-4e2b918b0d4c', '85a0036b-fb11-40b5-b805-0af94bceee23', 'dfe889d8-86c0-4f50-adde-55bb58bec1ea', 'ef3a770b-6bd5-44d3-8638-f2747d7412f1'], 'biomaterial_id': ['PP001', 'PP011', 'PP013', 'PP017'], 
'disease': [''], 'organ': ['blood', 'hematopoietic system', 'lung', 'mediastinal lymph node'], 'organ_part': ['Left lateral basal bronchopulmonary segment', 'bone marrow', 'mediastinal lymph node', 'venous blood'], 'storage_method': ['fresh'], 'preservation_method': ['fresh'], '_type': ['specimen']}], 'cell_suspensions': [{'document_id': ['b72a54dc-69d7-41c7-a951-2695a7dcf76e'], 'biomaterial_id': ['PP011_suspension'], 'total_estimated_cells': None, 'selected_cell_type': ['T cell'], 'organ': ['hematopoietic system'], 'organ_part': ['bone marrow']}, {'document_id': ['10510764-ed13-4405-bd85-578abdc9ada2'], 'biomaterial_id': ['PP001_suspension'], 'total_estimated_cells': None, 'selected_cell_type': ['T cell'], 'organ': ['lung'], 'organ_part': ['Left lateral basal bronchopulmonary segment']}, {'document_id': ['fa6a84ba-f9de-4302-8b72-e940658e888d'], 'biomaterial_id': ['PP017_suspension'], 'total_estimated_cells': None, 'selected_cell_type': ['T cell'], 'organ': ['blood'], 'organ_part': ['venous blood']}, {'document_id': ['78b08aea-c194-4aa1-9e2e-56df9649bef1'], 'biomaterial_id': ['PP013_suspension'], 'total_estimated_cells': None, 'selected_cell_type': ['T cell'], 'organ': ['mediastinal lymph node'], 'organ_part': ['mediastinal lymph node']}], 'cell_lines': [], 'donors': [{'document_id': ['8b5774b2-2aba-4b90-8fa0-c7f69207802c', '95822e01-44a7-420b-92fc-b001460b1d13', 'b09a1e10-3c38-4bb3-8553-d762810a6fb7'], 'biomaterial_id': ['Blood_donor_A', 'Tissue_donor_1', 'Tissue_donor_2'], 'biological_sex': ['male'], 'genus_species': ['Homo sapiens'], 'development_stage': ['human adult stage'], 'diseases': ['hypertension', 'normal'], 'organism_age': [{'value': '50-55', 'unit': 'year'}, {'value': '52', 'unit': 'year'}, {'value': '65', 'unit': 'year'}], 'organism_age_value': ['50-55', '52', '65'], 'organism_age_unit': ['year'], 'organism_age_range': [{'gte': 1639872000.0, 'lte': 1639872000.0}, {'gte': 1576800000.0, 'lte': 1734480000.0}, {'gte': 2049840000.0, 'lte': 2049840000.0}], 
'donor_count': 3}], 'organoids': [], 'files': [{'content-type': 'application/unknown; dcp-type=data', 'indexed': False, 'name': 'd6536459-ab4e-4954-a0ce-5e6d07670039.bam', 'crc32c': '55b8114c', 'sha256': '2a1e21a105c728844d2d50e248acc36958056a21912ee5989dbb814b11afc15a', 'size': 30482724642, 'uuid': '3b97c8ec-e735-5f45-9c75-942d2bf9f401', 'drs_path': 'v1_1a7ef136-7f53-4cee-970a-86a443052466_01d7c0aa-44ae-4ecf-8bdd-a82e13cf5076', 'version': '2020-11-21T00:15:39.000000Z', 'document_id': '5b9e94f1-8e19-5c52-a633-4eba6100ebaa', 'file_type': 'analysis_file', 'file_format': 'bam', 'content_description': [None], 'is_intermediate': None, 'source': None, '_type': 'file', 'related_files': [], 'matrix_cell_count': None}], 'analysis_protocols': [{'workflow': ['optimus_voptimus_v4.0.0', 'optimus_voptimus_v4.1.5']}], 'imaging_protocols': [], 'library_preparation_protocols': [{'library_construction_approach': ['10X v2 sequencing'], 'nucleic_acid_source': ['single cell']}], 'sequencing_protocols': [{'instrument_manufacturer_model': ['Illumina HiSeq 4000'], 'paired_end': [False]}], 'sequencing_processes': [{'document_id': ['219e1b92-9749-490c-b08a-f375ad4c9884', '36ca61c4-9752-4cdf-8bac-1e8da06d9ed1', 'd6536459-ab4e-4954-a0ce-5e6d07670039', 'fb72f4c2-7f35-40a6-af19-9b41d0680677']}], 'projects': [{'project_title': ['A single-cell reference map of transcriptional states for human blood and tissue T cell activation'], 'project_short_name': ['HumanTissueTcellActivation'], 'laboratory': ['Farber Lab; Columbia Center for Translational Immunology', 'Human Cell Atlas Data Coordination Platform', 'Sims Lab; Department of Systems Biology'], 'institutions': ['Columbia University Irving Medical Center', 'University of California, Santa Cruz'], 'document_id': ['4a95101c-9ffc-4f30-a809-f04518a23803'], 'publication_titles': [None], 'insdc_project_accessions': [None], 'geo_series_accessions': [None], 'array_express_accessions': [None], 'insdc_study_accessions': [None], 'supplementary_links': 
[None], '_type': ['project']}]}, 'num_contributions': 4, 'sources': [{'id': '1a7ef136-7f53-4cee-970a-86a443052466', 'name': 'tdr:broad-jade-dev-data:snapshot/hca_dev_20201203___20210524_lattice:'}], 'bundles': [{'uuid': 'd5f6a127-097f-5406-89d5-8e17204309df', 'version': '2020-10-28T11:26:39.000000Z'}, {'uuid': '74145f2a-7a2c-5e8f-8820-e75aa8171bfe', 'version': '2020-12-10T13:20:00.000000Z'}, {'uuid': '48711595-051f-52fb-9f60-6efa43079134', 'version': '2020-12-10T13:20:00.000000Z'}, {'uuid': '5f0feb9d-b089-5b6b-a629-8b4e666e5d18', 'version': '2020-12-10T13:20:00.000000Z'}], 'total_estimated_cells': 0}, 'cell_suspensions')
Traceback (most recent call last):
  File "/var/task/app.py", line 1558, in generate_manifest
    return app.manifest_controller.get_manifest(event)
  File "/var/task/azul/service/manifest_controller.py", line 61, in get_manifest
    manifest = self.service.get_manifest(format_=ManifestFormat(input['format_']),
  File "/var/task/azul/service/manifest_service.py", line 236, in get_manifest
    file_name = self._generate_manifest(generator, object_key)
  File "/var/task/azul/service/manifest_service.py", line 321, in _generate_manifest
    file_path, base_name = generator.create_file()
  File "/var/task/azul/service/manifest_service.py", line 1154, in create_file
    converter.add_doc(doc)
  File "/var/task/azul/service/avro_pfb.py", line 103, in add_doc
    assert False, (doc, entity_type)

~To reproduce the issue, index any combination of the following bundles from the catalog hca_prod_20201120_dcp2___20210707_dcp7:~

~d5f6a127-097f-5406-89d5-8e17204309df 2020-10-28T11:26:39.000000Z 74145f2a-7a2c-5e8f-8820-e75aa8171bfe 2020-10-28T11:26:39.000000Z 48711595-051f-52fb-9f60-6efa43079134 2020-12-10T13:20:00.000000Z 5f0feb9d-b089-5b6b-a629-8b4e666e5d18 2020-12-10T13:20:00.000000Z~

~and then generate a PFB manifest with no filters.~

~I can't find these bundles any longer. I've checked dcp5-7 snapshots with no luck. They all belonged to the project HumanTissueTcellActivation however, so indexing this project should trigger the issue. Unfortunately, I tested this project on prod and the manifest generated without issue. I can get some manifest generation errors on prod however, with broader filters. Without permissions for prod I'm unable to view CloudWatch logs.~

To reproduce this issue, request: https://service.azul.data.humancellatlas.org/fetch/manifest/files?catalog=dcp7&filters=%7B%0A%20%20%22project%22%3A%20%7B%0A%20%20%20%20%22is%22%3A%20%5B%0A%20%20%20%20%20%20%20%22SingleCellsMultipleSclerosis%22%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D&format=terra.pfb

which uses the filters:

{
  "project": {
    "is": [
       "SingleCellsMultipleSclerosis"
    ]
  }
}
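
The request URL carries these filters as a percent-encoded JSON string in its `filters` query parameter. A minimal sketch that decodes the URL with the Python standard library to confirm the catalog, format and filters it sends; it only parses the string and makes no request to the service:

```python
import json
from urllib.parse import parse_qs, urlsplit

# The request URL from this ticket, split across lines for readability.
url = (
    "https://service.azul.data.humancellatlas.org/fetch/manifest/files"
    "?catalog=dcp7"
    "&filters=%7B%0A%20%20%22project%22%3A%20%7B%0A%20%20%20%20%22is%22%3A%20%5B"
    "%0A%20%20%20%20%20%20%20%22SingleCellsMultipleSclerosis%22"
    "%0A%20%20%20%20%5D%0A%20%20%7D%0A%7D"
    "&format=terra.pfb"
)

# parse_qs percent-decodes each value and returns a list per parameter.
query = parse_qs(urlsplit(url).query)

# The filters parameter is itself a JSON document.
filters = json.loads(query["filters"][0])

print(query["catalog"][0], query["format"][0])
print(json.dumps(filters, indent=2))
```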

Following the 301 leads to

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1135, in _get_view_function_response
    response = view_function(**function_args)
  File "/var/task/app.py", line 1538, in fetch_file_manifest
    return _file_manifest(fetch=True)
  File "/var/task/app.py", line 1552, in _file_manifest
    return app.manifest_controller.get_manifest_async(self_url=app.self_url(),
  File "/var/task/azul/service/manifest_controller.py", line 127, in get_manifest_async
    token_or_state = self.async_service.inspect_generation(token)
  File "/var/task/azul/service/async_manifest_service.py", line 99, in inspect_generation
    raise StateMachineError(status, output)
azul.service.step_function_helper.StateMachineError: ('Failed to generate manifest', 'FAILED', None)

And here's the new error from the CloudWatch logs:

[ERROR] AssertionError: ({'entity_id': '9ff93ab7-a63a-4f07-88ea-8746dd0759e3', 'contents': {'sample_specimens': [{'has_input_biomaterial': [None], '_source': ['specimen_from_organism'], 'document_id': ['1277f2eb-8d86-428e-8c41-6f2184849b5d', '1ea22658-d557-4d82-b9b3-77d08c7dcc0b', '25645f82-7b7d-4ca1-9615-05e3fe43b15a', '3dd26b9e-b870-4e5c-888a-c5f00dd881ad', '3ebd299a-15b7-424b-a1f8-a2d10d9a649a', '7b56dd2f-e4f3-4fad-a8e8-d927d71bd6c0', '7f9f255a-885c-4300-9783-3448155ec440', '81ed9d77-b951-40cc-a3c4-a1cd26045204', '99d37a7e-1c08-45a4-8aea-d2c561a4eca7', '9a89cece-f4a7-44b0-8c14-0f75bf68e0a6', '9baf89f7-1425-48da-b73b-5556857f43af', 'aa07277e-134e-479b-bcb2-2c3c36bfbb56', 'acbea770-33ec-48f6-99e2-38c5811c553b', 'b40b9e51-60cf-4996-9aee-23696135e81c', 'b4efe856-c177-479d-b7c4-63713d97e51e', 'c46d679f-9da5-45fd-978e-fb0994681b53', 'd3365e7b-508b-4a35-80e3-b7f3a42ced67', 'db1f0c8d-aaf6-48f3-bb19-d485ba54d054', 'e29a9078-49e2-4d73-81b5-be6789f57127', 'eab50cb7-5ca5-42a5-9005-e8269cae26e4', 'ec3b0dee-a5e6-4544-b9d3-3845f5fcc953', 'f9ffd472-37fd-4658-84a2-eed286056e9f'], 'biomaterial_id': ['SAMN12880970', 'SAMN12880971', 'SAMN12880972', 'SAMN12880973', 'SAMN12880974', 'SAMN12880975', 'SAMN12880976', 'SAMN12880977', 'SAMN12880978', 'SAMN12880979', 'SAMN12880980', 'SAMN12880981', 'SAMN12880982', 'SAMN12880983', 'SAMN12880984', 'SAMN12880985', 'SAMN12880986', 'SAMN12880987', 'SAMN12880988', 'SAMN12880989', 'SAMN12880990', 'SAMN12880991'], 'disease': ['intracranial hypertension', 'multiple sclerosis'], 'organ': ['blood', 'central nervous system'], 'organ_part': ['cerebrospinal fluid', None], 'storage_method': [None], 'preservation_method': [None], '_type': ['specimen']}], 'samples': [{'document_id': ['1277f2eb-8d86-428e-8c41-6f2184849b5d', '1ea22658-d557-4d82-b9b3-77d08c7dcc0b', '25645f82-7b7d-4ca1-9615-05e3fe43b15a', '3dd26b9e-b870-4e5c-888a-c5f00dd881ad', '3ebd299a-15b7-424b-a1f8-a2d10d9a649a', '7b56dd2f-e4f3-4fad-a8e8-d927d71bd6c0', 
'7f9f255a-885c-4300-9783-3448155ec440', '81ed9d77-b951-40cc-a3c4-a1cd26045204', '99d37a7e-1c08-45a4-8aea-d2c561a4eca7', '9a89cece-f4a7-44b0-8c14-0f75bf68e0a6', '9baf89f7-1425-48da-b73b-5556857f43af', 'aa07277e-134e-479b-bcb2-2c3c36bfbb56', 'acbea770-33ec-48f6-99e2-38c5811c553b', 'b40b9e51-60cf-4996-9aee-23696135e81c', 'b4efe856-c177-479d-b7c4-63713d97e51e', 'c46d679f-9da5-45fd-978e-fb0994681b53', 'd3365e7b-508b-4a35-80e3-b7f3a42ced67', 'db1f0c8d-aaf6-48f3-bb19-d485ba54d054', 'e29a9078-49e2-4d73-81b5-be6789f57127', 'eab50cb7-5ca5-42a5-9005-e8269cae26e4', 'ec3b0dee-a5e6-4544-b9d3-3845f5fcc953', 'f9ffd472-37fd-4658-84a2-eed286056e9f'], 'biomaterial_id': ['SAMN12880970', 'SAMN12880971', 'SAMN12880972', 'SAMN12880973', 'SAMN12880974', 'SAMN12880975', 'SAMN12880976', 'SAMN12880977', 'SAMN12880978', 'SAMN12880979', 'SAMN12880980', 'SAMN12880981', 'SAMN12880982', 'SAMN12880983', 'SAMN12880984', 'SAMN12880985', 'SAMN12880986', 'SAMN12880987', 'SAMN12880988', 'SAMN12880989', 'SAMN12880990', 'SAMN12880991'], 'entity_type': ['specimens'], 'organ': ['blood', 'central nervous system'], 'organ_part': ['cerebrospinal fluid', None], 'model_organ': [None], 'model_organ_part': [None], 'effective_organ': ['blood', 'central nervous system']}], 'sequencing_inputs': [{'document_id': ['00b2b7d2-fd5f-481e-9a81-2e24371fa5cc', '0ad718cf-2358-48e5-baf6-121bf1d2a165', '29c67e72-0e0a-4329-9d56-ba9fc04393a2', '2d2ff427-4965-4117-a535-2dc03f82e844', '2e9955f9-b0dc-4b73-ba1b-c1628a429a98', '3deb74cb-bb85-406c-84fe-33f490879a73', '63a27440-0289-45f5-81c1-c8d27d0c0613', '6a4f2bc0-64a4-4d67-a81c-898b59cb765a', '7b7f2f4b-b3ab-4a4d-847f-2d0075c2fd13', '8802df76-5878-495a-b226-998837f26d94', '8be5a04c-b150-40a7-a3f3-8a3bfe53aeaf', 'a62ed80b-204c-4559-8e7c-2ef763964961', 'aa4fe3fe-0a18-4747-bae0-15fa921ddcca', 'ab2a21b0-f770-48c0-a426-f84c7054575c', 'ae797078-e72a-44d2-92f5-b29d112eed98', 'b6ab85aa-499e-4bb2-9b4a-9aa5fd39299c', 'beaf7e26-2b61-47be-8608-f716ebf159c0', 
'bf521170-d784-4bb1-ae39-d6703d1ad510', 'c9ef654e-ef4b-4cef-99bb-1de9421fe6b4', 'd8f08b9e-9d0e-490c-ae3c-483056f59e6d', 'de718f86-dca9-4f0e-8341-5155ff6a9d77', 'f22c91aa-0424-4187-a331-7f50c99f1bb8'], 'biomaterial_id': ['SRX6931239', 'SRX6931240', 'SRX6931241', 'SRX6931242', 'SRX6931243', 'SRX6931244', 'SRX6931245', 'SRX6931246', 'SRX6931247', 'SRX6931248', 'SRX6931249', 'SRX6931250', 'SRX6931251', 'SRX6931252', 'SRX6931253', 'SRX6931254', 'SRX6931255', 'SRX6931256', 'SRX6931257', 'SRX6931258', 'SRX6931259', 'SRX6931260'], 'sequencing_input_type': ['cell_suspension']}], 'specimens': [{'has_input_biomaterial': [None], '_source': ['specimen_from_organism'], 'document_id': ['1277f2eb-8d86-428e-8c41-6f2184849b5d', '1ea22658-d557-4d82-b9b3-77d08c7dcc0b', '25645f82-7b7d-4ca1-9615-05e3fe43b15a', '3dd26b9e-b870-4e5c-888a-c5f00dd881ad', '3ebd299a-15b7-424b-a1f8-a2d10d9a649a', '7b56dd2f-e4f3-4fad-a8e8-d927d71bd6c0', '7f9f255a-885c-4300-9783-3448155ec440', '81ed9d77-b951-40cc-a3c4-a1cd26045204', '99d37a7e-1c08-45a4-8aea-d2c561a4eca7', '9a89cece-f4a7-44b0-8c14-0f75bf68e0a6', '9baf89f7-1425-48da-b73b-5556857f43af', 'aa07277e-134e-479b-bcb2-2c3c36bfbb56', 'acbea770-33ec-48f6-99e2-38c5811c553b', 'b40b9e51-60cf-4996-9aee-23696135e81c', 'b4efe856-c177-479d-b7c4-63713d97e51e', 'c46d679f-9da5-45fd-978e-fb0994681b53', 'd3365e7b-508b-4a35-80e3-b7f3a42ced67', 'db1f0c8d-aaf6-48f3-bb19-d485ba54d054', 'e29a9078-49e2-4d73-81b5-be6789f57127', 'eab50cb7-5ca5-42a5-9005-e8269cae26e4', 'ec3b0dee-a5e6-4544-b9d3-3845f5fcc953', 'f9ffd472-37fd-4658-84a2-eed286056e9f'], 'biomaterial_id': ['SAMN12880970', 'SAMN12880971', 'SAMN12880972', 'SAMN12880973', 'SAMN12880974', 'SAMN12880975', 'SAMN12880976', 'SAMN12880977', 'SAMN12880978', 'SAMN12880979', 'SAMN12880980', 'SAMN12880981', 'SAMN12880982', 'SAMN12880983', 'SAMN12880984', 'SAMN12880985', 'SAMN12880986', 'SAMN12880987', 'SAMN12880988', 'SAMN12880989', 'SAMN12880990', 'SAMN12880991'], 'disease': ['intracranial hypertension', 'multiple sclerosis'], 
'organ': ['blood', 'central nervous system'], 'organ_part': ['cerebrospinal fluid', None], 'storage_method': [None], 'preservation_method': [None], '_type': ['specimen']}], 'cell_suspensions': [{'document_id': ['00b2b7d2-fd5f-481e-9a81-2e24371fa5cc', '2e9955f9-b0dc-4b73-ba1b-c1628a429a98', '3deb74cb-bb85-406c-84fe-33f490879a73', '63a27440-0289-45f5-81c1-c8d27d0c0613', '6a4f2bc0-64a4-4d67-a81c-898b59cb765a', '8802df76-5878-495a-b226-998837f26d94', 'a62ed80b-204c-4559-8e7c-2ef763964961', 'ab2a21b0-f770-48c0-a426-f84c7054575c', 'beaf7e26-2b61-47be-8608-f716ebf159c0', 'bf521170-d784-4bb1-ae39-d6703d1ad510', 'c9ef654e-ef4b-4cef-99bb-1de9421fe6b4', 'de718f86-dca9-4f0e-8341-5155ff6a9d77'], 'biomaterial_id': ['SRX6931239', 'SRX6931240', 'SRX6931241', 'SRX6931242', 'SRX6931243', 'SRX6931244', 'SRX6931245', 'SRX6931246', 'SRX6931247', 'SRX6931248', 'SRX6931249', 'SRX6931250'], 'total_estimated_cells': 34400, 'selected_cell_type': [None], 'organ': ['central nervous system'], 'organ_part': ['cerebrospinal fluid']}, {'document_id': ['0ad718cf-2358-48e5-baf6-121bf1d2a165', '29c67e72-0e0a-4329-9d56-ba9fc04393a2', '2d2ff427-4965-4117-a535-2dc03f82e844', '7b7f2f4b-b3ab-4a4d-847f-2d0075c2fd13', '8be5a04c-b150-40a7-a3f3-8a3bfe53aeaf', 'aa4fe3fe-0a18-4747-bae0-15fa921ddcca', 'ae797078-e72a-44d2-92f5-b29d112eed98', 'b6ab85aa-499e-4bb2-9b4a-9aa5fd39299c', 'd8f08b9e-9d0e-490c-ae3c-483056f59e6d', 'f22c91aa-0424-4187-a331-7f50c99f1bb8'], 'biomaterial_id': ['SRX6931251', 'SRX6931252', 'SRX6931253', 'SRX6931254', 'SRX6931255', 'SRX6931256', 'SRX6931257', 'SRX6931258', 'SRX6931259', 'SRX6931260'], 'total_estimated_cells': 43000, 'selected_cell_type': [None], 'organ': ['blood'], 'organ_part': [None]}], 'cell_lines': [], 'donors': [{'document_id': ['0f1ccf79-10e0-424f-b7c4-a1af2dba7453', '5e6a01d0-fbf6-4eb8-94bb-d66486d8c42d', '62b4beaf-30c9-40c8-acc5-ab9a37581bf5', '6834ef8a-deb3-4c58-83da-31a9029bffd7', '7890c01d-d59e-4fbc-8032-380aba8d64a0', '8d9898d2-1150-475a-a708-2314e974b6ee', 
'98c9f404-3535-42a3-a88a-c0314746b5f9', 'bd3588ae-373a-477f-ae08-775b992f750d', 'c83faf34-372a-4d3e-b90a-3da6fb73a54c', 'de1c5329-e273-46b8-8837-9687be869495', 'ea9de748-1226-40bf-8ca0-bf45ee68184b', 'f28e1707-303f-43f4-a6e7-9c909a1ec8eb'], 'biomaterial_id': ['MS19270', 'MS49131', 'MS58637', 'MS60249', 'MS71658', 'MS74594', 'PST83775', 'PST95809', 'PTC32190', 'PTC41540', 'PTC45044', 'PTC85037'], 'biological_sex': ['female', 'male'], 'genus_species': ['Homo sapiens'], 'development_stage': ['human adult stage'], 'diseases': ['intracranial hypertension', 'multiple sclerosis'], 'organism_age': [{'value': '22.0', 'unit': 'year'}, {'value': '25.0', 'unit': 'year'}, {'value': '28.0', 'unit': 'year'}, {'value': '32.0', 'unit': 'year'}, {'value': '33.0', 'unit': 'year'}, {'value': '35.0', 'unit': 'year'}, {'value': '42.0', 'unit': 'year'}, {'value': '43.0', 'unit': 'year'}, {'value': '47.0', 'unit': 'year'}], 'organism_age_value': ['22.0', '25.0', '28.0', '32.0', '33.0', '35.0', '42.0', '43.0', '47.0'], 'organism_age_unit': ['year'], 'organism_age_range': [{'gte': 693792000.0, 'lte': 693792000.0}, {'gte': 788400000.0, 'lte': 788400000.0}, {'gte': 883008000.0, 'lte': 883008000.0}, {'gte': 1009152000.0, 'lte': 1009152000.0}, {'gte': 1040688000.0, 'lte': 1040688000.0}, {'gte': 1103760000.0, 'lte': 1103760000.0}, {'gte': 1324512000.0, 'lte': 1324512000.0}, {'gte': 1356048000.0, 'lte': 1356048000.0}, {'gte': 1482192000.0, 'lte': 1482192000.0}], 'donor_count': 12}], 'organoids': [], 'files': [{'content-type': 'application/x-tar; dcp-type=data; dcp-type=data', 'indexed': False, 'name': 'GSE138266_RAW.tar', 'crc32c': '34c6b06d', 'sha256': '70266f0a33e2ab760988562887887a48d32cfe3881663bb78686f5c12ed5a92b', 'size': 264058880, 'uuid': '011b388e-8af1-49d4-8211-a46121669d74', 'drs_path': 'v1_08755a99-c8f0-420a-9d21-4dca10b40ba5_8c7e4b65-5c04-4ccb-83da-c3dc1b6af4f2', 'version': '2021-05-24T16:20:18.221000Z', 'document_id': '9ff93ab7-a63a-4f07-88ea-8746dd0759e3', 'file_type': 
'analysis_file', 'file_format': 'tar', 'content_description': ['Gene expression matrix'], 'is_intermediate': True, 'source': None, '_type': 'file', 'related_files': [], 'matrix_cell_count': None}], 'analysis_protocols': [{'workflow': ['alignment_and_transcript_counting_singlecells']}], 'imaging_protocols': [], 'library_preparation_protocols': [{'library_construction_approach': ["10X 3' v2 sequencing"], 'nucleic_acid_source': ['single cell']}], 'sequencing_protocols': [{'instrument_manufacturer_model': ['Illumina NextSeq 500'], 'paired_end': [False]}], 'sequencing_processes': [{'document_id': ['b34f5177-1cb1-4feb-a351-170f6cd4e60c']}], 'projects': [{'project_title': ['Integrated single cell analysis of blood and cerebrospinal fluid leukocytes in multiple sclerosis.'], 'project_short_name': ['SingleCellsMultipleSclerosis'], 'laboratory': ['Department of Electrical Engineering & Computer Science, Center for Computational Biology', 'Department of Neurology with Institute of Translational Neurology', 'Department of Physics', 'Institute of Neuropathology'], 'institutions': ['EMBL-EBI', 'University Hospital Münster', 'University of California'], 'document_id': ['d3ac7c1b-5302-4804-b611-dad9f89c049d'], 'publication_titles': ['Integrated single cell analysis of blood and cerebrospinal fluid leukocytes in multiple sclerosis.'], 'insdc_project_accessions': ['SRP223886', 'SRP235677'], 'geo_series_accessions': ['GSE138266', 'GSE141797'], 'array_express_accessions': [None], 'insdc_study_accessions': ['PRJNA575241'], 'supplementary_links': [None], '_type': ['project']}]}, 'num_contributions': 1, 'sources': [{'id': '08755a99-c8f0-420a-9d21-4dca10b40ba5', 'spec': 'tdr:broad-datarepo-terra-prod-hca2:snapshot/hca_prod_20201120_dcp2___20210707_dcp7:'}], 'bundles': [{'uuid': 'b34f5177-1cb1-4feb-a351-170f6cd4e60c', 'version': '2021-05-24T16:20:19.389000Z'}], 'total_estimated_cells': 77400}, 'cell_suspensions')
Traceback (most recent call last):
  File "/var/task/app.py", line 1562, in generate_manifest
    return app.manifest_controller.get_manifest(event)
  File "/var/task/azul/service/manifest_controller.py", line 67, in get_manifest
    result = self.service.get_manifest(format_=ManifestFormat(state['format_']),
  File "/var/task/azul/service/manifest_service.py", line 364, in get_manifest
    partition = generator.write(object_key, partition)
  File "/var/task/azul/service/manifest_service.py", line 967, in write
    file_path, base_name = self.create_file()
  File "/var/task/azul/service/manifest_service.py", line 1449, in create_file
    converter.add_doc(doc)
  File "/var/task/azul/service/avro_pfb.py", line 105, in add_doc
    assert False, (doc, entity_type)
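
Both logged documents share one feature that matches the ticket title: `contents['cell_suspensions']` holds more than one inner entity (four in the first document, two in the second), while every other inner entity type holds at most one. The following is a minimal sketch for spotting such documents, assuming only the document shape visible in the logs. `multi_entity_types` is a hypothetical helper, not part of Azul, and the sample document is heavily trimmed from the second log entry.

```python
from typing import Any, Dict, List, Mapping


def multi_entity_types(doc: Mapping[str, Any]) -> Dict[str, int]:
    """
    Return the inner entity types in doc['contents'] that hold more than
    one inner entity, mapped to the number of entities they hold.
    """
    contents: Mapping[str, List[Any]] = doc["contents"]
    return {
        entity_type: len(entities)
        for entity_type, entities in contents.items()
        if len(entities) > 1
    }


# Trimmed-down stand-in for the second logged document. Only a few fields
# are kept; the values that remain are copied from the log.
sample_doc = {
    "entity_id": "9ff93ab7-a63a-4f07-88ea-8746dd0759e3",
    "contents": {
        "cell_suspensions": [
            {"organ": ["central nervous system"], "total_estimated_cells": 34400},
            {"organ": ["blood"], "total_estimated_cells": 43000},
        ],
        "cell_lines": [],
        "donors": [{"donor_count": 12}],
        "files": [{"name": "GSE138266_RAW.tar", "file_format": "tar"}],
    },
}

offending = multi_entity_types(sample_doc)
print(offending)
```

Running a check like this over the documents returned for a given filter could help narrow the filter down to the documents that trip the assertion.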

hannes-ucsc commented 3 years ago

Ticket does not have instructions on how to reproduce. Please fix, @jessebrennan. For demo, attempt to reproduce the issue.

jessebrennan commented 3 years ago

I will make a more refined filter to reproduce the issue on prod.