DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

A project in dcp7 catalog causes AssertionError #3238

dsotirho-ucsc closed this issue 3 years ago

dsotirho-ucsc commented 3 years ago

When projects are sorted by projectTitle, the 56th project in the dcp7 catalog causes an AssertionError in contributor_matrices.py. The error can be seen in the Data Browser on page 3 of the Projects tab. It does not occur in the dcp6 catalog.

This URL works: https://service.azul.data.humancellatlas.org/index/projects?catalog=dcp7&size=55&sort=projectTitle&order=asc

This URL causes the error below: https://service.azul.data.humancellatlas.org/index/projects?catalog=dcp7&size=56&sort=projectTitle&order=asc

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1135, in _get_view_function_response
    response = view_function(**function_args)
  File "/var/task/app.py", line 1234, in get_project_data
    return repository_search('projects', project_id)
  File "/var/task/app.py", line 974, in repository_search
    return service.get_data(catalog=catalog,
  File "/var/task/azul/service/index_query_service.py", line 67, in get_data
    response = self.transform_request(catalog=catalog,
  File "/var/task/azul/service/elasticsearch_service.py", line 641, in transform_request
    final_response = FileSearchResponse(hits, paging, facets, entity_type, catalog)
  File "/var/task/azul/service/hca_response_v5.py", line 582, in __init__
    KeywordSearchResponse.__init__(self, hits, entity_type, catalog)
  File "/var/task/azul/service/hca_response_v5.py", line 474, in __init__
    class_entries = {'hits': [
  File "/var/task/azul/service/hca_response_v5.py", line 475, in <listcomp>
    self.map_entries(x) for x in hits], 'pagination': None}
  File "/var/task/azul/service/hca_response_v5.py", line 452, in map_entries
    projects=self.make_projects(entry),
  File "/var/task/azul/service/hca_response_v5.py", line 325, in make_projects
    translated_project['contributorMatrices'] = self.make_matrices_(contents['contributor_matrices'])
  File "/var/task/azul/service/hca_response_v5.py", line 334, in make_matrices_
    return make_stratification_tree(files)
  File "/var/task/azul/plugins/metadata/hca/contributor_matrices.py", line 325, in make_stratification_tree
    assert set(sorted_dimensions) == stratum.keys(), sorted_dimensions
AssertionError: ['organ', 'libraryConstructionApproach', 'genusSpecies', 'developmentStage']
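
The failing assertion compares the full set of dimensions against the keys of each stratum, so a stratum that omits a dimension would trip it. The sketch below is not Azul's implementation. It is a minimal stand-alone check, assuming each `strata` string is a `;`-separated list of `dimension=value` pairs, as in the contributor matrix files of the project aggregate in this thread. The helper names `parse_stratum` and `missing_dimensions` are hypothetical.

```python
from typing import Dict, List


def parse_stratum(strata: str) -> Dict[str, str]:
    """Parse 'dim=value;dim=value' into a {dimension: value} mapping.

    Assumption: one stratum per string and one value per dimension.
    """
    stratum = {}
    for pair in strata.split(';'):
        dimension, _, value = pair.partition('=')
        stratum[dimension] = value
    return stratum


def missing_dimensions(strata_strings: List[str]) -> Dict[str, List[str]]:
    """Map each strata string to the dimensions it lacks.

    The reference set is the union of the dimensions seen across all
    strings. Strings that carry every dimension are left out of the result.
    """
    parsed = [parse_stratum(s) for s in strata_strings]
    all_dimensions = set().union(*(p.keys() for p in parsed))
    report = {}
    for original, stratum in zip(strata_strings, parsed):
        missing = sorted(all_dimensions - stratum.keys())
        if missing:
            report[original] = missing
    return report


# The two distinct strata values that appear in the project aggregate
mouse = ("genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;"
         "organ=brain;libraryConstructionApproach=10x v2 3'")
human = "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain"

for strata, missing in missing_dimensions([mouse, human]).items():
    print(f'{strata!r} lacks {missing}')
```

Run against those two values, the check flags only the Homo sapiens stratum, which has no `libraryConstructionApproach`. That is one of the four dimensions listed in the AssertionError message.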

dsotirho-ucsc commented 3 years ago

https://data.humancellatlas.org/explore/projects/dc1a41f6-9e09-42a6-959e-3be23db6da56

hannes-ucsc commented 3 years ago

Project aggregate:

{
    "_index": "azul_v2_prod_dcp7_projects_aggregate",
    "_type": "doc",
    "_id": "dc1a41f6-9e09-42a6-959e-3be23db6da56",
    "_score": null,
    "_source": {
        "entity_id": "dc1a41f6-9e09-42a6-959e-3be23db6da56",
        "contents": {
            "sample_specimens": [
                {
                    "has_input_biomaterial": [
                        "~null"
                    ],
                    "_source": [
                        "specimen_from_organism"
                    ],
                    "document_id": [
                        "038d4104-d94f-4e07-a9dc-aaa83573624d",
                        "0dbb46f2-badb-434a-ae10-86fd42fc38e4",
                        "3139c847-b1bc-48b5-ad25-ca2f1feea18e",
                        "43e53e3c-feef-457b-b2d1-6cc5c4b1609a",
                        "447bef09-4544-4fbf-808a-19a577e42541",
                        "45c940e9-54f3-43f8-9cb7-0fdea17ed03e",
                        "48e11413-9105-458b-8acd-aeaea48f784f",
                        "4a693422-6241-472c-810d-ec7dba9a8634",
                        "5608f2b1-d42f-4198-8d52-96e40e5239e4",
                        "57d15caa-696d-4b53-9e93-405006845ac3",
                        "5cbb063d-f70e-4a86-99c0-14bffdd8699f",
                        "5e9f5040-c83a-4442-8944-db144717107a",
                        "5f84da9f-23cb-4a1b-951c-e2f75eeb108a",
                        "6254fedf-0978-45ee-af82-e7f30ffebf0c",
                        "63170a26-263d-4207-8eab-871b687e234d",
                        "7bc6763d-2e27-4e5b-9e7d-f49021f25a00",
                        "87cc7f6c-019a-4669-8247-4529b75aebb9",
                        "95974991-4d62-43ac-b5c0-a89c2db04064",
                        "9b1c871e-20db-487f-9acc-36740b75877b",
                        "b34585bb-d351-463f-b2fb-e33180aef6a4",
                        "b5a382bd-34e5-429d-9013-3ff025ab322a",
                        "b5baf615-19f8-4a25-acc9-8dcef2ea62ed",
                        "b8f5887b-cd14-4c2c-80b8-d14a9eb73bd1",
                        "b9a0edf6-77bc-4e4a-a200-5cda4f902c04",
                        "c539551f-8056-48a0-8b77-e5d18c434559",
                        "c5494296-8411-4f42-aa4f-8b27d9c5fe83",
                        "c5afec47-bbca-481e-a7b4-413412242b70",
                        "d2e1395f-7a60-424a-8333-9b74978d2a23",
                        "e507570d-263a-4431-b170-eb98ed73f37c",
                        "e6b8f32e-7c8e-4308-b596-91006cde4b51",
                        "e8011c1d-8088-4923-8637-cc0d4ee7c38c",
                        "feb7c771-4b94-476b-a4b9-9fd459a09759"
                    ],
                    "biomaterial_id": [
                        "S01_cortex",
                        "S02_cortex",
                        "S03_cortex",
                        "S04_cortex",
                        "S05_cortex",
                        "S06_cortex",
                        "S07_cortex",
                        "S08_cortex",
                        "S09_cortex",
                        "S10_cortex",
                        "S11_cortex",
                        "S12_cortex",
                        "S13_cortex",
                        "S14_cortex",
                        "S15_cortex",
                        "S16_cortex",
                        "S17_cortex",
                        "S18_cortex",
                        "S19_cortex",
                        "S20_cortex",
                        "S21_cortex",
                        "S22_cortex",
                        "S23_cortex",
                        "S24_cortex",
                        "S25_cortex",
                        "S26_cortex",
                        "S27_cortex",
                        "S28_cortex",
                        "S29_cortex",
                        "S30_cortex",
                        "S31_cortex",
                        "S32_cortex"
                    ],
                    "disease": [
                        "normal",
                        "~null"
                    ],
                    "organ": [
                        "brain"
                    ],
                    "organ_part": [
                        "cerebral cortex",
                        "prefrontal cortex"
                    ],
                    "storage_method": [
                        "~null"
                    ],
                    "preservation_method": [
                        "~null"
                    ],
                    "_type": [
                        "specimen"
                    ]
                }
            ],
            "samples": [
                {
                    "document_id": [
                        "038d4104-d94f-4e07-a9dc-aaa83573624d",
                        "0dbb46f2-badb-434a-ae10-86fd42fc38e4",
                        "3139c847-b1bc-48b5-ad25-ca2f1feea18e",
                        "43e53e3c-feef-457b-b2d1-6cc5c4b1609a",
                        "447bef09-4544-4fbf-808a-19a577e42541",
                        "45c940e9-54f3-43f8-9cb7-0fdea17ed03e",
                        "48e11413-9105-458b-8acd-aeaea48f784f",
                        "4a693422-6241-472c-810d-ec7dba9a8634",
                        "5608f2b1-d42f-4198-8d52-96e40e5239e4",
                        "57d15caa-696d-4b53-9e93-405006845ac3",
                        "5cbb063d-f70e-4a86-99c0-14bffdd8699f",
                        "5e9f5040-c83a-4442-8944-db144717107a",
                        "5f84da9f-23cb-4a1b-951c-e2f75eeb108a",
                        "6254fedf-0978-45ee-af82-e7f30ffebf0c",
                        "63170a26-263d-4207-8eab-871b687e234d",
                        "7bc6763d-2e27-4e5b-9e7d-f49021f25a00",
                        "87cc7f6c-019a-4669-8247-4529b75aebb9",
                        "95974991-4d62-43ac-b5c0-a89c2db04064",
                        "9b1c871e-20db-487f-9acc-36740b75877b",
                        "b34585bb-d351-463f-b2fb-e33180aef6a4",
                        "b5a382bd-34e5-429d-9013-3ff025ab322a",
                        "b5baf615-19f8-4a25-acc9-8dcef2ea62ed",
                        "b8f5887b-cd14-4c2c-80b8-d14a9eb73bd1",
                        "b9a0edf6-77bc-4e4a-a200-5cda4f902c04",
                        "c539551f-8056-48a0-8b77-e5d18c434559",
                        "c5494296-8411-4f42-aa4f-8b27d9c5fe83",
                        "c5afec47-bbca-481e-a7b4-413412242b70",
                        "d2e1395f-7a60-424a-8333-9b74978d2a23",
                        "e507570d-263a-4431-b170-eb98ed73f37c",
                        "e6b8f32e-7c8e-4308-b596-91006cde4b51",
                        "e8011c1d-8088-4923-8637-cc0d4ee7c38c",
                        "feb7c771-4b94-476b-a4b9-9fd459a09759"
                    ],
                    "biomaterial_id": [
                        "S01_cortex",
                        "S02_cortex",
                        "S03_cortex",
                        "S04_cortex",
                        "S05_cortex",
                        "S06_cortex",
                        "S07_cortex",
                        "S08_cortex",
                        "S09_cortex",
                        "S10_cortex",
                        "S11_cortex",
                        "S12_cortex",
                        "S13_cortex",
                        "S14_cortex",
                        "S15_cortex",
                        "S16_cortex",
                        "S17_cortex",
                        "S18_cortex",
                        "S19_cortex",
                        "S20_cortex",
                        "S21_cortex",
                        "S22_cortex",
                        "S23_cortex",
                        "S24_cortex",
                        "S25_cortex",
                        "S26_cortex",
                        "S27_cortex",
                        "S28_cortex",
                        "S29_cortex",
                        "S30_cortex",
                        "S31_cortex",
                        "S32_cortex"
                    ],
                    "entity_type": [
                        "specimens"
                    ],
                    "organ": [
                        "brain"
                    ],
                    "organ_part": [
                        "cerebral cortex",
                        "prefrontal cortex"
                    ],
                    "model_organ": [
                        "~null"
                    ],
                    "model_organ_part": [
                        "~null"
                    ],
                    "effective_organ": [
                        "brain"
                    ]
                }
            ],
            "sequencing_inputs": [
                {
                    "document_id": [
                        "db4b5711-0abd-44cd-ac94-cdaca0cb1de3"
                    ],
                    "biomaterial_id": [
                        "exp_2_pool"
                    ],
                    "sequencing_input_type": [
                        "cell_suspension"
                    ]
                }
            ],
            "specimens": [
                {
                    "has_input_biomaterial": [
                        "~null"
                    ],
                    "_source": [
                        "specimen_from_organism"
                    ],
                    "document_id": [
                        "038d4104-d94f-4e07-a9dc-aaa83573624d",
                        "0dbb46f2-badb-434a-ae10-86fd42fc38e4",
                        "3139c847-b1bc-48b5-ad25-ca2f1feea18e",
                        "43e53e3c-feef-457b-b2d1-6cc5c4b1609a",
                        "447bef09-4544-4fbf-808a-19a577e42541",
                        "45c940e9-54f3-43f8-9cb7-0fdea17ed03e",
                        "48e11413-9105-458b-8acd-aeaea48f784f",
                        "4a693422-6241-472c-810d-ec7dba9a8634",
                        "5608f2b1-d42f-4198-8d52-96e40e5239e4",
                        "57d15caa-696d-4b53-9e93-405006845ac3",
                        "5cbb063d-f70e-4a86-99c0-14bffdd8699f",
                        "5e9f5040-c83a-4442-8944-db144717107a",
                        "5f84da9f-23cb-4a1b-951c-e2f75eeb108a",
                        "6254fedf-0978-45ee-af82-e7f30ffebf0c",
                        "63170a26-263d-4207-8eab-871b687e234d",
                        "7bc6763d-2e27-4e5b-9e7d-f49021f25a00",
                        "87cc7f6c-019a-4669-8247-4529b75aebb9",
                        "95974991-4d62-43ac-b5c0-a89c2db04064",
                        "9b1c871e-20db-487f-9acc-36740b75877b",
                        "b34585bb-d351-463f-b2fb-e33180aef6a4",
                        "b5a382bd-34e5-429d-9013-3ff025ab322a",
                        "b5baf615-19f8-4a25-acc9-8dcef2ea62ed",
                        "b8f5887b-cd14-4c2c-80b8-d14a9eb73bd1",
                        "b9a0edf6-77bc-4e4a-a200-5cda4f902c04",
                        "c539551f-8056-48a0-8b77-e5d18c434559",
                        "c5494296-8411-4f42-aa4f-8b27d9c5fe83",
                        "c5afec47-bbca-481e-a7b4-413412242b70",
                        "d2e1395f-7a60-424a-8333-9b74978d2a23",
                        "e507570d-263a-4431-b170-eb98ed73f37c",
                        "e6b8f32e-7c8e-4308-b596-91006cde4b51",
                        "e8011c1d-8088-4923-8637-cc0d4ee7c38c",
                        "feb7c771-4b94-476b-a4b9-9fd459a09759"
                    ],
                    "biomaterial_id": [
                        "S01_cortex",
                        "S02_cortex",
                        "S03_cortex",
                        "S04_cortex",
                        "S05_cortex",
                        "S06_cortex",
                        "S07_cortex",
                        "S08_cortex",
                        "S09_cortex",
                        "S10_cortex",
                        "S11_cortex",
                        "S12_cortex",
                        "S13_cortex",
                        "S14_cortex",
                        "S15_cortex",
                        "S16_cortex",
                        "S17_cortex",
                        "S18_cortex",
                        "S19_cortex",
                        "S20_cortex",
                        "S21_cortex",
                        "S22_cortex",
                        "S23_cortex",
                        "S24_cortex",
                        "S25_cortex",
                        "S26_cortex",
                        "S27_cortex",
                        "S28_cortex",
                        "S29_cortex",
                        "S30_cortex",
                        "S31_cortex",
                        "S32_cortex"
                    ],
                    "disease": [
                        "normal",
                        "~null"
                    ],
                    "organ": [
                        "brain"
                    ],
                    "organ_part": [
                        "cerebral cortex",
                        "prefrontal cortex"
                    ],
                    "storage_method": [
                        "~null"
                    ],
                    "preservation_method": [
                        "~null"
                    ],
                    "_type": [
                        "specimen"
                    ]
                }
            ],
            "cell_suspensions": [
                {
                    "document_id": [
                        "0735b137-8384-48f7-a236-5059d44390e9",
                        "2481931b-1a20-4bf4-a9bf-a88802adc686",
                        "5d147b60-3049-4288-a81b-a1ca31fa9bb4",
                        "6d5e15d7-7f8d-4560-b088-b3c149395ab7",
                        "ad66fe6c-68ed-4fae-bfa2-505319483405",
                        "c069319d-647d-415e-9b87-6aea8290fae3",
                        "db4b5711-0abd-44cd-ac94-cdaca0cb1de3",
                        "f71f15c7-ffbd-4e55-9b32-58e26da5fdbc"
                    ],
                    "biomaterial_id": [
                        "exp_1_hashed_pool",
                        "exp_1_nonhashed_pool",
                        "exp_2_pool",
                        "exp_3_pool",
                        "exp_4_1500_pool",
                        "exp_4_3000_pool",
                        "exp_4_4500_pool",
                        "exp_4_500_pool"
                    ],
                    "total_estimated_cells": 161000,
                    "total_estimated_cells_": 161000,
                    "selected_cell_type": [
                        "~null"
                    ],
                    "organ": [
                        "brain"
                    ],
                    "organ_part": [
                        "cerebral cortex",
                        "prefrontal cortex"
                    ]
                }
            ],
            "cell_lines": [],
            "donors": [
                {
                    "document_id": [
                        "014494a0-adde-4ac6-90c1-2b5824a8717c",
                        "03a287a8-a6e4-4f59-b56e-c40f407c34dc",
                        "0585547c-0a64-4caf-a00b-8d06f10161ea",
                        "0cabf344-2698-4a48-bb0a-3b18b13eef33",
                        "163b51b0-d5ea-45f6-b4f2-32b1541ad218",
                        "2005c9db-bccb-492c-ace7-4a8027d50cc4",
                        "3dd9459c-e52f-4726-8f15-eb960afeac38",
                        "4f6f3143-bd7b-466c-9e6c-9d1a0a58fca8",
                        "5307ac08-9f31-4893-94fb-3c24c494922c",
                        "65a51308-5c4c-41d0-90f5-2d81532b6f8d",
                        "6be2d813-ea47-4333-ab96-fa8e49c2a8de",
                        "9c2bf662-d7ec-4f91-8c25-99c02b49a5d2",
                        "b76ca148-346e-459e-b621-e3efdd78aefa",
                        "c10356c8-e081-4f97-8e85-0ea87b8eb84b",
                        "d54e74c6-804d-4512-a807-f76505701ce5",
                        "da5d5f8c-925a-4a7c-92d0-fdaba4c6ce62",
                        "ded068c8-9e53-4b82-b336-1b0897282b5f",
                        "e1dadf1c-58b8-4528-b281-440c9761e3c6",
                        "e3f953f1-d049-47ea-970e-0d7d9d9efb92",
                        "e728e55a-3d05-4256-89fe-2e53b6e6154e",
                        "e7409bfe-0bb3-4f79-81b5-dea6d9f71aba",
                        "e799c82a-6218-4a66-bd1b-6548c016118e"
                    ],
                    "biomaterial_id": [
                        "S01",
                        "S02",
                        "S03",
                        "S04",
                        "S05",
                        "S06",
                        "S07",
                        "S08",
                        "S21",
                        "S22",
                        "S23",
                        "S24",
                        "S25",
                        "S26",
                        "S27",
                        "S28",
                        "S29",
                        "S30",
                        "S31",
                        "S32",
                        "female_mouse",
                        "male_mouse"
                    ],
                    "biological_sex": [
                        "female",
                        "male"
                    ],
                    "genus_species": [
                        "Homo sapiens",
                        "Mus musculus"
                    ],
                    "development_stage": [
                        "human adult stage",
                        "post-juvenile adult stage"
                    ],
                    "diseases": [
                        "normal",
                        "~null"
                    ],
                    "organism_age": [
                        "76.3 year",
                        "78.7 year",
                        "80.9 year",
                        "83.7 year",
                        "84.7 year",
                        "85.2 year",
                        "85.4 year",
                        "85.8 year",
                        "85.9 year",
                        "86.5 year",
                        "86.7 year",
                        "89.3 year",
                        "90.7 year",
                        "91.2 year",
                        "92.2 year",
                        "92.3 year",
                        "93 year",
                        "94 year",
                        "95.2 year",
                        "96.5 year",
                        "~null"
                    ],
                    "organism_age_value": [
                        "76.3",
                        "78.7",
                        "80.9",
                        "83.7",
                        "84.7",
                        "85.2",
                        "85.4",
                        "85.8",
                        "85.9",
                        "86.5",
                        "86.7",
                        "89.3",
                        "90.7",
                        "91.2",
                        "92.2",
                        "92.3",
                        "93",
                        "94",
                        "95.2",
                        "96.5",
                        "~null"
                    ],
                    "organism_age_unit": [
                        "year",
                        "~null"
                    ],
                    "organism_age_range": [
                        {
                            "gte": 2.4061968E9,
                            "lte": 2.4061968E9
                        },
                        {
                            "gte": 2.4818832E9,
                            "lte": 2.4818832E9
                        },
                        {
                            "gte": 2.5512624E9,
                            "lte": 2.5512624E9
                        },
                        {
                            "gte": 2.6395632E9,
                            "lte": 2.6395632E9
                        },
                        {
                            "gte": 2.6710992E9,
                            "lte": 2.6710992E9
                        },
                        {
                            "gte": 2.6868672E9,
                            "lte": 2.6868672E9
                        },
                        {
                            "gte": 2.6931744E9,
                            "lte": 2.6931744E9
                        },
                        {
                            "gte": 2.7057888E9,
                            "lte": 2.7057888E9
                        },
                        {
                            "gte": 2.7089424E9,
                            "lte": 2.7089424E9
                        },
                        {
                            "gte": 2.727864E9,
                            "lte": 2.727864E9
                        },
                        {
                            "gte": 2.7341712E9,
                            "lte": 2.7341712E9
                        },
                        {
                            "gte": 2.8161648E9,
                            "lte": 2.8161648E9
                        },
                        {
                            "gte": 2.8603152E9,
                            "lte": 2.8603152E9
                        },
                        {
                            "gte": 2.8760832E9,
                            "lte": 2.8760832E9
                        },
                        {
                            "gte": 2.9076192E9,
                            "lte": 2.9076192E9
                        },
                        {
                            "gte": 2.9107728E9,
                            "lte": 2.9107728E9
                        },
                        {
                            "gte": 2.932848E9,
                            "lte": 2.932848E9
                        },
                        {
                            "gte": 2.964384E9,
                            "lte": 2.964384E9
                        },
                        {
                            "gte": 3.0022272E9,
                            "lte": 3.0022272E9
                        },
                        {
                            "gte": 3.043224E9,
                            "lte": 3.043224E9
                        }
                    ],
                    "donor_count": 22,
                    "donor_count_": 22
                }
            ],
            "organoids": [],
            "files": [
                {
                    "size": 4313841,
                    "size_": 4313841,
                    "file_format": "txt",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 3,
                    "content_description": [
                        "experimental metadata"
                    ],
                    "matrix_cell_count": 48114,
                    "matrix_cell_count_": 48114
                },
                {
                    "size": 1416931595,
                    "size_": 1416931595,
                    "file_format": "mtx",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 0,
                    "count": 4,
                    "content_description": [
                        "Gene expression matrix"
                    ],
                    "matrix_cell_count": 51516,
                    "matrix_cell_count_": 51516
                },
                {
                    "size": 2908310,
                    "size_": 2908310,
                    "file_format": "tsv",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 4,
                    "content_description": [
                        "Gene identifier"
                    ],
                    "matrix_cell_count": 9223372036854774784,
                    "matrix_cell_count_": null
                },
                {
                    "size": 1257986,
                    "size_": 1257986,
                    "file_format": "tsv",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 4,
                    "content_description": [
                        "barcodes file"
                    ],
                    "matrix_cell_count": 51516,
                    "matrix_cell_count_": 51516
                },
                {
                    "size": 3257439,
                    "size_": 3257439,
                    "file_format": "txt",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 4,
                    "content_description": [
                        "diffmap pca coordinates"
                    ],
                    "matrix_cell_count": 51516,
                    "matrix_cell_count_": 51516
                },
                {
                    "size": 3324343,
                    "size_": 3324343,
                    "file_format": "txt",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 4,
                    "content_description": [
                        "tsne coordinates"
                    ],
                    "matrix_cell_count": 51516,
                    "matrix_cell_count_": 51516
                },
                {
                    "size": 310034,
                    "size_": 310034,
                    "file_format": "txt",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 1,
                    "content_description": [
                        "umap coordinates"
                    ],
                    "matrix_cell_count": 6083,
                    "matrix_cell_count_": 6083
                },
                {
                    "size": 39026018887,
                    "size_": 39026018887,
                    "file_format": "fastq",
                    "source": [
                        "GEO"
                    ],
                    "is_intermediate": 9223372036854774784,
                    "count": 15,
                    "content_description": [
                        "DNA sequence"
                    ],
                    "matrix_cell_count": 9223372036854774784,
                    "matrix_cell_count_": null
                },
                {
                    "size": 264733,
                    "size_": 264733,
                    "file_format": "txt",
                    "source": [
                        "SCP"
                    ],
                    "is_intermediate": 0,
                    "count": 1,
                    "content_description": [
                        "expression matrix"
                    ],
                    "matrix_cell_count": 3402,
                    "matrix_cell_count_": 3402
                }
            ],
            "analysis_protocols": [
                {
                    "workflow": [
                        "snRNAseq_data_analysis"
                    ]
                }
            ],
            "imaging_protocols": [],
            "library_preparation_protocols": [
                {
                    "library_construction_approach": [
                        "10x v2 3'"
                    ],
                    "nucleic_acid_source": [
                        "single nucleus"
                    ]
                }
            ],
            "sequencing_protocols": [
                {
                    "instrument_manufacturer_model": [
                        "Illumina NextSeq 500"
                    ],
                    "paired_end": [
                        0
                    ]
                }
            ],
            "sequencing_processes": [
                {
                    "document_id": [
                        "02e69c25-71e2-48ca-a87b-e256938c6a98",
                        "39fdb0ae-9a00-4829-a581-e7cb59798f02"
                    ]
                }
            ],
            "matrices": [],
            "contributor_matrices": [
                {
                    "file": [
                        {
                            "uuid": "04f754f4-1c3c-4eac-a64f-c1a0937cae06",
                            "version": "2021-06-28T14:21:17.989000Z",
                            "name": "experiment2_mouse_pbs_scp_metadata.txt",
                            "size": 264733,
                            "matrix_cell_count": 3402,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;organ=brain;libraryConstructionApproach=10x v2 3'"
                        },
                        {
                            "uuid": "0dd9b6f6-c235-4200-84e1-292dfb14a293",
                            "version": "2021-06-28T14:21:17.662000Z",
                            "name": "experiment1_stonly_lowremove_scp_matrix.mtx",
                            "size": 194806185,
                            "matrix_cell_count": 6083,
                            "source": "SCP",
                            "strata": "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain"
                        },
                        {
                            "uuid": "9601aedb-6de9-4a32-8bd9-ed8b78b256e1",
                            "version": "2021-06-28T14:21:17.929000Z",
                            "name": "experiment2_mouse_pbs_scp_matrix.mtx",
                            "size": 87863503,
                            "matrix_cell_count": 3402,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;organ=brain;libraryConstructionApproach=10x v2 3'"
                        },
                        {
                            "uuid": "99f43823-a000-4a94-991a-9b4e37096956",
                            "version": "2021-06-28T14:21:17.843000Z",
                            "name": "experiment4_human_st_lowremove_scp_matrix.mtx",
                            "size": 1094598318,
                            "matrix_cell_count": 39086,
                            "source": "SCP",
                            "strata": "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain"
                        },
                        {
                            "uuid": "c7c99747-502a-4d0a-b5b8-c56533fb9eff",
                            "version": "2021-06-28T14:21:17.772000Z",
                            "name": "experiment3_human_mouse_pbs_clust_scp_matrix.mtx",
                            "size": 39663589,
                            "matrix_cell_count": 2945,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus,Homo sapiens;developmentStage=post-juvenile adult stage,human adult stage;organ=brain"
                        }
                    ]
                }
            ],
            "projects": [
                {
                    "project_title": "Nuclei multiplexing with barcoded antibodies for single-nucleus genomics.",
                    "project_description": "Single-nucleus RNA-seq (snRNA-seq) enables the interrogation of cellular states in complex tissues that are challenging to dissociate or are frozen, and opens the way to human genetics studies, clinical trials, and precise cell atlases of large organs. However, such applications are currently limited by batch effects, processing, and costs. Here, we present an approach for multiplexing snRNA-seq, using sample-barcoded antibodies to uniquely label nuclei from distinct samples. Comparing human brain cortex samples profiled with or without hashing antibodies, we demonstrate that nucleus hashing does not significantly alter recovered profiles. We develop DemuxEM, a computational tool that detects inter-sample multiplets and assigns singlets to their sample of origin, and validate its accuracy using sex-specific gene expression, species-mixing and natural genetic variation. Our approach will facilitate tissue atlases of isogenic model organisms or from multiple biopsies or longitudinal samples of one donor, and large-scale perturbation screens.",
                    "project_short_name": "NucleiMultiplexHashing",
                    "laboratory": [
                        "Center for Immunology and Inflammatory Diseases, Division of Rheumatology, Allergy, and Immunology",
                        "Center for Translational & Computational Neuroimmunology",
                        "Klarman Cell Observatory"
                    ],
                    "institutions": [
                        "BioLegend Inc.",
                        "Broad Institute of Harvard and MIT",
                        "Broad Institute of Harvard and MIT,",
                        "Columbia University Medical Center",
                        "EMBL-EBI",
                        "Massachusetts General Hospital and Harvard Medical School"
                    ],
                    "contact_names": [
                        "Abigail,Knecht",
                        "Aviv,Regev",
                        "Bertrand,Yeung",
                        "Bo,Li",
                        "Cristin,McCabe",
                        "Danielle,Dionne",
                        "Eugene,Drokhlyansky",
                        "Jellert,T,Gaublomme",
                        "Julia,Waldman",
                        "Lan,Nguyen",
                        "Naomi,Habib",
                        "Nicholas,Van Wittenberghe",
                        "Orit,Rozenblatt-Rosen",
                        "Philip,L,De Jager",
                        "Wei Kheng, Teh",
                        "Xinfang,Zhao",
                        "Yiming,Yang"
                    ],
                    "contributors": [
                        {
                            "contact_name": "Aviv,Regev",
                            "corresponding_contributor": 1,
                            "email": "aregev@broadinstitute.org",
                            "institution": "Broad Institute of Harvard and MIT,",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Nicholas,Van Wittenberghe",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Wei Kheng, Teh",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "wteh@ebi.ac.uk",
                            "institution": "EMBL-EBI",
                            "laboratory": "~null",
                            "project_role": "data curator"
                        },
                        {
                            "contact_name": "Eugene,Drokhlyansky",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Jellert,T,Gaublomme",
                            "corresponding_contributor": 1,
                            "email": "jellert.gaublomme@columbia.edu",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Danielle,Dionne",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Abigail,Knecht",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Orit,Rozenblatt-Rosen",
                            "corresponding_contributor": 1,
                            "email": "orit@broadinstitute.org",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Cristin,McCabe",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Yiming,Yang",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Massachusetts General Hospital and Harvard Medical School",
                            "laboratory": "Center for Immunology and Inflammatory Diseases, Division of Rheumatology, Allergy, and Immunology",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Xinfang,Zhao",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "BioLegend Inc.",
                            "laboratory": "",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Naomi,Habib",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Bertrand,Yeung",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "BioLegend Inc.",
                            "laboratory": "",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Bo,Li",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT,",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Julia,Waldman",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Philip,L,De Jager",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Columbia University Medical Center",
                            "laboratory": "Center for Translational & Computational Neuroimmunology",
                            "project_role": "~null"
                        },
                        {
                            "contact_name": "Lan,Nguyen",
                            "corresponding_contributor": 9223372036854774784,
                            "email": "~null",
                            "institution": "Broad Institute of Harvard and MIT",
                            "laboratory": "Klarman Cell Observatory",
                            "project_role": "~null"
                        }
                    ],
                    "document_id": "dc1a41f6-9e09-42a6-959e-3be23db6da56",
                    "publication_titles": [
                        "Nuclei multiplexing with barcoded antibodies for single-nucleus genomics."
                    ],
                    "publications": [
                        {
                            "publication_title": "Nuclei multiplexing with barcoded antibodies for single-nucleus genomics.",
                            "publication_url": "https://doi.org/10.1038/s41467-019-10756-2"
                        }
                    ],
                    "insdc_project_accessions": [
                        "SRP187985"
                    ],
                    "geo_series_accessions": [
                        "~null"
                    ],
                    "array_express_accessions": [
                        "~null"
                    ],
                    "insdc_study_accessions": [
                        "PRJNA526262"
                    ],
                    "supplementary_links": [
                        "https://portals.broadinstitute.org/single_cell/study/SCP371/experiment-1-all",
                        "https://portals.broadinstitute.org/single_cell/study/SCP375/experiment-1-stonly",
                        "https://portals.broadinstitute.org/single_cell/study/SCP379/experiment-3-human-mouse-pbs-clust",
                        "https://portals.broadinstitute.org/single_cell/study/SCP381/experiment-4-human-st",
                        "https://singlecell.broadinstitute.org/single_cell/study/SCP377",
                        "https://www.synapse.org/#!Synapse:syn22213200",
                        "https://www.synapse.org/#!Synapse:syn2580853/wiki/409840"
                    ],
                    "_type": "project"
                }
            ]
        },
        "num_contributions": 5,
        "sources": [
            {
                "id": "08755a99-c8f0-420a-9d21-4dca10b40ba5",
                "spec": "tdr:broad-datarepo-terra-prod-hca2:snapshot/hca_prod_20201120_dcp2___20210707_dcp7:"
            }
        ],
        "bundles": [
            {
                "uuid": "a0eafb51-718a-4284-8fe6-037a24833991",
                "version": "2021-06-28T14:21:18.716000Z"
            },
            {
                "uuid": "01135975-cb29-428b-99e9-b53631bc3b0d",
                "version": "2021-06-28T14:21:18.732000Z"
            },
            {
                "uuid": "02e69c25-71e2-48ca-a87b-e256938c6a98",
                "version": "2021-06-28T14:21:18.700000Z"
            },
            {
                "uuid": "39fdb0ae-9a00-4829-a581-e7cb59798f02",
                "version": "2021-06-28T14:21:18.691000Z"
            },
            {
                "uuid": "2a67ab7a-bb3c-468e-965b-e36b55d6feb6",
                "version": "2021-06-28T14:21:18.744000Z"
            }
        ],
        "total_estimated_cells": 161000
    },
    "sort": [
        "Nuclei multiplexing with barcoded antibodies for single-nucleus genomics."
    ]
}
hannes-ucsc commented 3 years ago

@danielsotirhos and I canned all five bundles and looked at the metadata. The Azul service complains because some CGMs aren't associated with a library construction method while others are. The relevant section of the index document is:

"contributor_matrices": [
                {
                    "file": [
                        {
                            "uuid": "04f754f4-1c3c-4eac-a64f-c1a0937cae06",
                            "version": "2021-06-28T14:21:17.989000Z",
                            "name": "experiment2_mouse_pbs_scp_metadata.txt",
                            "size": 264733,
                            "matrix_cell_count": 3402,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;organ=brain;libraryConstructionApproach=10x v2 3'"
                        },
                        {
                            "uuid": "0dd9b6f6-c235-4200-84e1-292dfb14a293",
                            "version": "2021-06-28T14:21:17.662000Z",
                            "name": "experiment1_stonly_lowremove_scp_matrix.mtx",
                            "size": 194806185,
                            "matrix_cell_count": 6083,
                            "source": "SCP",
                            "strata": "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain"
                        },
                        {
                            "uuid": "9601aedb-6de9-4a32-8bd9-ed8b78b256e1",
                            "version": "2021-06-28T14:21:17.929000Z",
                            "name": "experiment2_mouse_pbs_scp_matrix.mtx",
                            "size": 87863503,
                            "matrix_cell_count": 3402,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;organ=brain;libraryConstructionApproach=10x v2 3'"
                        },
                        {
                            "uuid": "99f43823-a000-4a94-991a-9b4e37096956",
                            "version": "2021-06-28T14:21:17.843000Z",
                            "name": "experiment4_human_st_lowremove_scp_matrix.mtx",
                            "size": 1094598318,
                            "matrix_cell_count": 39086,
                            "source": "SCP",
                            "strata": "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain"
                        },
                        {
                            "uuid": "c7c99747-502a-4d0a-b5b8-c56533fb9eff",
                            "version": "2021-06-28T14:21:17.772000Z",
                            "name": "experiment3_human_mouse_pbs_clust_scp_matrix.mtx",
                            "size": 39663589,
                            "matrix_cell_count": 2945,
                            "source": "SCP",
                            "strata": "genusSpecies=Mus musculus,Homo sapiens;developmentStage=post-juvenile adult stage,human adult stage;organ=brain"
                        }
                    ]
                }
            ],

The metadata describes two types of experiments. In the first, the CGM analysis_file entity is derived from sequence_file entities produced by a sequencing process with library preparation and sequencing protocols. Those CGMs can be associated with a library preparation method. In the second, the CGM analysis_file entity is linked directly to cell_suspension entities by an analysis_process, without any sequencing_protocol or library_preparation_protocol from which a library construction approach could be inferred. Those CGMs are not associated with a library preparation method. The current implementation of Azul requires that every matrix be stratified by the same set of dimensions, so that all branches in the resulting stratification tree have the same length. We will relax that requirement in a hotfix, but the result will be a tree with artificial "Unspecified" entries.
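To illustrate the padding idea described above, here is a minimal sketch, not the actual Azul code. The function names, the `DIMENSIONS` ordering, the nested-dict tree layout, and the cross-product expansion of multi-valued dimensions are all assumptions made for this example; only the strata strings and file names come from the index document.

```python
from typing import Dict, List

# Hypothetical fixed ordering of dimensions, taken from the keys that appear
# in the strata strings of the index document above.
DIMENSIONS = ['genusSpecies', 'developmentStage', 'organ',
              'libraryConstructionApproach']


def parse_strata(strata: str) -> Dict[str, List[str]]:
    """Parse 'k1=a,b;k2=c' into {'k1': ['a', 'b'], 'k2': ['c']}."""
    result = {}
    for point in strata.split(';'):
        dimension, values = point.split('=', 1)
        result[dimension] = values.split(',')
    return result


def add_to_tree(tree: dict, strata: str, file_name: str) -> None:
    """
    Insert a file into a nested dict of the form
    {dimension: {value: {next_dimension: {...}}}}. A dimension that is
    missing from the strata string is padded with 'Unspecified' so that
    every branch ends up with the same length.
    """
    parsed = parse_strata(strata)
    nodes = [tree]
    for dimension in DIMENSIONS:
        values = parsed.get(dimension, ['Unspecified'])
        dimension_nodes = [node.setdefault(dimension, {}) for node in nodes]
        # Assumption: multi-valued dimensions expand as a cross product
        nodes = [dimension_node.setdefault(value, {})
                 for dimension_node in dimension_nodes
                 for value in values]
    for node in nodes:
        node.setdefault('files', []).append(file_name)


tree = {}
add_to_tree(tree,
            "genusSpecies=Mus musculus;developmentStage=post-juvenile adult stage;"
            "organ=brain;libraryConstructionApproach=10x v2 3'",
            'experiment2_mouse_pbs_scp_matrix.mtx')
add_to_tree(tree,
            "genusSpecies=Homo sapiens;developmentStage=human adult stage;organ=brain",
            'experiment1_stonly_lowremove_scp_matrix.mtx')
```

With this padding, the human matrix lands under an artificial `libraryConstructionApproach` → `Unspecified` node, which is the kind of entry the hotfix is expected to produce.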

We will improve on this implementation in follow-up ticket #2443.

The wranglers might want to consider a different approach to describing these CGMs. For a cell in a suspension to end up in a matrix file, some kind of sequencing and library prep must have occurred and the corresponding method must be known.

hannes-ucsc commented 3 years ago

For demo, attempt to reproduce, both with the initial repro and the one mentioned in https://github.com/databiosphere/azul/issues/3238#issuecomment-880868394

clairerye commented 3 years ago

Thanks for the clear explanation here @hannes-ucsc. We will discuss between us and the UCSC wranglers how best to do this going forward; I agree that if cell suspension info is known, the library construction is known too. We did (wrongly) assume that graphs of different lengths were fine within a project, and we would really like this feature. If I understand correctly, you will work to allow it in #2443. Thank you!