HumanCellAtlas / data-store

Design specs and prototypes for the HCA Data Storage System (DSS, "blue box")
https://dss.staging.data.humancellatlas.org/

KeyError: `schema_version` in get_shape_descriptor #915

Closed hannes-ucsc closed 6 years ago

hannes-ucsc commented 6 years ago
[WARNING]   2018-01-24T01:11:15.783Z    f302aa2f-00a2-11e8-9622-f5b041420ba0    An exception occurred in 'index' of {'key': 'bundles/ce0ce65d-3822-4cc5-b647-179bc5bae135.2017-12-08T154119.501223Z', 'indexer': AWSIndexer(dryrun=False, notify=None), 'bundle': BundleDocument(replica=Replica.aws, fqid=ce0ce65d-3822-4cc5-b647-179bc5bae135.2017-12-08T154119.501223Z, {'manifest': {'format': '0.0.1', 'version': '2017-12-08T154119.501223Z', 'files': [{'name': 'project.json', 'uuid': 'a2b4d96d-b471-4916-80cf-f1b57e8635b7', 'version': '2017-12-08T154112.003258Z', 'content-type': 'application/json; dcp-type="metadata/project"', 'size': 3393, 'indexed': True, 'crc32c': '4963a8b8', 's3-etag': 'e2f3207ae15acd3335cd35df8a3759a6', 'sha1': 'a3fe570e8c0b185a2d2402d8909800c2dfe88d09', 'sha256': '661a81b83504157b0138f455b82d28c8a7cf82800a42ad1c6b5062e01c05b432'}, {'name': 'sample.json', 'uuid': 'de25b0bd-4759-42cc-8d7a-df49a7b1a279', 'version': '2017-12-08T154114.103009Z', 'content-type': 'application/json; dcp-type="metadata/sample"', 'size': 2334, 'indexed': True, 'crc32c': 'c0d1866f', 's3-etag': 'a56ec84e69f704f5ce5446441f53e6c2', 'sha1': '72d3e021f608bacd6c52830c3a77b3970d222359', 'sha256': '64ccdd9eb0e09fd4c4dbb5ec12fd4a9ca25cc0d4172416aad9f6736ded4b20ea'}, {'name': 'AZ_A1.fastq.gz', 'uuid': 'c851dc1f-337d-4f23-8e37-38e6508ab6e0', 'version': '2017-12-08T154116.979765Z', 'content-type': 'binary/octet-stream', 'size': 125191, 'indexed': False, 'crc32c': '4ef74578', 's3-etag': 'c7bbee4c46bbf29432862e05830c8f39', 'sha1': '17f8b4be0cc6e8281a402bb365b1283b458906a3', 'sha256': 'fe6d4fdfea2ff1df97500dcfe7085ac3abfb760026bff75a34c20fb97a4b2b29'}, {'name': 'assay.json', 'uuid': '424b2d05-b638-4393-bcf1-e9a27e09bbe4', 'version': '2017-12-08T154118.100837Z', 'content-type': 'application/json; dcp-type="metadata/assay"', 'size': 926, 'indexed': True, 'crc32c': '4a29fcb5', 's3-etag': 'c01e480c7bb623470e216a9d473e7c7a', 'sha1': 'a402fc8ea2495a9651b997603d4e4485b71f3dfd', 'sha256': 
'7f078f2c06520da23055df4cc3739af6cb19498fac35ebc0d1a92ca180ec8047'}], 'creator_uid': 8008}, 'files': {'project_json': {'content': {'core': {'type': 'project', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/project.json'}, 'name': 'Single-cell RNA-seq analysis of human pancreas from healthy individuals and type 2 diabetes patients', 'contributors': [{'city': 'Stockholm', 'name': 'Rickard,,Sandberg', 'country': 'Sweden', 'institution': 'Department of Cell and Molecular Biology (CMB), Karolinska Institutet, Stockholm, Sweden. Ludwig Institute for Cancer Research, Stockholm, Sweden. Integrated Cardiometabolic Center (ICMC), Karolinska Institutet, Stockholm, Sweden.', 'address': 'Nobels vag 3, 171 77', 'email': 'Rickard.Sandberg@ki.se'}, {'city': 'Stockholm', 'name': 'Asa,,Segerstolpe', 'country': 'Sweden', 'institution': 'Department of Cell and Molecular Biology (CMB), Karolinska Institutet, Stockholm, Sweden. Integrated Cardiometabolic Center (ICMC), Karolinska Institutet, Stockholm, Sweden.', 'address': 'Nobels vag 3, 171 77', 'email': 'Asa.Segerstolpe@ki.se'}], 'submitters': [{'city': 'Stockholm', 'name': 'Athanasia,,Palasantza', 'country': 'Sweden', 'email': 'Athanasia.Palasantza@ki.se', 'phone': '0046 8 5248 3986', 'address': 'Nobels vag 3, 171 77', 'institution': 'Department of Cell and Molecular Biology (CMB), Karolinska Institutet, Stockholm, Sweden.'}], 'insdc_project': 'ERP017126', 'experimental_design': [{'text': 'cell type comparison design', 'ontology': 'OBI:0001411'}, {'text': 'disease state design', 'ontology': 'OBI:0001293'}], 'publications': [{'authors': ['Segerstolpe A, Palasantza A, Eliasson P, Andersson E, Andreasson A, Sun X, Picelli S, Sabirsh A, Clausen M, Bjursell MK, Smith DM, Kasper M, Ammala C, Sandberg R'], 'doi': '10.1016/j.cmet.2016.08.020', 'title': 'Single-Cell Transcriptome Profiling of Human Pancreatic Islets in Health and Type 2 Diabetes'}], 'project_id': 'HCA-demo-project 6', 
'insdc_study': 'PRJEB15401', 'description': 'We used single-cell RNA-sequencing to generate transcriptional profiles of endocrine and exocrine cell types of the human pancreas. Pancreatic tissue and islets were obtained from six healthy and four T2D cadaveric donors. Islets were cultured and dissociated into single-cell suspension. Viable individual cells were distributed via fluorescence-activated cell sorted (FACS) into 384-well plates containing lysis buffer. Single-cell cDNA libraries were generated using the Smart-seq2 protocol. Gene expression was quantified as reads per kilobase transcript and per million mapped reads (RPKM) using rpkmforgenes. Bioinformatics analysis was used to classify cells into cell types without knowledge of cell types or prior purification of cell populations. We revealed subpopulations in endocrine and exocrine cell types, identified genes with interesting correlations to body mass index (BMI) in specific cell types and found transcriptional alterations in T2D.  
Complementary whole-islet RNA-seq data have also been deposited at ArrayExpress under accession number E-MTAB-5060 (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-5060).'}, 'core': {'type': 'project_bundle', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/project_bundle.json'}, 'hca_ingest': {'accession': '', 'submissionDate': '2017-12-08T15:30:37.219Z', 'updateDate': '2017-12-08T15:31:18.454Z', 'document_id': '8a10094f-02a2-41ac-a1ef-51123e8980ed'}}, 'sample_json': {'core': {'type': 'sample_bundle', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/sample_bundle.json'}, 'samples': [{'content': {'core': {'type': 'sample', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/sample.json'}, 'name': 'pancreas islet normal', 'specimen_from_organism': {'body_part': {'text': 'islet of Langerhans', 'ontology': 'UBERON:0000006'}, 'organ': {'ontololgy': 'UBERON:0001264', 'text': 'pancreas'}}, 'genus_species': {'text': 'Homo sapiens'}, 'supplementary_files': ['E-MTAB-5061.processed.1.zip'], 'ncbi_taxon_id': 9606, 'derived_from': 'AZ', 'sample_id': 'AZ_A1', 'sample_accessions': {'insdc_sample': 'ERS1348470'}}, 'derivation_protocols': [{'core': {'type': 'protocol', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/protocol.json'}, 'type': {'text': 'sequencing protocol'}, 'description': 'A protocol to test protocol ingest', 'name': 'Test protocol', 'protocol_id': 'A123'}, {'core': {'type': 'protocol', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/protocol.json'}, 'type': {'text': 'cell suspension protocol'}, 'description': 'Another test protocol', 'name': 'Test protocol 2', 'protocol_id': 'A456'}], 'hca_ingest': {'accession': '', 'submissionDate': '2017-12-08T15:30:37.264Z', 'updateDate': '2017-12-08T15:31:19.355Z', 'document_id': 
'29e841b6-f962-4cb2-86b2-297359a4a934'}, 'derived_from': '270f56b6-46dd-45a3-a0b5-d75210e01a7a'}, {'content': {'core': {'type': 'sample', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/sample.json'}, 'donor': {'is_living': True, 'death': {'cause_of_death': {'text': 'death by natural cause', 'ontology': 'NCIt:C82465'}, 'time_of_death': '2017-11-05T10:20:00Z'}, 'age': '43', 'life_stage': 'adult', 'disease': [{'text': 'normal', 'ontology': 'PATO:0000461'}], 'age_unit': 'year', 'sex': 'male', 'body_mass_index': 30.8}, 'ncbi_taxon_id': 9606, 'genus_species': {'text': 'Homo sapiens', 'ontology': 'NCBITaxon:9606'}, 'sample_id': 'AZ'}, 'hca_ingest': {'accession': '', 'submissionDate': '2017-12-08T15:30:37.246Z', 'updateDate': '2017-12-08T15:31:18.539Z', 'document_id': '270f56b6-46dd-45a3-a0b5-d75210e01a7a'}}]}, 'assay_json': {'content': {'single_cell': {'cell_handling': 'FACS'}, 'core': {'type': 'assay', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/assay.json'}, 'rna': {'spike_in_dilution': 40000, 'spike_in': 'ERCC', 'end_bias': 'none', 'primer': 'poly-dT', 'strand': 'both', 'library_construction': 'smart-seq2'}, 'assay_id': 'ERR1630013', 'seq': {'paired_ends': False, 'lanes': [{'r1': 'AZ_A1.fastq.gz'}], 'instrument_platform': 'Illumina', 'molecule': 'polyA RNA', 'instrument_model': 'HiSeq 2000', 'insdc_run': ['ERR1630013'], 'insdc_experiment': 'ERX1700346'}}, 'core': {'type': 'assay_bundle', 'schema_url': 'https://raw.githubusercontent.com/HumanCellAtlas/metadata-schema/4.3.0/json_schema/assay_bundle.json'}, 'hca_ingest': {'accession': '', 'submissionDate': '2017-12-08T15:30:37.433Z', 'updateDate': '2017-12-08T15:31:18.506Z', 'document_id': 'c03b9bd6-b8b4-4e64-945b-93a12298a37a'}}}, 'state': 'new', 'uuid': 'ce0ce65d-3822-4cc5-b647-179bc5bae135'})}
Traceback (most recent call last):
  File "/var/task/domovoilib/dss/util/retry.py", line 203, in wrapper
    return f(*args, **kwargs)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 136, in index
    index_name = self._prepare_index(dryrun)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 258, in _prepare_index
    shape_descriptor = self.get_shape_descriptor()
  File "/var/task/domovoilib/dss/storage/index_document.py", line 294, in get_shape_descriptor
    schema_version = core['schema_version']
KeyError: 'schema_version'
[WARNING]   2018-01-24T01:11:15.785Z    f302aa2f-00a2-11e8-9622-f5b041420ba0    An exception occurred in 'index_object' of {'key': 'bundles/ce0ce65d-3822-4cc5-b647-179bc5bae135.2017-12-08T154119.501223Z', 'indexer': AWSIndexer(dryrun=False, notify=None)}
Traceback (most recent call last):
  File "/var/task/domovoilib/dss/util/retry.py", line 203, in wrapper
    return f(*args, **kwargs)
  File "/var/task/domovoilib/dss/events/handlers/index.py", line 47, in index_object
    self._index_bundle(self.replica, identifier, logger)
  File "/var/task/domovoilib/dss/events/handlers/index.py", line 63, in _index_bundle
    modified, index_name = doc.index(dryrun=self.dryrun)
  File "/var/task/domovoilib/dss/util/retry.py", line 203, in wrapper
    return f(*args, **kwargs)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 136, in index
    index_name = self._prepare_index(dryrun)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 258, in _prepare_index
    shape_descriptor = self.get_shape_descriptor()
  File "/var/task/domovoilib/dss/storage/index_document.py", line 294, in get_shape_descriptor
    schema_version = core['schema_version']
KeyError: 'schema_version'
[WARNING]   2018-01-24T01:11:15.785Z    f302aa2f-00a2-11e8-9622-f5b041420ba0    Reindex operation failed for bundles/ce0ce65d-3822-4cc5-b647-179bc5bae135.2017-12-08T154119.501223Z
Traceback (most recent call last):
  File "/var/task/domovoilib/dss/stepfunctions/visitation/reindex.py", line 45, in process_item
    self.indexer.index_object(key, self.logger)
  File "/var/task/domovoilib/dss/util/retry.py", line 203, in wrapper
    return f(*args, **kwargs)
  File "/var/task/domovoilib/dss/events/handlers/index.py", line 47, in index_object
    self._index_bundle(self.replica, identifier, logger)
  File "/var/task/domovoilib/dss/events/handlers/index.py", line 63, in _index_bundle
    modified, index_name = doc.index(dryrun=self.dryrun)
  File "/var/task/domovoilib/dss/util/retry.py", line 203, in wrapper
    return f(*args, **kwargs)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 136, in index
    index_name = self._prepare_index(dryrun)
  File "/var/task/domovoilib/dss/storage/index_document.py", line 258, in _prepare_index
    shape_descriptor = self.get_shape_descriptor()
  File "/var/task/domovoilib/dss/storage/index_document.py", line 294, in get_shape_descriptor
    schema_version = core['schema_version']
KeyError: 'schema_version'
Bento007 commented 6 years ago

Fixed
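
The thread does not show what the fix was. For anyone hitting the same traceback, here is a minimal sketch of one way a lookup like `core['schema_version']` could be made tolerant of documents that only carry `schema_url`. The bundle in the log above is such a document: its `core` dicts contain `type` and `schema_url` (with `4.3.0` in the path) but no `schema_version`. The helper name `schema_version_of` and the URL pattern are assumptions for illustration, not DSS code.

```python
import re
from typing import Optional

# Hypothetical pattern: matches the version segment of schema URLs shaped
# like the ones in the log, e.g. .../metadata-schema/4.3.0/json_schema/...
_SCHEMA_URL_VERSION = re.compile(r"/metadata-schema/(\d+\.\d+\.\d+)/")


def schema_version_of(core: dict) -> Optional[str]:
    """Return the schema version for a 'core' dict, or None if unknown.

    Prefers an explicit 'schema_version' key and falls back to parsing
    'schema_url', instead of raising KeyError when the key is absent.
    """
    version = core.get("schema_version")
    if version is not None:
        return version
    match = _SCHEMA_URL_VERSION.search(core.get("schema_url", ""))
    return match.group(1) if match else None


# 'core' copied from project.json in the failing bundle.
core = {
    "type": "project",
    "schema_url": "https://raw.githubusercontent.com/HumanCellAtlas/"
                  "metadata-schema/4.3.0/json_schema/project.json",
}
print(schema_version_of(core))  # prints 4.3.0
```

Returning `None` rather than raising leaves it to the caller to decide whether a bundle without a recognizable version should be skipped or indexed without a shape descriptor.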