HumanCellAtlas / metadata-schema

This repo is for the metadata schemas associated with the HCA
Apache License 2.0

List and prioritise next schema updates #1391

Open ESapenaVentura opened 3 years ago

ESapenaVentura commented 3 years ago

## Description

As a DCP Metadata Team member, I want to list the next possible schema updates so we can discuss and prioritise the next iteration of PRs.

## Library preparation/technologies support

- [x] Add permeabilisation time and permeabilisation time unit for Visium https://github.com/HumanCellAtlas/metadata-schema/issues/1471
- [ ] Support 10x V3 feature barcodes https://github.com/HumanCellAtlas/metadata-schema/issues/1125
- [ ] Improve the way we group sets of fastq files https://github.com/HumanCellAtlas/metadata-schema/issues/983
- [ ] Remove the requirement for barcode information if a tag-based method is selected. E.g. if an alternative technology like Microwell-Seq is used, and we are unable to get the barcode information (which read, barcode length, barcode offset), we still want to be able to submit the dataset with the expression matrix data
- [ ] Add new fields to better support nuclei sequencing #1419 https://github.com/HumanCellAtlas/metadata-schema/issues/1420 https://github.com/HumanCellAtlas/metadata-schema/issues/1421 https://github.com/HumanCellAtlas/metadata-schema/issues/1422 https://github.com/HumanCellAtlas/metadata-schema/issues/1423
- [ ] Allow for an array of methods in the library_preparation_protocol.method field https://github.com/HumanCellAtlas/metadata-schema/issues/1330
- [ ] Allow for multiple barcode length/offset to support Microwell-Seq https://github.com/HumanCellAtlas/metadata-schema/issues/1403
- [x] Read3/Read4 added to `sequence_file.read_index` https://github.com/HumanCellAtlas/metadata-schema/issues/1401: Support library preparation technologies
- [x] https://github.com/HumanCellAtlas/metadata-schema/issues/1407
- [x] https://github.com/HumanCellAtlas/metadata-schema/issues/1508

## Improve biomaterial metadata

- [x] Expand disease ontology to include phenotype: https://github.com/HumanCellAtlas/metadata-schema/issues/1460
- [ ] Gender identity: https://github.com/HumanCellAtlas/metadata-schema/issues/1409
- [ ] Donor organism genus to be required https://github.com/HumanCellAtlas/metadata-schema/issues/999
- [ ] Add tissue bank ID and accession to biomaterial_core https://github.com/HumanCellAtlas/metadata-schema/issues/1231
- [ ] Add `preservation_storage_module` to `cell_suspension` https://github.com/HumanCellAtlas/metadata-schema/issues/1213
- [ ] Add timecourse module to `specimen` https://github.com/HumanCellAtlas/metadata-schema/issues/1506
- [ ] Improve `organoid` metadata https://github.com/HumanCellAtlas/metadata-schema/issues/1334
- [ ] Ontologize cell_viability_method https://github.com/HumanCellAtlas/metadata-schema/issues/1201
- [ ] Add optional field `crown_rump_length` to donor organism https://github.com/HumanCellAtlas/metadata-schema/issues/1198
- [ ] Move to human module and ontologize `cause_of_death` field https://github.com/HumanCellAtlas/metadata-schema/issues/1188
- [ ] Change `disease` to `diseases` and make it an array in cell_line https://github.com/HumanCellAtlas/metadata-schema/issues/1162
- [ ] Remove growth_conditions from cell_suspension https://github.com/HumanCellAtlas/metadata-schema/issues/884
- [ ] Adding `drug use` field https://github.com/HumanCellAtlas/metadata-schema/issues/849
- [ ] Disallow unit without value or vice versa for organism age (I think this is extensible to other value/unit fields in the schema) https://github.com/HumanCellAtlas/metadata-schema/issues/1395
- [ ] Ontologise preservation_storage module `method` fields https://github.com/HumanCellAtlas/metadata-schema/issues/1463

## Add enum values

- [ ] Add `methanol fixation` to `preservation_storage.method` https://github.com/HumanCellAtlas/metadata-schema/issues/1203

## Improve project metadata

- [ ] Make `funders.organization` an array https://github.com/HumanCellAtlas/metadata-schema/issues/1132
- [ ] Make project_role an array https://github.com/HumanCellAtlas/metadata-schema/issues/1059

## Patch schema fixes

- [ ] Update specimen_from_organism regex pattern https://github.com/HumanCellAtlas/metadata-schema/issues/1393
- [ ] Update donor_organism keyword `example` to `examples` for age field https://github.com/HumanCellAtlas/metadata-schema/issues/1320
- [ ] Known disease to `Disease status` https://github.com/HumanCellAtlas/metadata-schema/issues/1124
- [ ] Change `project.funders` user-friendly name to "Funders" https://github.com/HumanCellAtlas/metadata-schema/issues/1103
- [ ] Update comment in `provenance.document_id` https://github.com/HumanCellAtlas/metadata-schema/issues/1044
- [ ] Gestational age typo https://github.com/HumanCellAtlas/metadata-schema/issues/1194
- [ ] Wrong apostrophe in matrix module https://github.com/HumanCellAtlas/metadata-schema/issues/1410

## Supporting imaging data

- [ ] Consistency between probe and channel fields https://github.com/HumanCellAtlas/metadata-schema/issues/1328
- [ ] Support imaging data https://github.com/HumanCellAtlas/metadata-schema/issues/974
- [ ] https://github.com/HumanCellAtlas/metadata-schema/issues/758
- [ ] Update imaging targets.json https://github.com/HumanCellAtlas/metadata-schema/issues/587

## Review schema

- [ ] Review all schema regexes https://github.com/HumanCellAtlas/metadata-schema/issues/1318
- [ ] Donor organism age has a broken regex https://github.com/HumanCellAtlas/metadata-schema/issues/1301
- [ ] Allow a range of BMI values for this field: donor_organism.human_specific.body_mass_index E.g. 23-28
- [ ] Add regex to avoid whitespaces in ID fields https://github.com/HumanCellAtlas/metadata-schema/issues/1248
- [ ] Update regex patterns to be consistent https://github.com/HumanCellAtlas/metadata-schema/issues/1193
- [ ] Update date-time regexes to capture partial dates according to RFC3339 https://github.com/HumanCellAtlas/metadata-schema/issues/1184
- [ ] Review and improve descriptions and examples across schema https://github.com/HumanCellAtlas/metadata-schema/issues/1187
- [ ] Remove obsolete example from schema https://github.com/HumanCellAtlas/metadata-schema/issues/1186
- [ ] Standardize use of booleans https://github.com/HumanCellAtlas/metadata-schema/issues/976
- [ ] Review all ontology schemas to make sure examples/restrictions are still up to date **TICKET NEEDED**
- [ ] Add accessions module to make possible general accession input https://github.com/HumanCellAtlas/metadata-schema/issues/1408

## Technical/system updates

- [ ] Remove `schema_major_version` and `schema_minor_version`: https://github.com/HumanCellAtlas/metadata-schema/issues/1318
- [ ] Replace `reference_bundle` with `reference_files` in analysis process https://github.com/HumanCellAtlas/metadata-schema/issues/1290
- [ ] Split `memory` into `memory` and `memory_unit` https://github.com/HumanCellAtlas/metadata-schema/issues/1169
- [ ] lowercase `HDBR_accession` field https://github.com/HumanCellAtlas/metadata-schema/issues/1047
- [ ] Remove `checksum` field from file_core https://github.com/HumanCellAtlas/metadata-schema/issues/883
- [ ] File format should be required https://github.com/HumanCellAtlas/metadata-schema/issues/1459

## Other updates

- [ ] Update enrichment_protocol.markers to be an array https://github.com/HumanCellAtlas/metadata-schema/issues/1094

## Needs clarification/discussion

spatial technologies updates:

- [ ] https://github.com/HumanCellAtlas/metadata-schema/issues/1405
- [ ] https://github.com/HumanCellAtlas/metadata-schema/issues/1406
- [x] Addition of a new "treatment_protocol" schema: https://github.com/HumanCellAtlas/metadata-schema/issues/1428

Other

- [x] Include 'Information entity' branch as potential ontology terms for file_content_description https://github.com/HumanCellAtlas/metadata-schema/issues/1450
- [ ] Modelling genome ref files as analysis files https://github.com/HumanCellAtlas/metadata-schema/issues/1288
- [ ] Analysis protocol method should point to protocol method ontology https://github.com/HumanCellAtlas/metadata-schema/issues/1164
- [ ] Assemble organ list (New `system` field?) https://github.com/HumanCellAtlas/metadata-schema/issues/1163
- [ ] Allow for multiple machine names in sequencing_protocol https://github.com/HumanCellAtlas/metadata-schema/issues/1126
- [ ] Remove file_description from supplementary file https://github.com/HumanCellAtlas/metadata-schema/issues/1001
- [ ] Add specimen weight and size https://github.com/HumanCellAtlas/metadata-schema/issues/984
- [ ] Ontologize file_core.format field https://github.com/HumanCellAtlas/metadata-schema/issues/812
- [ ] Capture cell line name https://github.com/HumanCellAtlas/metadata-schema/issues/756
- [ ] Id fields scope https://github.com/HumanCellAtlas/metadata-schema/issues/733
- [ ] FACS module? https://github.com/HumanCellAtlas/metadata-schema/issues/619

## Acceptance Criteria

rays22 commented 3 years ago

The list of possible updates above looks sensible to me. I would suggest that we give top priority to the updates that block data flow and pick those first. The current order of the list might already reflect that, but I am not sure.
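
To make the suggestion concrete, here is a minimal sketch of that ordering in Python. The `Update` record and its `blocks_data_flow` flag are hypothetical, and the flag values below are illustrative placeholders rather than an assessment of the actual tickets. Because the sort is stable, updates with the same priority keep the order they already have in the list.

```python
from dataclasses import dataclass


@dataclass
class Update:
    issue: int               # issue number from the list above
    title: str
    blocks_data_flow: bool   # hypothetical flag, to be set by the team


def prioritise(updates):
    """Put updates that block data flow first.

    sorted() is stable, so updates with equal priority keep their
    current relative order.
    """
    return sorted(updates, key=lambda u: not u.blocks_data_flow)


# Flag values are placeholders for illustration only.
updates = [
    Update(1194, "Gestational age typo", False),
    Update(1403, "Allow for multiple barcode length/offset", True),
    Update(1320, "Update keyword `example` to `examples`", False),
    Update(1125, "Support 10x V3 feature barcodes", True),
]

for u in prioritise(updates):
    print(u.issue, u.title)
```

With a flag like this recorded per ticket, the existing grouping in the description could stay as it is and the blocking items would simply be pulled to the front of the next iteration.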