HumanCellAtlas / data-operations

MIT License

Project Completion QA Checklist: Reprogrammed_Dendritic_Cells #4

Open jychien opened 5 years ago

jychien commented 5 years ago

Project UUID: 116965f3-f094-4769-9d28-ae675c1b569c
Project Title: Single cell profiling of human induced dendritic cells generated by direct reprogramming of embryonic fibroblasts
Project Short Name: Reprogrammed_Dendritic_Cells
Submission UUID: 8c45a848-ab32-4928-8dd6-567b75eaf7e1
Environment: Production

"Reprogrammed_Dendritic_Cells" needs to be re-ingested to correct the linking of specimen to cell suspension. Go through the Project Completion QA Checklist before re-ingesting into production.

jychien commented 5 years ago

Process QA found the following issue:

  1. There are 3 bundles associated with this project: 1 specimen and 2 cell lines. Each bundle should produce one empty_drops_result.csv from the pipelines, but the files are not displayed in the browser with the correct associations. The linking issue with this project is outlined in AUDR ticket HumanCellAtlas/hca-data-wrangling/issues/366


  2. library_preparation_protocol.input_nucleic_acid_molecule.ontology_label: This is listed as 'messenger RNA'. It would be more consistent and specific if listed as 'polyA RNA'.

  3. library_preparation_protocol.strand: This is listed as 'second'. I am unclear what this field refers to, but other 10x 3' v2 projects have the field set to 'first'. Here is a diagram of 10x library prep.

Issues 2 & 3 have been added to HumanCellAtlas/hca-data-wrangling/issues/366
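The expectation in issue 1 (one empty_drops_result.csv per bundle) could be checked with something like the sketch below. It assumes a bundle manifest is already available as a plain mapping of bundle ID to file names; the function name and the mapping shape are hypothetical, not part of any HCA tool.

```python
from collections import Counter

# File name taken from the issue text above.
EXPECTED_FILE = "empty_drops_result.csv"


def bundles_with_wrong_count(bundle_files, expected_name=EXPECTED_FILE):
    """Return {bundle_id: count} for bundles that do not hold exactly one expected file.

    bundle_files is a hypothetical mapping: bundle ID -> list of file names.
    """
    problems = {}
    for bundle_id, file_names in bundle_files.items():
        count = Counter(file_names)[expected_name]
        if count != 1:
            problems[bundle_id] = count
    return problems
```

An empty result would mean every bundle holds exactly one such file; it says nothing about whether the browser displays the associations correctly.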

jahilton commented 5 years ago

TSV validator outcome:

  1. analysis_file.file_core.format:matrix not in ['bam', 'bai', 'csv', 'npy', 'npz', 'zarr']
  2. analysis_file.file_core.format:unknown not in ['bam', 'bai', 'csv', 'npy', 'npz', 'zarr']
  3. analysis_process.type.text:analysis entered but not analysis_process.type.ontology
  4. analysis_protocol.type.text:analysis entered but not analysis_protocol.type.ontology
  5. library_preparation_protocol.library_construction_method.ontology_label:10X v2 sequencing not in ["10X 5' v2 sequencing", "10x 3' v3 sequencing", "10X 3' v2 sequencing", "10X 3' v1 sequencing", 'Smart-seq', 'Smart-seq2', 'inDrop', 'Drop-seq', 'DroNc-seq', 'CITE-seq', 'MARS-seq']
  6. process.start_time:2018.07.31 does not match pattern
  7. process.start_time:2019.01.21 does not match pattern
  8. sequence_file.file_core.format:fastq.gz not in ['fastq']
  9. sequencing_protocol.10x.pooled_channels:4.0 does not match pattern
  10. specimen_from_organism.organ_parts.ontology:UBERON:0001003 is not a child of specimen_from_organism.organ.ontology:UBERON:0000922

1-4 are known issues across the board; nothing AUDR-able at the moment. The situation is similar for 8: https://github.com/HumanCellAtlas/hca-data-wrangling/issues/127. I have added the others to https://github.com/HumanCellAtlas/hca-data-wrangling/issues/366
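For anyone triaging similar output, here is a rough sketch of two of the check types reported above: an allowed-value check and a date fix-up. The allowed list is copied from the validator messages; the helper names are hypothetical, and the thread does not state which pattern process.start_time must match, so the ISO 8601 date output is an assumption.

```python
from datetime import datetime

# Allowed values copied from validator messages 1 and 2 above.
ALLOWED_ANALYSIS_FORMATS = ["bam", "bai", "csv", "npy", "npz", "zarr"]


def check_enum(field, value, allowed):
    """Return a validator-style message, or None when the value is allowed."""
    if value in allowed:
        return None
    return f"{field}:{value} not in {allowed}"


def dotted_date_to_iso(value):
    """Convert 'YYYY.MM.DD' to 'YYYY-MM-DD' (assumed target format).

    Raises ValueError for input that is not a valid dotted date.
    """
    return datetime.strptime(value, "%Y.%m.%d").date().isoformat()
```

For example, `dotted_date_to_iso("2018.07.31")` gives `"2018-07-31"`; whether the schema also requires a time component would need to be confirmed against the metadata schema.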

jychien commented 4 years ago

Also noted: