ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

Tier 2 metadata example template #1209

Closed: arschat closed this issue 7 months ago

arschat commented 11 months ago

We received an email from Ellen Todres asking how we should define the Tier 2 metadata:

We'd like to start Tier 2 metadata discussions and wrangling with Wave 2 bionetworks (Oral&Craniofacial; Skin; Liver; Pancreas; Heart; Genetic Diversity). Here's the plan. Please let me know if you think it needs to be adjusted:

  1. Your team produces some sort of a template for Tier 2 fields, ultimately including the currently existing ones that would be relevant to a dataset of any given bionetwork (perhaps, tonsils for Oral&Craniofacial, or something skin-specific for Skin etc).
  2. Then, we email these lists to the coordinators of each of the Wave 2 bionetworks to serve as examples, and ask to meet with them for an hour to discuss the process of Tier 2 metadata collection, and encourage them to add any other fields that are currently missing but are relevant (hopefully, in consultation with the members of their bionetworks).
  3. They send us an updated list for their bionetwork, and help contact data contributors, requesting to gather appropriate information.

Let me know what you think, and whether you can provide the bionetwork-specific examples. What would be the best format for these metadata fields lists? Excel spreadsheets?

Gabs and I agreed that I will draft an email including metadata spreadsheets for two already-wrangled projects with rich metadata from the Wave 2 bionetworks. These Excel spreadsheets should also include all the DCP metadata fields.

After discussing these fields with the coordinators, we will add any requested metadata (bionetwork-specific or not) and make the required metadata-schema updates.

arschat commented 11 months ago

Chosen example projects:

arschat commented 11 months ago

Stripped-down templates: SkinLymphomaRindler10x_template.xlsx humanHeartFailureCellularLandscape_template.xlsx

arschat commented 11 months ago

Also removed the project tabs from the template .xlsx files.

humanHeartFailureCellularLandscape_template.xlsx SkinLymphomaRindler10x_template.xlsx

idazucchi commented 11 months ago

Templates sent to the HCA exec office; they are the starting point for the Tier 2 metadata discussion.

arschat commented 11 months ago

The execs requested that we add a Tier 1 metadata row for all DCP fields that map to Tier 1 fields.

idazucchi commented 11 months ago

Update the metadata template to highlight fields that are already in Tier 1, then send it to the exec office.

arschat commented 11 months ago

Added the Tier 1 row. Cells are colored according to the color scheme here.

humanHeartFailureCellularLandscape_template.xlsx SkinLymphomaRindler10x_template.xlsx
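A minimal sketch of how such a Tier 1 row could be generated, assuming each template sheet has a header row of DCP programmatic field names and that the DCP-to-Tier-1 correspondence is kept as a simple dict. The mapping entries below are hypothetical and only illustrate the three project-level Tier 1 fields mentioned in this thread; they are not the official mapping.

```python
from typing import Dict, List

# Hypothetical, partial DCP-to-Tier-1 mapping, for illustration only.
DCP_TO_TIER1: Dict[str, str] = {
    "project.project_core.project_title": "title",
    "project.contributors.name": "study_PI",
    "project.contributors.institution": "institute",
}


def tier1_row(dcp_header: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return one cell per DCP column: the Tier 1 field name if mapped, else blank."""
    return [mapping.get(column, "") for column in dcp_header]


header = [
    "project.project_core.project_title",
    "donor_organism.biomaterial_core.biomaterial_id",  # hypothetical unmapped column
]
print(tier1_row(header, DCP_TO_TIER1))
# ['title', '']
```

Non-empty cells in the generated row are the ones that would be highlighted as already covered by Tier 1.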

Since the project tabs have been removed, the following Tier 1 fields were not filled:

title, study_PI, institute

These are the Tier 1 fields that do not yet have a DCP mapping:

library_ID?, library_ID_repository?, sample_source?, tissue_type, library_preparation_batch, library_sequencing_run, sample_uniqueness, gene_annotation_version

batch_condition, default_embedding, author_batch_notes, comments
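One way to keep a list like this up to date is to derive it from the mapping rather than maintain it by hand. The sketch below assumes the same kind of hypothetical DCP-to-Tier-1 dict (illustrative entries only) and simply reports, in their original order, the Tier 1 fields that no DCP field maps to.

```python
from typing import Dict, List

# Hypothetical, partial DCP-to-Tier-1 mapping, for illustration only.
DCP_TO_TIER1: Dict[str, str] = {
    "project.project_core.project_title": "title",
    "project.contributors.name": "study_PI",
    "project.contributors.institution": "institute",
}


def unmapped_tier1(tier1_fields: List[str], mapping: Dict[str, str]) -> List[str]:
    """Return the Tier 1 fields that no DCP field maps to, preserving input order."""
    mapped = set(mapping.values())
    return [field for field in tier1_fields if field not in mapped]


tier1_fields = ["title", "study_PI", "institute", "tissue_type", "gene_annotation_version"]
print(unmapped_tier1(tier1_fields, DCP_TO_TIER1))
# ['tissue_type', 'gene_annotation_version']
```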