Open ESapenaVentura opened 3 years ago
@ESapenaVentura :
The metadata spreadsheet GSE134144_ontologies.xlsx
has passed all the ingest-graph-validator tests without any errors.
I am attaching links to two images. One graph is for the protocols and the other for the biomaterials.
@ami-day will be the secondary wrangler when she has capacity
A couple of notes:
@ESapenaVentura I have done the secondary review and made changes to a copy located here: https://docs.google.com/spreadsheets/d/1nVfbUfY_zzxZZHe4IuDb9OvZz6uhFXzP/edit#gid=385545271 I didn't make all the below changes as some require discussion.
In general the metadata looks very complete and accurate but here are some comments:
I saw you put female as the biological sex field for the female trans donors. I agree it is difficult to know what to put there but also agree female is probably best.
In terms of the biosample technical replicates: I think we need to be able to allow for more than one BioSamples accession per cell suspension, for the purpose of this dataset and future datasets. Can we change the schema so that an array is accepted for this field before uploading this dataset?
Project - Contributors: I changed the format of author names with middle initials as it wasn't quite right.
Collection protocol: I added more information to the collection protocol description, taken from the publication.
Donor organism: the development stage for the 4 male samples is very specific (e.g. 7 year old human stage). I think it would be better here to add 'juvenile' stage for those donors. I'm thinking from the perspective of a user, broader categories such as infant, juvenile and adult would be useful for analyses involving multiple datasets (e.g. selecting datasets with the 'same' development stage give or take a few years). The authors define juvenile as ages 1-14 years.
Library preparation protocol: there is no information about the cell barcode and UMI barcode. Should we add what is typical of the kit they used (10X 3' v2)?
Supplementary file: I added a description of the image files (what they show), taken from the publication figure.
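On the BioSamples accession point above, a minimal sketch of what "accepting an array" could look like on the reading side, assuming the field may hold either a single string (as now) or a list of strings (as proposed). The function name `as_accession_list` and the placeholder accession strings are hypothetical and are not part of the HCA schema or tooling.

```python
def as_accession_list(value):
    """Return BioSamples accessions as a list.

    Accepts None, a single accession string, or an iterable of strings,
    so single-valued and array-valued records can be handled the same way.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Placeholder values only; these are not real accessions.
single = as_accession_list("ACCESSION_A")
replicates = as_accession_list(["ACCESSION_A", "ACCESSION_B"])
```

With something like this, a cell suspension that pools technical replicates could carry every accession without existing single-valued records needing a rewrite.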
Thanks @ami-day ! About the points:
Yeah, this definitely needs discussion. I am currently writing an email to the equity group to see if they can point me to any guidelines
That is a great idea, we currently can't modify the schema (and with this even less because it will be a major update), but we can write a ticket to do it once we can evolve the schema
Thanks for changing the format! I didn't check it super thoroughly
Thanks for adding more info!
About the donor organism, I tend to input the most concrete term that we can. I agree that it's super specific, but the idea is that with the expansion of the ontology, anyone looking for juvenile samples should be able to get it
I can add that info for the library prep. I usually don't add it if it's 10x, but it doesn't hurt having it there
Thanks for adding another description!
I have updated the spreadsheet and the submission in ingest!
A little note: I have changed the image file content descriptions, as the content description is an ontologised term (I have moved the description you provided to the file description field)
Once we get some info about the correct way to represent the sex of the transfemale donors, it should be good to go!
Ok all sounds good!
I didn't think about being able to expand the ontology and I'm not sure exactly what you mean by this - do you mean, a user would check all the parent ontology terms for a given specific ontology ID? It does sound like extra work and I have never seen how this might be done during analyses e.g. if someone wanted to plot a graph or figure with development stage labels. I still think we should make the development stage more broad.
What I mean is that (once implemented), they can search for "juvenile" and these samples would appear because they are children of juvenile! But they will also appear if they do a more specific search (e.g., 7 year old human).
This is still far off from being implemented but I feel like we should act as if it is because curating after that feature is in the browser search will be a huge headache!
I am happy to further discuss this if necessary though
Yea that would be great if eventually it will be possible to search for projects in the data portal by an 'expanded' list related to an ontology search term. Hmm sounds complex to implement to me
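To make the "expanded" search idea concrete, here is a minimal sketch, assuming a toy parent map between development-stage labels. The hierarchy, the labels other than those quoted in this thread, the donor IDs and the function names are all made up for illustration; this is not the real ontology and not how the data portal works.

```python
# Toy child -> parent links between development-stage labels (hypothetical).
PARENT = {
    "7 year old human stage": "juvenile stage",
    "juvenile stage": "postnatal stage",
    "adult stage": "postnatal stage",
}


def ancestors(term):
    """Return the term followed by every term above it in the toy hierarchy."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def search(donors, query):
    """Return donor IDs whose annotated stage is the query or a child of it."""
    return [donor_id for donor_id, stage in donors.items()
            if query in ancestors(stage)]


# Donors keep the most specific term; broader queries still find them.
donors = {"donor_1": "7 year old human stage", "donor_2": "adult stage"}
juvenile_hits = search(donors, "juvenile stage")
specific_hits = search(donors, "7 year old human stage")
```

In this sketch both the broad and the specific query return `donor_1`, which is the argument for curating with the most specific term available.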
Outcome of group discussion:
We should update our schema to include not only Biological Sex but also Gender. The project is stalled for now until we have the ability to make updates to the schema.
Biological Sex should be entered as the sex at birth and based on X/Y chromosomes (so in this case, the transfemale donor's sex would actually be male). Gender should be entered as what the donor has identified themselves as being, and can include transfemale, transmale, non-binary (exact terms to be listed when the metadata schema is updated).
In cases where a single sex cannot be identified at birth due to chromosome mosaicism, the biological sex should be 'mixed'.
If any treatment has been given, such as testosterone suppressants, these should be recorded using the relevant treatment field.
Enrique has contacted the equity group and will discuss these points with them when they get back to him.
This ticket has a dependency on https://github.com/HumanCellAtlas/metadata-schema/issues/1409
@ipediez put me in contact with someone from Prisma (a science LGBTQ+ association) to discuss this.
We are currently discussing whether the combination of Biological sex + Gender identity would accurately represent the spectrum of human sexuality and gender identity
This dataset is suitable for SCEA. @ESapenaVentura is this still blocked and not yet submitted to HCA DCP?
yes, we still don't have the schema update needed
Assigned E-HCAD-55
Need to ask Enrique if the final spreadsheet is available for curation and where it is.
This dataset is already in the SCEA Data Browser. Accession: E-GEOD-134144
Waiting for #844. Enrique will add the matrices to the project
Still waiting on the gender_ontology term
Primary Wrangler: Enrique
Secondary Wrangler: Ami
Associated files:
Google Drive: https://drive.google.com/drive/folders/1YuUwBnrDvx9ofzh6QDHR5yCyXzqs1ggN
Published study links
Paper: https://www.cell.com/cell-stem-cell/fulltext/S1934-5909(19)30523-5
Accessioned data: GSE134144
Key Events