ebi-ait / hca-ebi-wrangler-central

This repo is for tracking work related to wrangling datasets for the HCA, associated tasks and for maintaining related documentation.
https://ebi-ait.github.io/hca-ebi-wrangler-central/
Apache License 2.0

GSE119212 - Decoding the development of human hippocampus #193

Open ESapenaVentura opened 3 years ago

ESapenaVentura commented 3 years ago

Primary Wrangler: Enrique Secondary Wrangler: Ami

Associated files:

Google Drive: https://drive.google.com/drive/folders/1N523DIJEXYvQwvCzqUEcGxcURot955h2

Published study links

Paper: https://www.nature.com/articles/s41586-019-1917-5

Accessioned data: GSE119212

Key Events

ESapenaVentura commented 3 years ago

The files are in BAM format, so I am converting them to FASTQ using 10x's tool "BamToFastqConverter" on the EC2 instance.

ESapenaVentura commented 3 years ago

The test run for the conversion went fine, although it generated too many FASTQ files. I am setting a higher number of reads per file and re-launching the conversion for all the BAM files!
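For reference, a minimal sketch of how the relaunch could be scripted. It assumes the tool mentioned above is 10x's `bamtofastq` command-line program and that its `--reads-per-fastq` and `--nthreads` options behave as described in the 10x documentation (worth confirming with `bamtofastq --help`). The function names, directory layout and read count are hypothetical and not taken from this ticket.

```python
from pathlib import Path


def build_bamtofastq_command(bam_path, output_dir, reads_per_fastq, threads=4):
    """Return the argument list for one bamtofastq run.

    Assumption: the flags below match the installed bamtofastq version.
    A larger reads_per_fastq value means fewer, bigger FASTQ chunks.
    """
    return [
        "bamtofastq",
        f"--nthreads={threads}",
        f"--reads-per-fastq={reads_per_fastq}",
        str(bam_path),
        str(output_dir),
    ]


def build_all_commands(bam_dir, output_root, reads_per_fastq):
    """Build one command per BAM file, each writing to its own output folder."""
    commands = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        # Hypothetical layout: one output directory named after each BAM file.
        output_dir = Path(output_root) / bam.stem
        commands.append(build_bamtofastq_command(bam, output_dir, reads_per_fastq))
    return commands
```

Each returned list can be passed to `subprocess.run(command, check=True)`. Building the commands separately from running them makes it easy to print and inspect them before launching the full batch.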

ESapenaVentura commented 3 years ago

The conversion yielded valid R1 and R2 files and, for some reason, empty I1 files. I am re-formatting the spreadsheet to leave out the I1 files, as they are not needed (the files are already demultiplexed).
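A small sketch of how the empty files could be listed before editing the spreadsheet. It assumes the converted files sit in one directory and have names ending in `.fastq` or `.fastq.gz`. The function names are hypothetical, and this is not the check that was actually run for this dataset.

```python
import gzip
from pathlib import Path


def fastq_is_empty(path):
    """Return True if a plain or gzipped FASTQ file contains no data."""
    path = Path(path)
    if path.stat().st_size == 0:
        return True
    # A gzipped file with no reads still has a header, so its size on disk
    # is not zero; decompress and try to read a single byte instead.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        return handle.read(1) == b""


def split_empty_fastqs(directory):
    """Return (empty, non_empty) lists of FASTQ file names in a directory."""
    empty, non_empty = [], []
    for fastq in sorted(Path(directory).glob("*.fastq*")):
        (empty if fastq_is_empty(fastq) else non_empty).append(fastq.name)
    return empty, non_empty
```

The `empty` list can then be compared against the file names in the spreadsheet to confirm that only I1 files are being dropped.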

ESapenaVentura commented 3 years ago

The files have been uploaded and they are valid

Submission here https://contribute.data.humancellatlas.org/submissions/detail?uuid=7d5c9287-cf4a-4948-b4d5-5c6123ad78ad&project=6c040a93-8cf8-4fd5-98de-2297eb07e9f6

I ran the ingest-graph-validator tool and it fails the check for the 5 degree minimum graph, which is expected because this dataset contains bulk data (directly from tissue to files)

ami-day commented 3 years ago

@ESapenaVentura I have reviewed this, great to see lots of cells and primary brain tissue in this project! I made some minor changes to the sheet linked above.

Apart from those changes, I would probably add more information about the collection protocol, to include the short paragraph about consent and that samples were derived from elective termination.

ESapenaVentura commented 3 years ago

Hi @ami-day, did you overwrite the spreadsheet? I can only see a "v3_noIndex", but I can't find the "v2_noIndex". It should be alright, I'm just curious!

Other than that, I think the changes are perfectly fine. I may have swapped them around because we have the pattern recognition wrong in our schema

About the addresses, I couldn't find the address in English. If you were able to, please let me know and I'll change it immediately!

About the elective termination, I can add that to the cause of death. However, since we are treating the embryos as donors, the collection protocol refers to the hippocampus collection, not to how it was "retrieved" from the mother.

I will make that last addition and, if you give me the green light, we can say this is ready for exporting!

ami-day commented 3 years ago

Hi @ESapenaVentura,

:)

ESapenaVentura commented 3 years ago

I added the consent in the collection protocol and revised the changes.

Ready to go!

ESapenaVentura commented 3 years ago

New submission: https://contribute.data.humancellatlas.org/submissions/detail?id=601970b2ac1792031eee0b88&project=6c040a93-8cf8-4fd5-98de-2297eb07e9f6

Files are validating but I think there is some trouble with validation

rays22 commented 3 years ago

This dataset has been exported to the Terra staging area.

ofanobilbao commented 2 years ago

@ESapenaVentura I moved this to Finished on the wrangling board, as I believe this dataset is done from the DCP perspective, right? Thanks!

ofanobilbao commented 2 years ago

@ESapenaVentura I just removed the dataset label. If/when we need updates on this we will need to add the label again so that it shows on the wrangling board. Does that sound ok?

ESapenaVentura commented 2 years ago

@ESapenaVentura To pick this one for the SCEA testing

ami-day commented 2 years ago

@ESapenaVentura did you start the SCEA conversion for this dataset?

ami-day commented 2 years ago

Is this the correct latest version? https://docs.google.com/spreadsheets/d/1ASSe6qt_sXWOdnCS6_1UJ2Hn-l2mMDUo/edit#gid=202266375 I will pre-convert it using the latest version of the hca2scea tool. I think the multi protocols might have caused an issue if you used a previous version of the tool.

ESapenaVentura commented 2 years ago

https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/230 E-HCAD43. Currently stalled due to a problem with file upload and de-prioritisation

ami-day commented 2 years ago

https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/230 E-HCAD43. Currently stalled due to a problem with file upload and de-prioritisation

Ok, I have just updated the SDRF file based on Anja's comments and re-uploaded the updated version to the GitLab branch. All that is left now is to transfer the FASTQ files to them via the EBI cluster (not ideal, but it is the only way to do it). I have asked Anja where to send them on the cluster.