Open ESapenaVentura opened 3 years ago
The files are in BAM format; I am converting them to FASTQ using 10x's tool "BamToFastqConverter" on the EC2.
The test run of the conversion went fine, although it generated too many FASTQ files. I am using a higher number of reads per file and re-launching for all the BAM files!
The conversion yielded valid R1 and R2 files and, for some reason, empty I1 files. I am re-formatting the spreadsheet to leave out the I1 files, as they are not needed (the files are already demultiplexed).
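A minimal sketch of how the empty I1 files could be spotted before editing the spreadsheet. It assumes the converted files sit in a local directory and end in `.fastq` or `.fastq.gz`; the function names are hypothetical and not part of any 10x or ingest tooling.

```python
import gzip
from pathlib import Path


def is_empty_fastq(path):
    """Return True if a FASTQ file (plain or gzipped) contains no data."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        # Read one decompressed byte: a gzipped file with no reads still
        # has a non-zero size on disk, so file size alone is not reliable.
        return handle.read(1) == b""


def find_empty_fastqs(directory):
    """Return the sorted names of empty FASTQ files under a directory."""
    root = Path(directory)
    candidates = list(root.rglob("*.fastq")) + list(root.rglob("*.fastq.gz"))
    return sorted(p.name for p in candidates if is_empty_fastq(p))
```

Calling `find_empty_fastqs` on the output directory would list the files that can be dropped from the spreadsheet.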
The files have been uploaded and they are valid
I ran the ingest-graph-validator tool and it fails the check for the 5 degree minimum graph, which is expected because this dataset contains bulk data (directly from tissue to files)
@ESapenaVentura I have reviewed this, great to see lots of cells and primary brain tissue in this project! I made some minor changes to the sheet linked above.
I added the fastq creation method to the sequencing protocol tab.
One donor organism's name and description were inaccurate/duplicated; I changed them from GW25 to GW27.
Some of the addresses in the Contributors tab are in Chinese, and some in English. Is this intended?
I changed both the project title and description slightly because the title did not mention scRNA-Seq (only ATAC-Seq) and the description started in what seemed like the middle of a sentence from an abstract. I think the script likely extracted the appropriate XML field automatically, but the field itself was inaccurate/incomplete.
I switched the SRA study accession and Project accessions because I think they weren't in the correct columns (the column name is misleading so this happens to me often). I'm sorry if I'm incorrect and it causes an import error!
Apart from those changes, I would probably add more information about the collection protocol, to include the short paragraph about consent and that samples were derived from elective termination.
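The accession mix-up mentioned above could be caught by checking the shape of each value before it is placed in a column. The sketch below uses the widely seen accession prefixes (PRJ for BioProject, SRP/ERP/DRP for SRA studies, GSE for GEO series); the labels are illustrative assumptions and deliberately do not name the spreadsheet columns, since the column naming is the confusing part.

```python
import re

# Common accession shapes. The keys are illustrative labels,
# not the names of the fields in the metadata schema.
ACCESSION_PATTERNS = {
    "bioproject": re.compile(r"PRJ[EDN][A-Z]\d+"),
    "sra_study": re.compile(r"[EDS]RP\d+"),
    "geo_series": re.compile(r"GSE\d+"),
}


def classify_accession(value):
    """Return the label whose pattern matches the whole value, else None."""
    value = value.strip()
    for label, pattern in ACCESSION_PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    return None
```

For example, `classify_accession("GSE119212")` returns `"geo_series"`, while a value that fits none of the patterns returns `None` and can be flagged for manual review.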
Hi @ami-day , did you overwrite the spreadsheet? I can only see a "v3_noIndex", but I can't find the "v2_noIndex". It should be alright, I'm just curious!
Other than that, I think the changes are perfectly fine. I may have swapped them around because we have the pattern recognition wrong in our schema
About the addresses, I couldn't find the address in English. If you were able to, please let me know and I'll change it immediately!
About the elective termination, I can add that to the cause of death, but since we are treating the embryos as donors, the collection refers to the hippocampus collection, but not how it was "retrieved" from the mother.
I will make that last addition and, if you give me the green light, we can say this is ready for exporting!
Hi @ESapenaVentura,
Yes, I made the changes in the original spreadsheet because they were very minor changes to resolve errors.
About the addresses: I usually google search the university/institute names separately to find the addresses if they are not available. If you have done this and were still unable to find them in English, I think it's ok to keep the Chinese.
The other additions you made sound good! I think this is ready for upload to ingest prod now!
:)
I added the consent in the collection protocol and revised the changes.
Ready to go!
New submission: https://contribute.data.humancellatlas.org/submissions/detail?id=601970b2ac1792031eee0b88&project=6c040a93-8cf8-4fd5-98de-2297eb07e9f6
Files are validating but I think there is some trouble with validation
This dataset has been exported to the Terra staging area.
@ESapenaVentura I moved this to Finished on the wrangling board, as I believe this dataset is done from the DCP perspective, right? Thanks!
@ESapenaVentura I just removed the dataset label. If/when we need updates on this we will need to add the label again so that it shows on the wrangling board. Does that sound ok?
@ESapenaVentura To pick this one for the SCEA testing
@ESapenaVentura did you start the SCEA conversion for this dataset?
Is this the correct latest version? https://docs.google.com/spreadsheets/d/1ASSe6qt_sXWOdnCS6_1UJ2Hn-l2mMDUo/edit#gid=202266375 I will pre-convert it using the latest version of the hca2scea tool. I think the multi protocols might have caused an issue if you used a previous version of the tool.
https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/230 E-HCAD43. Currently stalled due to a problem with file upload and de-prioritisation
Ok, I have just updated the SDRF file based on Anja's comments and re-uploaded the updated version to the GitLab branch. All that is left now is to transfer the FASTQ files to them via the EBI cluster (not ideal, but it is the only way to do it). I have asked Anja where to send them on the cluster.
Primary Wrangler: Enrique
Secondary Wrangler: Ami
Associated files:
Google Drive: https://drive.google.com/drive/folders/1N523DIJEXYvQwvCzqUEcGxcURot955h2
Published study links
Paper: https://www.nature.com/articles/s41586-019-1917-5
Accessioned data: GSE119212
Key Events