ami-day opened this issue 3 years ago (status: open).
@ami-day this didn't have the dataset tag, so I don't think anyone had seen that it needed secondary review.
Ohhh, I see. I posted it on the Slack channel, but I should have labelled it too. I think I'll get back to the contributors before review, as it's been a while now, and I don't think the review will be particularly speedy: bits of information are in different places.
@Wkt8 @ESapenaVentura can either of you take this on for secondary review?
Actually, ignore me, I forgot this was being left till contributor feedback!
Stalled. Waiting for Anna/Jeongbin to respond. I emailed them a reminder today.
It looks fine other than the issues below.
Could you update the publication details? The preprint has now been published in Nature Genetics (doi: 10.1038/s41588-021-00801-6).
One of the dissociation protocols is not linked to any biomaterials. It is reported as an orphan entity by the ingest graph validator:
'protocol_core.protocol_name': 'centrifugation_after_differentiation_organoids'
'donor_organism.is_living' : yes
Are these European donors? Are there any GDPR concerns? I am not sure whether 'iPSC lines derived from healthy donors' would be a problem for loading raw sequence data into the HCA Data Portal.
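The orphan-protocol finding above can be approximated with a simple rule: a protocol is an orphan if no process links it to any biomaterial. This is a simplified model assumed for illustration, not the ingest graph validator's actual code; `dissociation_protocol_1` and `process_1` are hypothetical names.

```python
from typing import Iterable


def find_orphan_protocols(
    protocol_names: Iterable[str],
    process_links: dict[str, list[str]],
) -> list[str]:
    """Return protocols that no process references.

    process_links maps a (hypothetical) process ID to the protocol names it uses.
    """
    used = {name for protocols in process_links.values() for name in protocols}
    # Keep the input order so the report is stable and easy to read.
    return [name for name in protocol_names if name not in used]


protocols = [
    "dissociation_protocol_1",  # hypothetical, linked below
    "centrifugation_after_differentiation_organoids",
]
links = {"process_1": ["dissociation_protocol_1"]}
print(find_orphan_protocols(protocols, links))
```

Here only `centrifugation_after_differentiation_organoids` is reported, because no process uses it.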
Thanks @rays22, I have made the changes. The organoid data protocols were orphaned because I had asked the authors whether they should be included (I couldn't see any organoid samples in the information they sent me). They have been terrible at getting back to me, so I have removed those protocols.
I have asked Tony about the GDPR question.
Tony says it is fine to upload the raw data, so I will start on that.
The data is transferring to an hca-util upload area.
I am having problems with the data download from ENA. It looks like an issue on the ENA server side. I have messaged Eugene to ask about this.
Requested NCBI cloud delivery of the data. It can only be delivered in SRA object format, so I will need to convert it to fastq.
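The SRA-to-fastq conversion could be done with `fasterq-dump` from NCBI's SRA Toolkit. The thread does not say which tool was used, so treat this as an assumed approach; `SRR0000000.sra` is a placeholder file name. The command builder is kept separate from the runner so it can be checked without the toolkit installed.

```python
import shutil
import subprocess
from pathlib import Path


def fasterq_dump_command(sra_file: Path, outdir: Path, threads: int = 4) -> list[str]:
    """Build a fasterq-dump call that writes each read (_1, _2, ...) to its own FASTQ file."""
    return [
        "fasterq-dump",
        "--split-files",
        "--threads", str(threads),
        "--outdir", str(outdir),
        str(sra_file),
    ]


def convert_sra_to_fastq(sra_file: Path, outdir: Path, threads: int = 4) -> None:
    """Run the conversion; needs the SRA Toolkit's fasterq-dump on PATH."""
    if shutil.which("fasterq-dump") is None:
        raise RuntimeError("fasterq-dump not found; install the SRA Toolkit")
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(fasterq_dump_command(sra_file, outdir, threads), check=True)


# Placeholder file name; this prints the command without running it.
print(fasterq_dump_command(Path("SRR0000000.sra"), Path("fastq")))
```

`fasterq-dump` writes uncompressed FASTQ, so the output would still need gzipping to match `*.fastq.gz` file names in the spreadsheet.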
Uploading the fastq files to an hca-util upload area from a local folder on EC2.
Alegria is downloading the fastq files from ENA (about 800-900 files) and then uploading them to an hca-util upload area. The metadata in the project submission is valid, except that the fastq files still need to be uploaded to ingest: https://contribute.data.humancellatlas.org/submissions/detail?uuid=15142b86-b5b3-49cb-bad0-cb3eb8ba0a79&project=72c636f3-d51f-4e5d-9cf8-9b91427a9e0c
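One way to get the list of fastq files to download is ENA's Portal API `filereport` endpoint, which returns one row per run with `;`-separated `fastq_ftp` and `fastq_md5` values. This sketch assumes that endpoint and those field names; the thread does not say how the download list was built.

```python
import csv
import io
import itertools
import urllib.parse
import urllib.request

ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """URL of ENA's per-run file report for a study or sample accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,sample_accession,fastq_ftp,fastq_md5",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urllib.parse.urlencode(params)}"


def parse_filereport(tsv_text: str) -> list[dict[str, str]]:
    """Expand the report into one row per FASTQ file."""
    rows = []
    for record in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp and fastq_md5 hold one ';'-separated entry per read file.
        urls = [u for u in record["fastq_ftp"].split(";") if u]
        md5s = [m for m in record["fastq_md5"].split(";") if m]
        for url, md5 in itertools.zip_longest(urls, md5s, fillvalue=""):
            rows.append({
                "run": record["run_accession"],
                "sample": record["sample_accession"],
                "url": "ftp://" + url,  # ENA lists host/path without a scheme
                "md5": md5,
            })
    return rows


def fetch_filereport(accession: str) -> str:
    """Download the report text (network access required)."""
    with urllib.request.urlopen(filereport_url(accession)) as response:
        return response.read().decode("utf-8")
```

For this study the call would be `parse_filereport(fetch_filereport("PRJEB38269"))`; each row gives a download URL plus the sample accession ENA links it to.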
Alegria is still downloading the fastqs from ENA, but I have also noticed some discrepancies between the accession numbers and the fastq files.
Discrepancy 1: Cell Suspension ID: SAME6833352 with BioSamples ID: SAME6833352
In the spreadsheet, this entity is linked to: ERR4699951_1.fastq.gz ERR4699952_1.fastq.gz ERR4699953_1.fastq.gz ERR4699954_1.fastq.gz
However, on the ENA Browser (https://www.ebi.ac.uk/ena/browser/view/PRJEB38269), those four fastq files are linked to SAMEA6833353 instead.
Discrepancy 2: The project on the ENA Browser contains 532 sample accessions, for a total of 1064 fastq files, but there are only 968 sequence file metadata entities in the spreadsheet.
As such, we weren't sure if we should:
Due to this, and the time it takes to download the files, we have decided to wait for release 8 for export.
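Both discrepancies come down to comparing two maps from fastq file name to sample accession: one built from the spreadsheet and one from ENA. A minimal sketch, assuming both maps have already been extracted; the first entry uses the file and accessions from Discrepancy 1, while `ERR0000000_1.fastq.gz` and `SAMEA0000000` are placeholders.

```python
def relinked_files(spreadsheet: dict[str, str], ena: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Files whose sample accession differs between the spreadsheet and ENA.

    Returns {file name: (spreadsheet sample, ENA sample)}.
    """
    return {
        name: (sample, ena[name])
        for name, sample in spreadsheet.items()
        if name in ena and ena[name] != sample
    }


def files_missing_from_spreadsheet(spreadsheet: dict[str, str], ena: dict[str, str]) -> list[str]:
    """FASTQ files listed in ENA that have no sequence file entity in the spreadsheet."""
    return sorted(ena.keys() - spreadsheet.keys())


spreadsheet = {"ERR4699951_1.fastq.gz": "SAME6833352"}
ena = {
    "ERR4699951_1.fastq.gz": "SAMEA6833353",
    "ERR0000000_1.fastq.gz": "SAMEA0000000",  # placeholder
}
print(relinked_files(spreadsheet, ena))
print(files_missing_from_spreadsheet(spreadsheet, ena))
```

With the full data, the missing-file list would contain at least 1064 - 968 = 96 entries, and more if some spreadsheet file names do not appear in ENA at all.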
Files are now in this hca-util upload area: s3://hca-util-upload-area/1268551e-f2d0-43eb-9511-968e46901e72/ as mentioned in https://app.zenhub.com/workspaces/operations-5fa2d8f2df78bb000f7fb2b5/issues/ebi-ait/hca-ebi-dev-team/432
@amnonkhen Unfortunately, I need to review this dataset and potentially make edits before it is exported, and I won't be back in time to make the updates. Can we please move the milestone from July to August?
Working on getting the missing fastq files. Removing the July milestone, as it won't be done by then, and changing it to the August milestone.
Emailed Oli about the missing samples on 02/08/2021. Some samples in the ENA study are missing from the dataset files they provided us with, and some samples in those files are missing from ENA.
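The two-way mismatch described above is a pair of set differences over sample accessions. A minimal sketch, assuming the accessions have been collected into sets from ENA and from the contributor's files; the accessions below are hypothetical.

```python
def sample_differences(
    ena_samples: set[str], dataset_samples: set[str]
) -> tuple[list[str], list[str]]:
    """Return (samples only in ENA, samples only in the contributor's files)."""
    only_in_ena = sorted(ena_samples - dataset_samples)
    only_in_dataset = sorted(dataset_samples - ena_samples)
    return only_in_ena, only_in_dataset


# Hypothetical accessions, for illustration only.
ena = {"SAMEA0000001", "SAMEA0000002"}
dataset = {"SAMEA0000002", "SAMEA0000003"}
print(sample_differences(ena, dataset))  # (['SAMEA0000001'], ['SAMEA0000003'])
```

An ID that differs only by a typo (for example a dropped letter in the prefix) shows up on both sides, which is worth checking before asking the authors about it.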
Given no response from the authors, decided to go ahead with this dataset for the September release milestone, using the fastqs available in ENA. Currently re-uploading ~80 fastq files that were displayed as invalid in ingest prod.
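Before re-uploading, it can help to check which local fastq files are missing or corrupted by comparing their MD5 checksums with the expected values (for example, ENA's published `fastq_md5`). This assumes the files were flagged because of incomplete or damaged transfers, which the thread does not state.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so large fastq.gz files fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_reupload(expected_md5: dict[str, str], folder: Path) -> list[str]:
    """Names of files that are missing locally or whose MD5 differs from the expected value."""
    bad = []
    for name, md5 in sorted(expected_md5.items()):
        path = folder / name
        if not path.is_file() or md5sum(path) != md5.lower():
            bad.append(name)
    return bad
```

Only the files returned by `files_to_reupload` would then need to be fetched and uploaded again.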
@ami-day I've removed the extra files for this submission: https://contribute.data.humancellatlas.org/submissions/detail?uuid=d3dc95e5-7154-4d2a-b684-81a852cfb9d9
Thanks @aaclan-ebi , it's now exporting :)
Successfully exported; it just needs the import form, which Ami is completing now.
Curating to SCEA format. Assigning it E-HCAD-50.
Moving this to stalled, as I'm not sure whether this experimental design is suitable for SCEA. I have messaged them and am waiting for a reply.
Pre-converted the files and uploaded them here: https://drive.google.com/drive/folders/14m-j87nBFQUi3yCrIDhTrQ54KHoCLCe2
Re-assigned the HCAD-id to E-HCAD-51.
Uploaded the idf and sdrf files: https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/296
E-HCAD-58: Generated MAGE-TAB files. Currently validating.
Handed over to the SCEA team (GitLab) and in review by them.
Dataset/group this task is for:
Oliver Stegle's group
DopaminergicNeuronDifferentiation
Latest sheet: https://docs.google.com/spreadsheets/d/1Eblt1hHBAwk84BFiuTzbANGcJUYdtj5k/edit#gid=56358489
Folder: https://drive.google.com/drive/folders/1kRNGIsBsIviLHEPv1ipkLG0lrM8TLLbL
https://contribute.data.humancellatlas.org/submissions/detail?id=60a253c5901b6d17e5f3a4f0
Wrangler responsible for this dataset/lab:
[x] Primary: Ami
[x] Secondary: Ray
Since this is a contributor dataset, some contributor and/or publication metadata might be missing. I have yet to send it back to the contributors to check that it all looks OK to them; I want to get secondary review first.
Description of the task: