Open mshadbolt opened 4 years ago
Previous ticket is here: https://github.com/HumanCellAtlas/hca-data-wrangling/issues/411
Current status: I sent a largely filled-out spreadsheet for review today, along with instructions for data upload.
I am hoping to proceed with archiving in EBI archives as soon as data is reviewed.
sent nudge email
emailed Paul to see if he has any advice on how to progress this.
Received updated spreadsheet and should now be able to finish the submission; hoping to get accessions by early next week.
Secondary review done - A couple of comments: Donor organism
Other than that LGTM! I have checked linking from specimen to donor and from donor to cell suspension. From cell_suspension to file it's kinda hard because the IDs for most files don't match, but I trust they are ok. I have also checked the library preps and they look fine!
I changed the cannibis to cannabis. I didn't change the smoker status.
I have uploaded all the files that I have to the submission here: https://ui.ingest.archive.data.humancellatlas.org/submissions/detail?id=5f19a1fcfe9c934c8b83515f&project=ad98d3cd-26fb-4ee3-99c9-8a2ab085e737
There are 75 remaining files to be uploaded to the upload area and then transferred to the submission upload area.
I have provided @ESapenaVentura with the relevant information to be able to progress the submission through archiving tomorrow while I am away, presuming that the rest of the files are uploaded.
Just 3 files left that we are waiting to get uploaded. 765/768 files valid against the submission.
I emailed Carlos to let them know in case they didn't realise, and let Enrique know the last remaining files that will need to be transferred.
Had to re-submit because there was a problem with the lane_indexes for the following files:
HCAHeart7664652_S1_L001_I1_001.fastq.gz
HCAHeart7698015_S1_L001_I1_001.fastq.gz
HCAHeart7664652_S1_L001_R1_001.fastq.gz
HCAHeart7698015_S1_L001_R1_001.fastq.gz
HCAHeart7664652_S1_L001_R2_001.fastq.gz
HCAHeart7698015_S1_L001_R2_001.fastq.gz
HCAHeart7664653_S1_L001_I1_001.fastq.gz
HCAHeart7702873_S1_L001_I1_001.fastq.gz
HCAHeart7664653_S1_L001_R1_001.fastq.gz
HCAHeart7702873_S1_L001_R1_001.fastq.gz
HCAHeart7664653_S1_L001_R2_001.fastq.gz
HCAHeart7702873_S1_L001_R2_001.fastq.gz
HCAHeart7757637_S1_L001_I1_001.fastq.gz
HCAHeart7985087_S1_L001_I1_001.fastq.gz
HCAHeart7757637_S1_L001_R1_001.fastq.gz
HCAHeart7985087_S1_L001_R1_001.fastq.gz
HCAHeart7757637_S1_L001_R2_001.fastq.gz
HCAHeart7985087_S1_L001_R2_001.fastq.gz
HCAHeart7702876_S1_L001_I1_001.fastq.gz
HCAHeart7702877_S1_L001_I1_001.fastq.gz
HCAHeart7702876_S1_L001_R1_001.fastq.gz
HCAHeart7702877_S1_L001_R1_001.fastq.gz
HCAHeart7702876_S1_L001_R2_001.fastq.gz
HCAHeart7702877_S1_L001_R2_001.fastq.gz
HCAHeart7757638_S1_L001_I1_001.fastq.gz
HCAHeart7985088_S1_L001_I1_001.fastq.gz
HCAHeart7757638_S1_L001_R1_001.fastq.gz
HCAHeart7985088_S1_L001_R1_001.fastq.gz
HCAHeart7757638_S1_L001_R2_001.fastq.gz
HCAHeart7985088_S1_L001_R2_001.fastq.gz
HCAHeart7829976_S1_L001_I1_001.fastq.gz
HCAHeart7985089_S1_L001_I1_001.fastq.gz
HCAHeart7829976_S1_L001_R1_001.fastq.gz
HCAHeart7985089_S1_L001_R1_001.fastq.gz
HCAHeart7829976_S1_L001_R2_001.fastq.gz
HCAHeart7985089_S1_L001_R2_001.fastq.gz
HCAHeart7664654_S1_L001_I1_001.fastq.gz
HCAHeart7757636_S1_L001_I1_001.fastq.gz
HCAHeart7985086_S1_L001_I1_001.fastq.gz
HCAHeart7664654_S1_L001_R1_001.fastq.gz
HCAHeart7757636_S1_L001_R1_001.fastq.gz
HCAHeart7985086_S1_L001_R1_001.fastq.gz
HCAHeart7664654_S1_L001_R2_001.fastq.gz
HCAHeart7757636_S1_L001_R2_001.fastq.gz
HCAHeart7985086_S1_L001_R2_001.fastq.gz
HCAHeart7702874_S1_L001_I1_001.fastq.gz
HCAHeart7702875_S1_L001_I1_001.fastq.gz
HCAHeart7702874_S1_L001_R1_001.fastq.gz
HCAHeart7702875_S1_L001_R1_001.fastq.gz
HCAHeart7702874_S1_L001_R2_001.fastq.gz
HCAHeart7702875_S1_L001_R2_001.fastq.gz
HCAHeart7702878_S1_L001_I1_001.fastq.gz
HCAHeart7702879_S1_L001_I1_001.fastq.gz
HCAHeart7702878_S1_L001_R1_001.fastq.gz
HCAHeart7702879_S1_L001_R1_001.fastq.gz
HCAHeart7702878_S1_L001_R2_001.fastq.gz
HCAHeart7702879_S1_L001_R2_001.fastq.gz
HCAHeart7702881_S1_L001_I1_001.fastq.gz
HCAHeart7702882_S1_L001_I1_001.fastq.gz
HCAHeart7702881_S1_L001_R1_001.fastq.gz
HCAHeart7702882_S1_L001_R1_001.fastq.gz
HCAHeart7702881_S1_L001_R2_001.fastq.gz
HCAHeart7702882_S1_L001_R2_001.fastq.gz
HCAHeart7656539_S1_L001_I1_001.fastq.gz
HCAHeart7702880_S1_L001_I1_001.fastq.gz
HCAHeart7656539_S1_L001_R1_001.fastq.gz
HCAHeart7702880_S1_L001_R1_001.fastq.gz
HCAHeart7656539_S1_L001_R2_001.fastq.gz
HCAHeart7702880_S1_L001_R2_001.fastq.gz
There was more than one set of files per library prep with the same lane_index. I need to investigate why the ingest-graph-validator did not pick this up.
Also, the project contributor names were incorrectly formatted, causing a problem with the archiver. Corrected that as well.
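For reference, a minimal sketch of the kind of check that would flag this problem before submission. It is not the ingest-graph-validator's logic; the row layout, the `find_duplicate_lane_indexes` name, and the library prep ID and lane_index values in the example are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_lane_indexes(rows):
    """Return groups of files that share a library prep, read index and lane_index.

    `rows` is an iterable of (library_prep_id, read_index, lane_index, file_name)
    tuples; this layout is an assumption, not the ingest spreadsheet schema.
    """
    groups = defaultdict(list)
    for library_prep_id, read_index, lane_index, file_name in rows:
        groups[(library_prep_id, read_index, lane_index)].append(file_name)
    # Only keys that collected more than one file are collisions.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical rows: the library prep ID and lane_index values are made up.
rows = [
    ("lib_prep_A", "read1", 1, "HCAHeart7664652_S1_L001_R1_001.fastq.gz"),
    ("lib_prep_A", "read1", 1, "HCAHeart7698015_S1_L001_R1_001.fastq.gz"),
    ("lib_prep_A", "read2", 1, "HCAHeart7664652_S1_L001_R2_001.fastq.gz"),
]
duplicates = find_duplicate_lane_indexes(rows)
for key, names in duplicates.items():
    print(key, names)
```

Giving each colliding set its own lane_index within the library prep would make the returned dictionary empty.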
New submission: https://ui.ingest.archive.data.humancellatlas.org/submissions/detail?id=5f1b0e98fe9c934c8b835c80&project=ad98d3cd-26fb-4ee3-99c9-8a2ab085e737 New spreadsheet: Same folder, same name with an added suffix.
Currently files are being uploaded. Once uploaded I will set them all to "valid" to avoid waiting time. They have been already validated once in the previous submission and will be validated again in the bam conversion jobs and when uploaded to ENA, so it's fair to say this is pretty safe.
New DSP submission: 67c34d20-eaf5-4aa8-bfc8-31dd4e97829f
Currently getting a "500 internal server error" when trying to retrieve entities. Will try again later
Scratch that, new dsp submission: 6d20bce5-86fb-4a52-bd54-63838cab18a9
Just ran the file archiver on the EBI cluster. I needed to create a new folder under /hca/ because the one @mshadbolt created doesn't have write permissions for other users (Same folder name + "_enrique", happy to rename it afterwards)
256 jobs were sent (256 * 3 = 768 files), and so far some are already running and not finishing immediately, so it's looking good!
All the scripts used to parallelise the file upload are in the same folder.
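The scripts themselves are not reproduced here. As a rough illustration of how 768 files map onto 256 jobs, the sketch below groups file names into I1/R1/R2 sets, one set per job. That each job handled exactly one such set is an assumption inferred from the 256 * 3 count, and the `group_into_jobs` function and the file name pattern are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming pattern: <sample>_S<n>_L<lane>_<read>_<chunk>.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<prefix>.+_S\d+_L\d{3})_(?P<read>I1|R1|R2)_(?P<chunk>\d{3})\.fastq\.gz$"
)
EXPECTED_READS = {"I1", "R1", "R2"}


def group_into_jobs(file_names):
    """Group fastq file names into one job per I1/R1/R2 set."""
    jobs = defaultdict(dict)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match is None:
            raise ValueError(f"Unrecognised file name: {name}")
        jobs[(match["prefix"], match["chunk"])][match["read"]] = name
    # Refuse to build jobs from incomplete sets.
    for key, reads in jobs.items():
        if set(reads) != EXPECTED_READS:
            raise ValueError(f"Incomplete set for {key}: {sorted(reads)}")
    return dict(jobs)


files = [
    "HCAHeart7664652_S1_L001_I1_001.fastq.gz",
    "HCAHeart7664652_S1_L001_R1_001.fastq.gz",
    "HCAHeart7664652_S1_L001_R2_001.fastq.gz",
    "HCAHeart7698015_S1_L001_I1_001.fastq.gz",
    "HCAHeart7698015_S1_L001_R1_001.fastq.gz",
    "HCAHeart7698015_S1_L001_R2_001.fastq.gz",
]
jobs = group_into_jobs(files)
print(len(files), "files ->", len(jobs), "jobs")
```

Each entry in `jobs` could then be handed to one cluster job; how the jobs were actually submitted is not covered by this sketch.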
Little update:
The DSP submission contains the following submittables:
Project has been archived under ENA accessions ERP123138 - PRJEB39602
Hasn't yet been exported.
@ESapenaVentura I have made a new spreadsheet in the brokering folder here: https://drive.google.com/open?id=1rIxeMhRg7jUFrf4a3dVqj-JTux0ZUuQP If you can do a review that would be amazing. thanks!
Main changes are:
_1
_2
etc
@mshadbolt Done! Everything looks alright, I have just added the PMID for the publication (32971526)
Everything else looks fine!
Oh thanks! I literally looked yesterday and couldn't find it, oops! Thanks for the review.
I believe I have fixed the experiment/run linking in the ENA submission, but I need to give the database some time to reindex, so I want to check in tomorrow. The basic process was:
I had made a start on converting this project to SCEA under accession E-HCAD-28 https://drive.google.com/open?id=1tNTsxbeepQh7F4ZF3B13vvghbR09D3Gp
I raised multiple issues with the conversion with @ami-day. They stem from assumptions in the code about where the geo-to-hca script puts various accessions, whereas this dataset was manually curated from a contributor. I am not 100% sure whether those things have been changed in the converter, so it would be worth trying to run the conversion again. Otherwise it may just require a lot of manual curation to ensure the correct accessions are put in the columns where the SCEA curators like to see them.
Taking this on
@ami-day I can't tell by scrolling on the ticket where this dataset is on the DCP journey. I believe it was submitted to DCP. But I don't know if it should be on the Finished column or where. Could you, please, move where appropriate? Thanks! Apart from needing to be brokered to SCEA, does it need any updates or archiving? Thanks!
It really looks finished, so moving it there.
From the comments it looks like this should be in 'broker to SCEA' as it's already been archived with fastq files in ENA (ERP123138) and is live.
Assigned E-HCAD id: E-HCAD-47
E-HCAD43 already exists, please use the next one! https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/merge_requests/230 It's stuck due to de-prioritisation and a problem with file upload.
E-HCAD43
Ok I have assigned it E-HCAD-47,E-HCAD-48,E-HCAD-49 (split by library prep. method type). It definitely needs checking and potentially merging/correcting. The files can be found here: https://drive.google.com/drive/folders/1tNTsxbeepQh7F4ZF3B13vvghbR09D3Gp
In review by SCEA team and handed over to SCEA team (Gitlab).
Primary Wrangler: Marion Shadbolt
Secondary Wrangler: Enrique
Associated files:
Google Drive: https://drive.google.com/open?id=1gnB0anWGADLlwhGHLBQb3AowIpB8145p&authuser=mshadbolt@ebi.ac.uk&usp=drive_fs
Key Events
[ ] Receive project questionnaire and move project from potential to in_progress projects