Need to add bulk samples metadata.
Stalled as ingest can't currently accept spreadsheets with this many sequence file names.
@clairerye are you aware of this? Is this still a problem?
I am not aware of this. Is it the file names or the number of files? @ami-day @aaclan-ebi do you know whether we limit this somewhere? Is it possible to do it as two submissions, or is that a terrible idea?
Ah, this might be the limit on the number of rows set in the spreadsheet importer. We should be able to increase that. @ami-day, how many sequencing files were there?
@clairerye and @lauraclarke I remember making a ticket for this in the painpoints column. It is the number of files (or rows) in the spreadsheet. The import fails if there are too many rows in the sequence file tab, where each row is one fastq file name.
@aaclan-ebi there are approx. 23,500 rows! Could we increase it to 50,000 in case this happens again with a larger dataset?
This is now unblocked, as Jacob has made the necessary changes to ingest. The spreadsheet shows as valid in ingest staging (https://staging.contribute.data.humancellatlas.org/submissions/detail?id=60702e7bed34714563004d7a) and is ready for secondary review.
I think this won't make the April release because 1. it needs to be secondary reviewed, and 2. there are more than 23,500 fastq files to upload, which I am guessing will take a while.
I didn't think this was targeting release 5; it isn't associated with a milestone.
Requesting the data files via NCBI cloud delivery. This significantly reduces the number of separate fastq files, as NCBI groups all the run accessions into one experiment accession (in ENA there is a fastq file for each run, and there are many runs per experiment; 2931 experiments in total). If someone has the capacity to review this dataset by early next week, it might be good to add it to release 5.
I have performed the secondary review but was unable to complete the 'sequence files present in the S3 bucket' part of it. The sequence files tab is also still waiting for file names, as Ami mentioned above.
Project Tab: Should the Project Title be the published paper's title rather than the preprint title? 'Oleic acid restores suppressive defects in tissue-resident FOXP3 Tregs from patients with multiple sclerosis'
Collection Protocol Tab: Collection Method Ontology ID: EFO:0009121, 'blood draw'. I couldn't find an appropriate ontology ID in HCAO for 'aspirating adipose tissue', so the general collection term probably works, unless we want to request that ontology term.
Specimen from Organism Tab: Genus Species Ontology ID, Organ Ontology ID: I'm not sure if 'adipose tissue' is the right term for the organ. Looking at the hierarchy of 'adipose tissue' in HCAO, I would put 'connective tissue' or some other term for the organ, and 'adipose tissue' for the organ part, but this shouldn't block the release.
Enrichment Protocol Tab: Enrichment Method Ontology ID: EFO:0009112, 'density gradient centrifugation'; EFO:0009109, 'magnetic affinity cell sorting'; EFO:0009108, 'fluorescence-activated cell sorting'.
Library Preparation Protocol Tab: Maybe move 'RNeasy Micro Kit (QIAGEN)' from the library_protocol_bulk description to the nucleic acid conversion kit column?
Sequence File Tab: Content Description Ontology ID: data:3494, 'DNA sequence'.
Apart from this, it looks great! Nice job separating the enrichment protocols, it's a large dataset that looks very interesting re: MS.
Thanks @Wkt8. NCBI say they have completed my request to transfer about 30,000 fastq files! I will work on this now.
Uploading the fastq files to an hca-util upload area.
The fastq files are validating in ingest.
Error syncing 2 files: https://github.com/ebi-ait/hca-ebi-wrangler-central/issues/316
Syncing files to ingest.
Submitted the project in ingest.
This was exported on 24 May 2021.
Thanks. Could you please delete the duplicates of this project in ingest? There are currently 3 versions, which is very confusing.
This has been approved by Anja.
Dataset/group this task is for:
GSE152543_OleicAcidMultipleSclerosis https://docs.google.com/spreadsheets/d/1_nM-tL6JyLTzhUKY9IxKWVRhPSofuBjl/edit#gid=1600387747
Wrangler responsible for this dataset/lab:
Primary: Ami Secondary: Wei
Acceptance criteria for the task: