Open arschat opened 1 year ago
The RDS analysis file includes data from another publicly available dataset (here), which is not wrangled in DCP and not eligible.
I will try to find metadata for that specific donor and include it in the spreadsheet.
No metadata available for the extra donor. Used a dummy donor/specimen & CS for this.
There is ambiguity about the sequencing machine used.
The constructed library was sequenced on a BGI MGISEQ-2000 or Illumina platform.
In GEO, only BGI is mentioned, however it is not in the EFO ontology for high throughput sequencers. It exists in the GENEPIO ontology http://purl.obolibrary.org/obo/GENEPIO_0100144.
An ontology request should be made.
(genes are in gene symbol format in the count matrices)
Ontology request needed, leave for next release.
Ontology request made, but even if the new term is added, an OLS update is needed before we can proceed.
Ontology term added, will be available in EFO in next release.
BGI MGISEQ-2000 -> EFO:0700018
this dataset is potentially affected by the deletion of data from ncbi-cloud-data bucket - can you check @arschat ?
Files had already been uploaded in the hca-util area 9d41ab58-c57d-4804-926f-3d63275ed913
Will try to push the project even though the ontology request is stalled.
Authors replied:
We use BGISEQ-500 to sew.
Therefore we should proceed with the other option.
Uploaded a file with insdc_project_accession and insdc_study_accession swapped, but fixed that through the API with the following code:
from hca_ingest.api.ingestapi import IngestApi

token = ""  # bearer token, left blank in the original comment
api = IngestApi(url="https://api.ingest.archive.data.humancellatlas.org/")
api.set_token(token)

project_url = "https://api.ingest.archive.data.humancellatlas.org/projects/60068ceec9762f5f0de9f719"
headers_json = {'Content-Type': 'application/json', 'Authorization': 'Bearer ' + token}

# Fetch the project document, correct the two swapped accession fields, and write it back.
results = api.get(project_url).json()
results['content']['insdc_project_accessions'] = ['SRP281979']
results['content']['insdc_study_accessions'] = ['PRJNA662785']
api.put(project_url, headers=headers_json, json=results)
Seems correct in ingest now.
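A swap like this could probably be caught before upload with a small pattern check. The sketch below is only an illustration: the regular expressions are assumptions based on the accession shapes seen in this ticket (SRP-style for project accessions, PRJNA-style for study accessions), not copied from the metadata schema, and `check_accessions` is a hypothetical helper.

```python
import re

# Assumed patterns, based only on the accession shapes in this ticket.
PROJECT_PATTERN = re.compile(r"^[DES]RP\d+$")      # e.g. SRP281979
STUDY_PATTERN = re.compile(r"^PRJ[DEN][A-Z]\d+$")  # e.g. PRJNA662785


def check_accessions(content):
    """Return a list of problems found in a project 'content' dict."""
    problems = []
    for value in content.get("insdc_project_accessions", []):
        if not PROJECT_PATTERN.match(value):
            problems.append(f"insdc_project_accessions: unexpected value {value!r}")
    for value in content.get("insdc_study_accessions", []):
        if not STUDY_PATTERN.match(value):
            problems.append(f"insdc_study_accessions: unexpected value {value!r}")
    return problems


# The swapped spreadsheet values versus the corrected ones.
swapped = {
    "insdc_project_accessions": ["PRJNA662785"],
    "insdc_study_accessions": ["SRP281979"],
}
fixed = {
    "insdc_project_accessions": ["SRP281979"],
    "insdc_study_accessions": ["PRJNA662785"],
}
print(check_accessions(swapped))  # one problem per swapped field
print(check_accessions(fixed))    # no problems
```

Running it on `results['content']` before the `put` call would flag a swapped pair without touching the ingest service.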
Waiting for files transfer and if graph valid, it will be ready for secondary review.
Some files in the upload area seem to be invalid: the size of some fastq.gz files does not match the size listed in NCBI, probably because R1 was duplicated as R2 for the same library. I requested a new NCBI cloud transfer.
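A check along these lines could flag both symptoms (wrong size, R1 duplicated as R2) on a local copy of the files. This is a minimal sketch, not the validation script used here: `find_problems` and the `expected_sizes` mapping are hypothetical, the expected sizes would have to be copied from the archive's file listing, and the `_R1`/`_R2` naming is assumed from file names like C51_R2.fastq.gz.

```python
import hashlib
import os
import tempfile


def md5sum(path, chunk_size=1024 * 1024):
    """Compute the MD5 of a file in chunks so large FASTQs are not loaded into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_problems(directory, expected_sizes):
    """Report missing files, size mismatches and byte-identical R1/R2 pairs.

    expected_sizes maps file name -> size in bytes (hypothetical input).
    """
    problems = []
    for name, expected in sorted(expected_sizes.items()):
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            problems.append(f"{name}: missing")
        elif os.path.getsize(path) != expected:
            problems.append(f"{name}: size {os.path.getsize(path)} != expected {expected}")
    for name in sorted(expected_sizes):
        if "_R1" in name:
            r1 = os.path.join(directory, name)
            r2 = os.path.join(directory, name.replace("_R1", "_R2"))
            if os.path.exists(r1) and os.path.exists(r2) and md5sum(r1) == md5sum(r2):
                problems.append(f"{name}: identical to its R2 mate")
    return problems


# Tiny demonstration with placeholder files standing in for real FASTQs.
with tempfile.TemporaryDirectory() as tmp:
    for name, data in [("C51_R1.fastq.gz", b"AAAA"), ("C51_R2.fastq.gz", b"AAAA")]:
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(data)
    report = find_problems(tmp, {"C51_R1.fastq.gz": 4, "C51_R2.fastq.gz": 9})
print(report)
```

Hashing terabytes of data is slow, so in practice the size comparison would run first and the checksum comparison only on suspicious pairs.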
I had downloaded SRA Lite fastqs instead of the original files. If I re-download the correct files, I will exceed my monthly limit. Will ask another wrangler to download these SRA files through their account.
Thanks to Ida, correct files have been downloaded.
Re-triggered validation using the script here; it seems stuck on the same file (C51_R2.fastq.gz).
Deleted the submission and re-submitted.
Nice job! I have a few suggestions for information you can add
you can add the visualisation portal to the supplementary links
Approximately 20 ml of BALF was obtained and placed on ice. BALF was processed within 2 h and all operations were performed in a BSL-3 laboratory.
raw_matrix_generation
sample annotation
? maybe we need to request one
About fastq compression: the data files were very large (~4 TB) and were transferred through S3 buckets (ncbi to hca-util to upload-prod). To compress them, they would have to be downloaded locally or to EC2, compressed, and re-uploaded, which would take a lot of time. Since there were also earlier problems with file validation, I didn't want to touch the files more than necessary. For those reasons I skipped the fastq compression.
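For reference, if the compression were attempted later on a local machine or EC2 instance, the compress step itself could be streamed so that no file has to fit in memory. This is a rough sketch under that assumption; `gzip_file` is a hypothetical helper, and the download from and re-upload to the buckets are deliberately left out.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file(src, dst, chunk_size=16 * 1024 * 1024):
    """Compress src into dst with gzip, streaming in fixed-size chunks."""
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst


# Demonstration on a small synthetic FASTQ; real files would be far larger.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example_R1.fastq")
    with open(src, "wb") as handle:
        handle.write(b"@read1\nACGT\n+\nFFFF\n" * 1000)
    dst = gzip_file(src, src + ".gz")
    # Read both files back to confirm the round trip is lossless.
    with gzip.open(dst, "rb") as handle:
        roundtrip = handle.read()
    with open(src, "rb") as handle:
        original = handle.read()
    compressed_size = os.path.getsize(dst)
print(compressed_size, len(original))
```

The transfer time for ~4 TB would still dominate, which is the main reason given above for skipping this step.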
All other changes were submitted (Donor timecourse -> Symptoms to Outcome / Specimen timecourse -> Symptoms to Sampling date). Did not find another term for TCR files.
Submission is now in the Submitted state, and once it is exported I will send the import form.
Verified in browser; however, sequencing_protocol.instrument_manufacturer_model needs an update when a new OLS is available: BGI MGISEQ-2000 -> EFO:0700018
Since the OLS update is completed, we can now add the correct sequencer information.
Project short name: Covid19BALFLandscape
Primary Wrangler: Arsenios
Secondary Wrangler: Ida
Associated files
Published study links
Paper: Single-cell landscape of bronchoalveolar immune cells in patients with COVID-19
Accessioned data: GSE145926
ingest: 08fb10df-32e5-456c-9882-e33fcd49077a
Key Events