FDA-ARGOS / data.argosdb

Mazumder - Provide a list of complete genome assemblies and their corresponding SRR accessions #195

Closed: JingyueWu closed this issue 1 year ago

JingyueWu commented 1 year ago

(Requested by Vahan for analysis in HIVE)

Collect a list of complete genome assemblies (i.e. SARS-CoV-2 variants of concern such as alpha, beta, gamma, delta, and omicron) and their corresponding SRR accession IDs (no more than 20). Email Stephanie the list by the end of Friday (2/3).

HadleyKing commented 1 year ago

@JingyueWu Use this: complete_genome_assembly.txt

I made it from assemblyQC_ncbi in a few seconds.

steph-sing commented 1 year ago

@HadleyKing this is a separate task. @JingyueWu please disregard his comment.

@JingyueWu I am going to move this to the Developers Issues for Feb 2023. In the future, please make sure to create tickets under the correct project. Thanks.

JingyueWu commented 1 year ago

@steph-sing

I only recorded a few on this spreadsheet: https://docs.google.com/spreadsheets/d/1viutOIEbvFwraZmiMJJyB0fmma68ObA3jxjlA3TRVBQ/edit?usp=sharing

A dilemma I ran into while doing this task:

A) If I go with the complete genome assemblies of SARS-CoV-2, this is the list I found: https://www.ncbi.nlm.nih.gov/data-hub/genome/?taxon=694009

As you can see, the latest SARS-CoV-2 complete genome assemblies were submitted in April 2020. I could find some SRA records associated with those SARS-CoV-2 complete genome assemblies, but I couldn't find the variant name (either in general or in BioSample). Also, a lot of the assemblies do not have SRA records.

B) If I go with SARS-CoV-2 variants of interest (i.e. alpha, beta, gamma, delta, and omicron), then this is the list I found: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&ProtNames_ss=surface%20glycoprotein&Completeness_s=complete&QualNum_i=0&Lineage_s=B.1.351

The problem with this is that none of them has a complete genome assembly.

Please let me know which approach is more logical to take, thanks.
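
For reference, here is a minimal sketch of the selection step, assuming the candidates from either search have already been collected into records with three fields: assembly accession, lineage, and SRR run. The `Candidate` class, the `select_candidates` and `missing_fields` helpers, and the placeholder accessions are all hypothetical. Only the lineage B.1.351 comes from the search URL above. It keeps the entries that have all three fields (capped at 20) and reports what is missing from the rest, which is the gap described in A) and B).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One candidate entry; field names are illustrative, not an NCBI schema."""
    assembly: Optional[str]  # genome assembly accession, if one exists
    lineage: Optional[str]   # variant lineage label, e.g. "B.1.351"
    srr: Optional[str]       # SRA run accession, if one is linked


def select_candidates(records: List[Candidate], limit: int = 20) -> List[Candidate]:
    """Keep records that have an assembly, a lineage, and an SRR, up to `limit`."""
    complete = [r for r in records if r.assembly and r.lineage and r.srr]
    return complete[:limit]


def missing_fields(record: Candidate) -> List[str]:
    """Name the fields that are empty, so gaps can be reported per entry."""
    return [name for name in ("assembly", "lineage", "srr") if not getattr(record, name)]


if __name__ == "__main__":
    # Placeholder accessions only; these are not real NCBI identifiers.
    records = [
        Candidate("ASSEMBLY_PLACEHOLDER_1", "B.1.351", "SRR_PLACEHOLDER_1"),
        Candidate("ASSEMBLY_PLACEHOLDER_2", None, None),   # approach A: no variant, no SRA
        Candidate(None, "B.1.351", "SRR_PLACEHOLDER_3"),   # approach B: no assembly
    ]
    for r in select_candidates(records):
        print(r.assembly, r.lineage, r.srr, sep="\t")
    for r in records:
        gaps = missing_fields(r)
        if gaps:
            print("incomplete:", r, "missing", ", ".join(gaps))
```

Entries reported as incomplete are the ones that would need either a variant assignment (approach A) or a new assembly generated from the SRA reads (approach B).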

steph-sing commented 1 year ago

@JingyueWu yes - remember how we only really have one complete genome assembly for SARS-CoV-2 (Wuhan), and this is why we wanted to generate more genome assemblies? If you can find them, then great, but if you can't, then we need to start with SRA data (ngsQC - fastq files). I will take a look today, as I think your search approach can be adjusted. For now, please don't mention this in the Dev meeting. We can tell them that we are still compiling the list and that I need to confirm the entries first. We should send the list out to them by tomorrow, load the entries into HIVE 3, and share the object IDs with Vahan.

steph-sing commented 1 year ago

Assigning the ticket to myself and giving corrective feedback via email.

steph-sing commented 1 year ago

Completed by @penningtonea and @steph-sing - noted in GW Prod server.