Closed JingyueWu closed 1 year ago
@JingyueWu Use this: complete_genome_assembly.txt
I made it from assemblyQC_ncbi in a few seconds.
@HadleyKing this is a separate task. @JingyueWu please disregard his comment.
@JingyueWu I am going to move this to the Developers Issues for Feb 2023. In the future please make sure to create tickets under the correct project. thanks
@steph-sing
I only recorded a few on this spreadsheet: https://docs.google.com/spreadsheets/d/1viutOIEbvFwraZmiMJJyB0fmma68ObA3jxjlA3TRVBQ/edit?usp=sharing
Dilemma that I ran into while doing this task:
A) If I go with the complete genome assembly of SARS-CoV-2, this is the list I found: https://www.ncbi.nlm.nih.gov/data-hub/genome/?taxon=694009
As you can see, the latest SARS-CoV-2 complete genome assemblies were submitted in April 2020. I could find some SRA records that are associated with those SARS-CoV-2 complete genome assemblies, but I couldn't find the variant name (in general or in BioSample). Also, a lot of the assemblies do not have SRA data.
B) If I go with SARS-CoV-2 variants of interest (i.e. alpha, beta, gamma, delta, and omicron), then this is the list I found: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=Severe%20acute%20respiratory%20syndrome%20coronavirus%202,%20taxid:2697049&ProtNames_ss=surface%20glycoprotein&Completeness_s=complete&QualNum_i=0&Lineage_s=B.1.351
The problem with this is that none of them has a complete genome assembly.
Please let me know which approach is more logical to take, thanks.
@JingyueWu yes - remember how we only really have one complete genome assembly for SARS-CoV-2 (Wuhan), and this is why we wanted to generate more genome assemblies? If you can find them, then great, but if you can't, then we need to start with SRA data (ngsQC - fastq files). I will take a look today, as I think your search approach can be adjusted. For now, please don't mention this in the Dev meeting. We can tell them that we are still compiling the list and I need to confirm the entries first. We should send the list out to them by tomorrow, load the entries into HIVE 3, and share the object IDs with Vahan.
assigning the ticket to myself, and giving corrective feedback via email.
completed by @penningtonea and @steph-sing - noted in GW Prod server
(Requested by Vahan for analysis in HIVE)
Collect a list of complete genome assemblies (i.e. SARS-CoV-2 variants of concern such as alpha, beta, gamma, delta, omicron) and their corresponding SRR accession IDs (no more than 20). Email Stephanie the list by the end of Friday (2/3).
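
For anyone compiling a similar list, here is a minimal sketch of how the entries could be sanity-checked before emailing them. It is not the script used for this ticket. The function name `compile_list`, the record layout (variant, assembly accession, run accession), and the placeholder accessions are all assumptions for illustration. The only rule taken from the ticket is the cap of 20 entries. The accession patterns follow the usual NCBI conventions (GCA_/GCF_ for assemblies, SRR/ERR/DRR for SRA runs).

```python
import re

# Usual NCBI accession shapes (assumed sufficient for this check):
# assemblies are GCA_/GCF_ + 9 digits + version, runs are SRR/ERR/DRR + digits.
ASSEMBLY_RE = re.compile(r"GC[AF]_\d{9}\.\d+")
RUN_RE = re.compile(r"[SED]RR\d+")

MAX_ENTRIES = 20  # the ticket asks for no more than 20 entries


def compile_list(records, limit=MAX_ENTRIES):
    """Keep well-formed, de-duplicated (variant, assembly, run) records.

    `records` is a hypothetical iterable of 3-tuples; malformed accessions
    and repeated run accessions are skipped, and at most `limit` are kept.
    """
    seen_runs = set()
    kept = []
    for variant, assembly, run in records:
        if not (ASSEMBLY_RE.fullmatch(assembly) and RUN_RE.fullmatch(run)):
            continue  # skip rows with malformed accessions
        if run in seen_runs:
            continue  # skip duplicate SRA runs
        seen_runs.add(run)
        kept.append((variant.lower(), assembly, run))
        if len(kept) == limit:
            break
    return kept


if __name__ == "__main__":
    # Placeholder accessions only, not real records.
    sample = [
        ("Alpha", "GCA_000000001.1", "SRR00000001"),
        ("Beta", "not-an-accession", "SRR00000002"),
        ("Alpha", "GCA_000000001.1", "SRR00000001"),
    ]
    for row in compile_list(sample):
        print("\t".join(row))
```

The check only validates the shape of each accession. It does not confirm that an assembly and a run actually belong to the same BioSample, so that still has to be verified by hand or against NCBI.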