FDA-ARGOS / data.argosdb

MIT License

Pond - Ebola Sudan Fasta #113

Closed steph-sing closed 1 year ago

steph-sing commented 1 year ago

Choose any of these SRRs from the FDA Argos Project for Ebola Sudan (attached). Complete a reference-guided assembly (whole genome) plus a BCO:

EbolaSudan_FDABioProject.csv

REF: Sudan ebolavirus, Strain: Gulu – Complete Genome – 2004

https://www.ncbi.nlm.nih.gov/assembly/GCF_000855585.1/
GenBank assembly accession: GCA_000855585.1
RefSeq assembly accession: GCF_000855585.1
BioProject: PRJNA485481
GenBank sequence: AY729654.1
NCBI Reference Sequence: NC_006432.1

stephenshank commented 1 year ago

My preliminary comments are as follows:

Picked two SRA accessions at random. The reference-guided approach was not promising, perhaps because the reference is out of date and the dataset spans a long period (at least the 1970s to the 2010s). De novo appeared more promising: I found one contig (https://data.hyphy.org/web/argos/ebola-reassembly/contig.fasta) that was about the length of an Ebola genome. It aligns nicely (https://data.hyphy.org/web/argos/ebola-reassembly/aligned.fasta) with the GenBank sequence (https://www.ncbi.nlm.nih.gov/nuccore/MH121161.1/) that corresponds to that SRA accession.

Right now I can deliver an associated BCO up to the contig file. The contig was picked out manually and will take a non-negligible amount of work to automate.

This is a fantastic opportunity to benchmark assembly approaches. Future work can include:

  • explore different references (easy)
  • explore other accessions (easy)
  • flashy viz with our alignment viewer (medium): http://alignment.hyphy.org/sam-scaffold
  • develop functionality in Galaxy for mapping to multiple references in batches (medium)
  • automate contig picking (hard)
  • extend to viruses with multiple segments (hard)
  • deliver full BCO (hard)

and more. Further expertise in assembly is appreciated. Questions/comments are welcome.

rajamazumder commented 1 year ago

Great! Send the FASTA sequence for the assembly. Even an incomplete BCO is fine for now. Are you using MEGAHIT or metaSPAdes?

stephenshank commented 1 year ago

FASTA is here: https://data.hyphy.org/web/argos/ebola-reassembly/contig.fasta
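For anyone picking up the "automate contig picking" item: the contig above was chosen by hand because its length was close to that of an Ebola genome. A minimal sketch of that length filter is below. The 19,000 nt target and the 10% tolerance are assumptions for illustration (ebolavirus genomes are roughly 19 kb), and `read_fasta` / `pick_contigs` are hypothetical helper names, not part of the Galaxy workflow used here.

```python
from typing import Dict, Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {record name: sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a positional name if the header is empty.
            name = fields[0] if fields else f"record_{len(parts) + 1}"
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def pick_contigs(
    contigs: Dict[str, str],
    expected_length: int = 19000,  # assumed target, roughly an ebolavirus genome
    tolerance: float = 0.10,       # assumed: accept lengths within +/- 10%
) -> List[Tuple[str, int]]:
    """Return (name, length) for contigs near the expected length, closest first."""
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    hits = [(n, len(s)) for n, s in contigs.items() if low <= len(s) <= high]
    return sorted(hits, key=lambda hit: abs(hit[1] - expected_length))
```

Length alone will not tell a viral contig from host or contaminant sequence of similar size, so a real version would likely also align each candidate against a reference before accepting it.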

stephenshank commented 1 year ago

BCO up to contig is here: https://data.hyphy.org/web/argos/ebola-reassembly/bco_bf60fd5f5f7f44bf.json
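Since this BCO is only complete up to the contig step, a quick check of which top-level parts are filled in may be useful. The sketch below assumes the field names from my reading of the IEEE 2791-2020 BioCompute Object layout (`provenance_domain`, `usability_domain`, and so on) and assumes that an empty value counts as missing. I have not checked it against the JSON linked above, so treat the required/optional split as something to verify against the spec.

```python
import json
from typing import Dict, List

# Assumed from IEEE 2791-2020; verify against the specification before relying on it.
REQUIRED_FIELDS = [
    "object_id",
    "spec_version",
    "etag",
    "provenance_domain",
    "usability_domain",
    "description_domain",
    "execution_domain",
    "io_domain",
]
OPTIONAL_FIELDS = ["extension_domain", "parametric_domain", "error_domain"]


def summarize_bco(bco: dict) -> Dict[str, List[str]]:
    """Report which top-level BCO fields are populated and which required ones are not."""
    present = [f for f in REQUIRED_FIELDS + OPTIONAL_FIELDS if bco.get(f)]
    missing_required = [f for f in REQUIRED_FIELDS if not bco.get(f)]
    return {"present": present, "missing_required": missing_required}


def summarize_bco_file(path: str) -> Dict[str, List[str]]:
    """Load a BCO JSON file from disk and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_bco(json.load(handle))
```

Running `summarize_bco_file` on a downloaded copy of the draft would list what still needs to be filled in before the BCO can be called complete.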

rajamazumder commented 1 year ago

Jingyue or Emily, can you please add these to data.argosdb.org and send me the link?


HadleyKing commented 1 year ago

https://biocomputeobject.org/builder/https/biocomputeobject.org/ARG_000002/DRAFT

HadleyKing commented 1 year ago

@JingyueWu

steph-sing commented 1 year ago

@rajamazumder We are working out a process. I will email you details.

stephenshank commented 1 year ago

Mostly for my own usage... Galaxy/BCO integration, @HadleyKing feel free to add/modify:

  • Usability - annotate workflow and history
  • Contributors - add to a workflow (ORCID)
  • Description domain/steps - toolshed/(test toolshed?) entry

stephenshank commented 1 year ago

Last night I queued up my (de novo) assembly pipeline for all 98 accessions in about 1 minute without writing a single line of code (henceforth "no code"; this can be performed by undergraduates on up). I copied and pasted the accessions from the above CSV (thank you for that!) into Galaxy and executed the associated workflow. This results in a single BCO and 98 contig files, one for each SRA accession. It completed overnight and has an associated history and an incomplete assemblyQC table.

There are three non-negligible efforts that would make this work more complete:

Happy to discuss further tomorrow.
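The copy-and-paste step above needs no code, but for anyone who would rather script it, here is a minimal sketch that pulls run accessions out of a CSV. It assumes nothing about the column names in EbolaSudan_FDABioProject.csv: it scans every cell for values shaped like SRR/ERR/DRR run accessions. The function name is hypothetical.

```python
import csv
import re
from typing import Iterable, List

# Run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")


def extract_run_accessions(lines: Iterable[str]) -> List[str]:
    """Return unique run accessions found in any CSV cell, in first-seen order."""
    seen = set()
    ordered: List[str] = []
    for row in csv.reader(lines):
        for cell in row:
            value = cell.strip()
            if RUN_ACCESSION.match(value) and value not in seen:
                seen.add(value)
                ordered.append(value)
    return ordered
```

Passing an open file handle (opened with `newline=""`) and printing the result one accession per line gives a list that can be pasted into Galaxy the same way as the manual copy.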

stephenshank commented 1 year ago

@rajamazumder Any Ebola assemblies to compare/FASTAs to share? I also tried an iVar-based (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1618-7) reference-guided approach that seemed less promising than de novo. It was an eye-opening experience for me.

rajamazumder commented 1 year ago

Stephanie, are you a watcher on this? Can you please send the Ebola NGS accessions to assemble? Also, Thomas, can you please send your assembly to Stephen?

steph-sing commented 1 year ago

Yes, @rajamazumder, I have been watching all of our tasks and conversations. I will send the NGS reads to Thomas, but Stephen already has what we want him to have, and we still have not finished this initial task, so I will not pile more onto @stephenshank. Also, Thomas is not in this project, so he can't see what you are messaging. @JingyueWu is following up on all the tasks to complete this.

steph-sing commented 1 year ago

@stephenshank

Argos_FDA_BioProject_List.xlsx

Two tabs in this Excel file:

  • Tab 1 (ArgosFDABioProject) - all entries in the BioProject, for reference
  • Tab 2 (EbolaSudan_FDABioProject) - Sudan ebolavirus ONLY. Highlighted yellow entries are preferred, as they are WGS.

If you have different or additional selection criteria, please post those reasons in a comment below. Use this second list to guide your task.

@JingyueWu watch this issue, this adjusts the timeline for the deliverable.

stephenshank commented 1 year ago

Cursory thoughts:

Great catch with the Zaire accession today @steph-sing!

rajamazumder commented 1 year ago

Hi Stephen,

Attached is an alignment of assemblies of the SARS-CoV-2 Wuhan-Hu genome that Millicent (cc'd) and I produced. All assemblies were done from SRR10971381 (SRA) using software available on either HIVE or Galaxy. I have included the NCBI reference genome (NC_045512) in the alignment and indicated in the name of each sequence which software was used and the operator's initials.

I also found a paper (https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5) that provides a good comparison of the performance of several different assemblers for viral genomes.

Let me know if you have any questions or concerns.

Best, Tommy Voigt
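One way to put numbers on an alignment like the one described above is per-assembly identity to the reference row. The sketch below assumes the alignment is already loaded as a dict of equal-length aligned sequences. Columns where either sequence has a gap are skipped, so this measures agreement over shared positions only and does not penalize missing regions. The sequence names in the usage note are made up.

```python
from typing import Dict


def identity_to_reference(
    alignment: Dict[str, str], reference_name: str
) -> Dict[str, float]:
    """Fraction of matching bases vs. the reference, over columns with no gap in either row."""
    ref = alignment[reference_name].upper()
    scores: Dict[str, float] = {}
    for name, seq in alignment.items():
        if name == reference_name:
            continue
        seq = seq.upper()
        if len(seq) != len(ref):
            raise ValueError(f"{name} is not the same aligned length as {reference_name}")
        compared = 0
        matches = 0
        for ref_base, base in zip(ref, seq):
            if ref_base == "-" or base == "-":
                continue  # skip columns where either row has a gap
            compared += 1
            if ref_base == base:
                matches += 1
        scores[name] = matches / compared if compared else 0.0
    return scores
```

For example, with hypothetical rows named `NC_045512`, `megahit_TV`, and `spades_MQ`, calling `identity_to_reference(rows, "NC_045512")` returns one score per assembler, which makes it easier to compare them side by side than reading the alignment by eye.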


rajamazumder commented 1 year ago

Dear Thomas,

Thanks so much for this, this looks great! My first question is whether there are BCOs or workflows (files and/or URLs) associated with the Galaxy-based assemblies. The paper looks awesome too!

My next thought is that I'd love to install some of the tools/workflows on our instance and get an integration with ObservableHQ going. I have an alignment viewer: http://alignment.hyphy.org/fasta-viewer

that (so far) has barely made it to our preferred, aforementioned notebook platform: @.***/assembly-qc?payload_id=59db80eee3e3e18f&base_url=galaxy.hyphy.org

Note that the above notebook is pulling live from an existing Galaxy history: https://galaxy.hyphy.org/u/stephenshank/h/assemblyqc---92622

due to a software integration between the two platforms that is being developed by our lab for this project. Here's some eye candy that Professor Mazumder asked for, since it is way flashier than a table: @.***/selection-analysis-on-spike-antibody-complex-for-sars-cov-2?payload_id=17702c03e6f9b734&base_url=galaxy.hyphy.org

I will do a deeper dive, but again this looks fantastic. Thanks again for sharing, I would greatly enjoy further correspondence with you on this topic.

Regards, Stephen



rajamazumder commented 1 year ago

Yes, Raja, I set up the ticket.

Thomas,

I've attached a file to guide your assembly process:

Two tabs in this Excel file:

  • Tab 1 (ArgosFDABioProject) - all entries in the Argos BioProject, for reference
  • Tab 2 (EbolaSudan_FDABioProject) - Sudan ebolavirus ONLY. Highlighted yellow entries are preferred, as they are WGS.

If you have different or additional selection criteria, please note those reasons to me via email. Use this second list to guide your task.

Please let me know if you have any questions! Thanks

Stephanie Singleton, MS

rajamazumder commented 1 year ago

Thanks, Stephanie!

I will get started on this ASAP and should be able to get this back to you by Monday, but will make sure to let you know if anything comes up that delays the process.

Best, Tommy Voigt


rajamazumder commented 1 year ago

Stephen,

Thanks for sharing, those all look great!

Regarding my specific parameters, files, etc., I made my Galaxy history accessible for this project through the link below: https://usegalaxy.org.au/u/tvoigt/h/sars-cov-2-wuhan-hu1-assembly

I will coordinate with Millicent to get a link to her assemblies. Let me know if you have any questions or concerns.

Best, Tommy Voigt


rajamazumder commented 1 year ago

Stephen, just to make sure we are all on the same page: Thomas is a rotating PhD student who is helping out with this side project so that we have another pair of eyes on the assemblies, because others on the Argos team at our end have been busy with other things. Millicent works for the HIVE platform and is also helping out with some HIVE assembly tools.

-- Raja Mazumder, Ph.D.

rajamazumder commented 1 year ago

Stephen,

Millicent's history on Galaxy can be accessed with the link below: https://usegalaxy.org.au/u/milli_q/h/sars-cov-2-assembly---millicent

Let me know if you have any questions.

Best, Tommy Voigt

On Thu, Nov 3, 2022 at 9:34 AM Thomas Voigt @.***> wrote:

Stephen,

Thanks for sharing, those all look great!

Regarding my specific parameters, files, etc, I made my history on Galaxy accessible for this project through the link below: https://usegalaxy.org.au/u/tvoigt/h/sars-cov-2-wuhan-hu1-assembly

I will coordinate with Millicent to get a link to her assemblies. Let me know if you have any questions or concerns.

Best, Tommy Voigt

On Wed, Nov 2, 2022 at 6:45 PM Stephen D. Shank @.***> wrote:

Dear Thomas,

Thanks so much for this, this looks great! My first question is whether there are BCOs or workflows (files and/or URLS) associated to the Galaxy based assemblies. The paper looks awesome too!

My next thought is that I'd love to install some of the tools/workflows on our instance and get an integration with ObservableHQ going. I have an alignment viewer: http://alignment.hyphy.org/fasta-viewer

that (so far) has barely made it to our preferred, aforementioned notebook platform:

@.***/assembly-qc?payload_id=59db80eee3e3e18f&base_url=galaxy.hyphy.org

Note that the above notebook is pulling live from an existing Galaxy history: https://galaxy.hyphy.org/u/stephenshank/h/assemblyqc---92622

due to a software integration between the two platforms that is being developed by our lab for this project. Here's some eye candy that Professor Mazumder asked for, since it is way flashier than a table:

@.***/selection-analysis-on-spike-antibody-complex-for-sars-cov-2?payload_id=17702c03e6f9b734&base_url=galaxy.hyphy.org

I will do a deeper dive, but again this looks fantastic. Thanks again for sharing, I would greatly enjoy further correspondence with you on this topic.

Regards, Stephen

From: Thomas Voigt Sent: Wednesday, November 2, 2022 6:22 PM To: Stephen D. Shank; Mention Cc: FDA-ARGOS/data.argosdb; Stephanie Singleton; Quartey, Millicent Subject: [External] Re: [FDA-ARGOS/data.argosdb] Ebola Sudan Fasta (Issue #113)

Hi Stephen,

Attached is an alignment of assemblies of the SARS-CoV-2 Wuhan-Hu genome that Millicent (cc'd) and I did using different software available on either HIVE or Galaxy. All assemblies used SRR10971381 from SRA. I have included the NCBI reference genome (NC_045512) in the alignment and indicated in the name of each sequence which software was used and the operator's initials.

I also found a paper (https://microbiomejournal.biomedcentral.com/articles/10.1186/s40168-019-0626-5) that provides a good comparison of the performance of several different assemblers for viral genomes.

Let me know if you have any questions or concerns.

Best, Tommy Voigt

On Tue, Nov 1, 2022 at 6:59 PM Raja Mazumder @.***> wrote:

Stephanie, are you a watcher on this? Can you please send an Ebola NGS accession to assemble? Also, Thomas, can you please send your assembly to Stephen?

On Tue, Nov 1, 2022, 4:19 PM Stephen Shank @.***> wrote:

@rajamazumder Any Ebola assemblies to compare/FASTAs to share? I also tried an ivar (https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1618-7) based reference-guided approach that seemed less promising than de novo. It was an eye-opening experience for me.


steph-sing commented 1 year ago

@stephenshank all good points in your previous comment. As a reminder, we are trying to complete 1 deliverable: 1 updated BCO per 1 Ebola Sudan SRR pair + 1 fasta file, plus any associated metrics from this assembly. I do not want to continue to stray from this, and I want this to be as simple as possible. This task is exactly the same for the Wuhan fasta; I will also post an update on that ticket. Please propose a timeline to completion so I can plan for our data releases. Thank you. @JingyueWu please watch this

steph-sing commented 1 year ago

Hi @stephenshank, please attach the Ebola Sudan fasta here. @JingyueWu will take care of the header for you. If the protocol has been updated, please send an updated link for your BCO here as well, and we will push it through the backend. Otherwise, please provide a general status update.

stephenshank commented 1 year ago

https://data.hyphy.org/web/argos/ebola-reassembly/

steph-sing commented 1 year ago

Status: BCO issue is being fixed by @HadleyKing - still need fasta file

stephenshank commented 1 year ago

FASTA file is here, still needs appropriate header: http://data.hyphy.org/web/argos/ebola-reassembly/reassembled.fasta

stephenshank commented 1 year ago

@HadleyKing I'll try the converter on the BCO.

steph-sing commented 1 year ago

@stephenshank please try the converter and let us know if you have any issues. We have reassigned the ticket to you.

steph-sing commented 1 year ago

@stephenshank do you have a status update on this task?

stephenshank commented 1 year ago

I have a much better approach that is essentially described here. It uses a tool called VAPOR to select a reference from several candidates based on k-mer distributions between reads and reference genomes. This approach yielded the highest fidelity to published ARGOS assemblies yet, only off by a single nucleotide.
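To make the idea of k-mer based reference selection concrete, here is a minimal Python sketch. It is not VAPOR's actual algorithm or API; it only illustrates the general principle by scoring each candidate reference by the fraction of its k-mers that also appear in the reads. The function names, the default `k=21`, and the toy sequences in the usage note are all assumptions made for illustration.

```python
def kmers(seq, k):
    """Yield every k-mer of seq that contains only A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def revcomp(seq):
    """Reverse complement; characters other than ACGT are left unchanged."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def score_reference(read_kmers, reference, k):
    """Fraction of the reference's k-mers that occur in the read k-mer set."""
    ref_kmers = list(kmers(reference, k))
    if not ref_kmers:
        return 0.0
    hits = sum(1 for km in ref_kmers if km in read_kmers)
    return hits / len(ref_kmers)


def pick_reference(reads, candidates, k=21):
    """Return (best_name, scores) for a dict of candidate name -> sequence."""
    read_kmers = set()
    for read in reads:
        # Index both strands, since reads may come from either one.
        read_kmers.update(kmers(read, k))
        read_kmers.update(kmers(revcomp(read), k))
    scores = {name: score_reference(read_kmers, seq, k)
              for name, seq in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With real data, `reads` would come from the SRA FASTQ files and `candidates` from a multi-FASTA of Sudan ebolavirus genomes. A toy call such as `pick_reference(["ACGTTGCA", "GCAAGGCT"], {"refA": "ACGTTGCAAGGCTA", "refB": "TTTTCCCCGGGGAAAA"}, k=4)` picks `refA`, because the reads share no 4-mers with `refB`.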

I unfortunately encountered some last minute hiccups with the BCO. Will give a FASTA/BCO early next week and then discuss scaling.

stephenshank commented 1 year ago

I now have 8 out of 10 assemblies that look great, ranging from 99-99.99% accuracy, each with a BCO.
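The thread does not say how the accuracy figures were computed. One plausible reading is percent identity between each new assembly and the published ARGOS sequence after pairwise alignment. The sketch below assumes that definition and also assumes that gaps and ambiguous bases (such as N) count as mismatches; both choices are illustrative rather than the method actually used.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences have the same
    unambiguous base. Columns that are gaps in both sequences are skipped;
    gaps and non-ACGT characters count as mismatches (an assumption)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # column carries no information for this pair
        compared += 1
        if a == b and a in "ACGT":
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

The two input strings are expected to be rows taken from the same alignment, for example two records of an aligned FASTA file.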

Of the two failures, one was the VAPOR tool itself failing, which required rerunning with a less stringent parameter.

On the other, everything ran, but the BAM is of low quality (large stretches where no reads mapped, resulting in large stretches of Ns). This is the case for more than one mapper. I suspect that VAPOR either picked a poor reference or found no suitable reference.
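A quick way to flag assemblies like this one is to measure the N content of the consensus sequence. The following Python sketch reports the overall N fraction and the coordinates of long N runs. The function names and the 50-base default threshold are assumptions chosen for illustration, not part of the pipeline described here.

```python
import re


def n_runs(sequence, min_len=1):
    """Return (start, end) pairs, 0-based and end-exclusive, for each run
    of N characters that is at least min_len long."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[Nn]+", sequence)
            if m.end() - m.start() >= min_len]


def n_summary(sequence, min_len=50):
    """Summarize the N content of a consensus sequence."""
    total_n = sequence.upper().count("N")
    all_runs = n_runs(sequence)
    return {
        "length": len(sequence),
        "n_fraction": total_n / len(sequence) if sequence else 0.0,
        "long_runs": n_runs(sequence, min_len),
        "longest_run": max((end - start for start, end in all_runs), default=0),
    }
```

A consensus with a high `n_fraction` or any entry in `long_runs` could be set aside for rerunning with a different reference before a FASTA and BCO are delivered.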

I'll send the 8 decent FASTAs + BCOs ASAP and investigate the above issues for this month. It should not be that much extra work to assemble all Sudan Ebola SRAs.

steph-sing commented 1 year ago

will include in the April Data push on the 5th and April Tasks.

stephenshank commented 1 year ago

FASTA is here: https://data.hyphy.org/web/argos/ebola-reassembly/sudanese_ebola_PL.fasta The BCO ID is ARG_000013.