galaxyproject / training-material

A collection of Galaxy-related training material
https://training.galaxyproject.org
MIT License

Needed: SARS-CoV-2 amplicon variant analysis tutorial #2546

Open pvanheus opened 3 years ago

pvanheus commented 3 years ago

The existing [SARS-CoV-2 variant analysis] tutorial focuses on the analysis of metagenomic sequencing data. The majority of data being produced, however, uses the ARTIC amplicon protocol (for Illumina and Nanopore). The RECoVERY, SANBI and Galaxy COVID-19 projects have produced workflows for SARS-CoV-2 amplicon analysis.

A new tutorial is needed that walks through at least one of these workflows. It can also mention the use of Nextclade and Pangolin.

I (@pvanheus) propose to develop such a tutorial and refine it during the ASBCB Omics Codeathon in June: https://datascience.nih.gov/news/participant-applications-asbcb-omics-codeathon

To fill out that form, enter your details, mention the Galaxy SARS-CoV-2 Amplicon tutorial and then select:

  1. Would you like to pitch a project idea for the codeathon: No
  2. Would you be interested in being a team lead for a project: No
  3. Here are some approximate project titles, please pick the top three you are interested in working on: New projects as they arise

(this last one because our project is not on the official list (yet))

tralynca commented 3 years ago

Hi Peter,

Here are some grammar mistakes and some things that may become potential issues for those working through the tutorial.

1. Introduction

nucleocapid (N) should be nucleocapsid

Sequencing of viral samples can be used to (missing word)

Lots of grammar mistakes in the sentence starting with "Detection of mutations among the circulating SARS-CoV-2 strains...", so I'll just write it out completely for a copy and paste below

Detecting mutations among the circulating SARS-CoV-2 strains is important to infer the lineages within which these strains fall and is therefore important in tracing the emergence of new strains, both at a national and local level. Here are some important definitions:

The definition for epidemiology makes one none the wiser. Can you try:

Epidemiology: the study of the patterns, the spread and the causes of diseases and disorders within a given population and the application of this knowledge to prevent and control health problems

Is the use of pathogen genomic data to determine the distribution and spread of an infectious disease in a specified population and the application

the virus genome can be sequenced and this data can be used to monitor the emergence of important new variants, and to monitor the trends after an intervention

short read sequence data with the aim of identifying the

2. Under the heading 'Get data'

(the datasets import) without the "zenodo" part

Check that the datatype....? (the sentence is not complete)



3. Under the heading "Examining mapping results using BamQC"

The tool seems to be called QualiMap BamQC and not just BamQC. I struggled to find it in a fussy instance like usegalaxy.eu.



Under "Interpreting coverage depth plots"

In the BamQC output, examine the report for *???????, pay special attention to the Mean Coverage (in section Coverage) and the Coverage across reference, Coverage Histogram and Coverage Histogram (0-50X) plots. (Does the part about which report should be examined need to be there? The sentence seems to be missing the report name, which was already mentioned in the previous line anyway. Also, maybe there should be a full stop instead of a comma.)



4. Under 'Variant Calling and Annotation', is it supposed to look like this?:

Open the [ivar variants]{toolshed.g2.bx.psu.edu/repos/iuc/ivar_variants/ivar_variants/1.3.1+galaxy2} tool.

"After setting these parameters Execute the tool." should be under point 5 and not under point 3. I nearly clicked Execute, only to be given more instructions that were required before executing.

Rename the reference sequence with Text transformation with sed (the instructions do not make it clear that the VCF should be selected when running sed)
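To make it clearer what this step operates on, here is a minimal Python sketch of what renaming the reference sequence in a VCF amounts to. It is an illustration only and not the tutorial's actual sed expression: the function name `rename_vcf_reference`, the example records and the choice of old and new names (`MN908947.3` and `NC_045512.2`) are assumptions made for this sketch.

```python
import re


def rename_vcf_reference(vcf_text: str, old_name: str, new_name: str) -> str:
    """Rename one reference sequence in VCF text (contig headers and CHROM column)."""
    # Match ID=<old_name> only when followed by ',' or '>' so that a longer
    # name sharing the same prefix is left untouched.
    contig_id = re.compile(rf"ID={re.escape(old_name)}(?=[,>])")
    renamed = []
    for line in vcf_text.splitlines():
        if line.startswith("##contig="):
            # Header line describing the reference sequence.
            line = contig_id.sub(lambda m: f"ID={new_name}", line)
        elif line and not line.startswith("#"):
            # Data line: the first tab-separated field is CHROM.
            chrom, sep, rest = line.partition("\t")
            if chrom == old_name:
                line = new_name + sep + rest
        renamed.append(line)
    return "\n".join(renamed) + "\n"


# Hypothetical example input; the names and the record are assumptions.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.2\n"
    "##contig=<ID=MN908947.3,length=29903>\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "MN908947.3\t23403\t.\tA\tG\t.\tPASS\t.\n"
)

if __name__ == "__main__":
    print(rename_vcf_reference(EXAMPLE_VCF, "MN908947.3", "NC_045512.2"), end="")
```

The point for the tutorial text is that the input to the renaming step is the VCF produced by variant calling, not the reference FASTA or the BAM.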



Under heading "Examining variants in SARS-CoV-2"

".... changes amino acid 614 in this protein from a Aspartate (Asp or D) ...." (we normally don't say Aspartate, but Aspartic acid, assuming that it's not automatically ionized)
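As a small worked illustration of the amino acid change in that sentence, the sketch below assumes the commonly reported A-to-G substitution in the second base of spike codon 614 (GAT to GGT). The codon table is deliberately partial, and the function names are hypothetical, chosen only for this example.

```python
# Deliberately partial codon table: only aspartic acid (D) and glycine (G).
CODON_TO_AA = {
    "GAT": "D", "GAC": "D",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
}


def substitute(codon: str, offset: int, alt_base: str) -> str:
    """Return the codon with the base at the 0-based offset replaced."""
    return codon[:offset] + alt_base + codon[offset + 1:]


def describe_change(ref_codon: str, offset: int, alt_base: str, aa_position: int) -> str:
    """Describe a single-base change as <ref amino acid><position><alt amino acid>."""
    alt_codon = substitute(ref_codon, offset, alt_base)
    return f"{CODON_TO_AA[ref_codon]}{aa_position}{CODON_TO_AA[alt_codon]}"


if __name__ == "__main__":
    # Assumed example: second base of codon 614 changes from A to G.
    print(describe_change("GAT", 1, "G", 614))
```

Run as a script, this prints `D614G`, i.e. aspartic acid replaced by glycine at position 614.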

5. Under the heading "Solution"

There are no solutions