Closed cstritt closed 2 years ago
@cstritt @pvanheus do you think this fits into any of the existing GTN topics? https://training.galaxyproject.org/ We try to avoid creating new topics for just a single tutorial when possible. Maybe visualisation? Sequence analysis feels very NGS-y, but we're trying to expand it, so maybe there?
So I see two issues here:
@hexylena, @pvanheus, many thanks for the helpful comments! I'll start working on them today. Regarding the category for the tutorial, I like the idea of an 'evolution' topic, as suggested by @pvanheus (there already is 'ecology'). The current topics don't really fit; I'd be surprised to find phylogenetics there...
Perhaps we need a phylogeny category? You and I have discussed SARS-CoV-2 phylogeny, now this is M. tuberculosis phylogeny, so maybe a new category won't be on its own for long? On the other hand, where does the transmission analysis tutorial fit? Is there perhaps a larger category of "relatedness analysis" or "evolution" that is a better fit here?
That can make sense to me. The thing we try and avoid is topics with a single tutorial, but with our discussed covid phylogeny, yeah, that makes more sense. Evolution it is.
Just one more thought here - there really is not much of a workflow for this tutorial because it follows on from previous work. It's not a stand-alone. I understand the desire to not make the "transmission" tutorial too long, but perhaps add a workflow that illustrates the process from VCF to phylogeny at least?
At present the tutorial is conceived as part of a workshop, where other tutorials and webinars cover sequencing, SNP calling, etc. Thus the students will go from VCF to alignments in the clustering tutorial in the morning, and from there to the phylogeny in the afternoon. Maybe it would make sense to extend the tutorial into a standalone after the workshop?
Perhaps.
BTW thinking about your workflow again, I realised that you don't address ascertainment bias. Perhaps constant sites can be computed in the previous tutorial (snp_sites has a mode for computing constant sites... it's actually aimed at IQ-TREE's -fconst parameter... I'm not sure if RAxML has a direct equivalent?) and copied over to here? (As an example, here's a workflow that is similar to what is done in your set of tutorials but adds that constant site calculation: https://galaxy.sanbi.ac.za/u/pvanheus/w/snippy-tb-sample-iqtree-015)
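To make the constant-site idea concrete, here is a minimal Python sketch. It is only an illustration and not how snp_sites works internally: the function name `constant_site_counts` and the toy alignment are hypothetical, and it assumes the counts are passed to -fconst in the order A,C,G,T. A column counts as constant only when every sequence carries the same unambiguous base.

```python
from collections import Counter


def constant_site_counts(sequences):
    """Count alignment columns where all sequences share the same A, C, G or T.

    Hypothetical helper for illustration; returns counts in the order A, C, G, T.
    """
    if not sequences:
        raise ValueError("need at least one sequence")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to equal length")

    counts = Counter()
    for column in zip(*sequences):
        bases = {base.upper() for base in column}
        if len(bases) == 1:
            base = bases.pop()
            # Skip columns made only of gaps or ambiguity codes.
            if base in "ACGT":
                counts[base] += 1
    return [counts[base] for base in "ACGT"]


# Toy whole-genome alignment (made-up data): columns 3 and 6 are variable.
toy_alignment = ["ACGTAC", "ACGTAA", "ACTTAC"]
fconst = ",".join(str(n) for n in constant_site_counts(toy_alignment))
print(fconst)
```

The resulting comma-separated string is the kind of value that would accompany a SNP-only alignment so the tree inference knows how many invariant sites were left out.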
I just realized that the ape library is not available in RStudio on Galaxy. Would it be possible to install it?
Users can install libraries as needed in Rstudio in Galaxy. That said, if this would e.g. take too much time we can look into changing the base image to include the library.
install.packages("ape") crashes with:
/bin/sh: 1: x86_64-conda-linux-gnu-cc: not found
make: *** [/opt/miniconda/lib/R/etc/Makeconf:170: BIONJ.o] Error 127
ERROR: compilation failed for package ‘ape’
* removing ‘/opt/miniconda/lib/R/library/ape’
@pvanheus, this was indeed a weighty omission. I now address it in the alignment part, and added a section at the end about rescaling and dating the tree. I use the "rescaled branch lengths = (branch lengths * alignment length) / genome size" approach, and ask in the exercise what the problem could be with assuming that sites not present in the SNP alignment are invariant.
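As a worked example of that rescaling, here is a small Python sketch. The function name and all numbers are made up for illustration and are not taken from the tutorial; it simply applies the formula quoted above to turn substitutions per SNP-alignment site into substitutions per genome site.

```python
def rescale_branch_length(branch_length, alignment_length, genome_size):
    """Rescale a branch length from per-SNP-site to per-genome-site units.

    Implements: rescaled = (branch length * alignment length) / genome size.
    Hypothetical helper for illustration only.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return branch_length * alignment_length / genome_size


# Illustrative values only (not from the tutorial data).
snp_alignment_length = 1_000   # variable sites in the SNP alignment
genome_size = 4_000_000        # assumed genome length in bp
branch = 0.02                  # substitutions per site in the SNP alignment

rescaled = rescale_branch_length(branch, snp_alignment_length, genome_size)
print(rescaled)
```

The rescaled value is only as good as the assumption behind it: every genome position missing from the SNP alignment is treated as invariant, which is exactly what the exercise asks students to question.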
On the linting errors:
evolution category, right. What is involved in making such a thing, @hexylena?
@pvanheus, to create a new topic: https://training.galaxyproject.org/training-material/topics/contributing/tutorials/create-new-topic/tutorial.html
(and I am realising I forgot to add instructions for faq folder there, but I can help too)
So the only thing that remains to be done on our side is to create the 'evolution' topic and move both tutorials there, right? As far as I can see, this would only involve renaming the existing folder ('phylogenetics') and modifying the corresponding metadata.yml. I'm not sure, though, how both tutorials can be moved there, given that they are both in open pull requests.
@cstritt yes, @hexylena and I will deal with the renaming and moving this morning. We will merge it as draft tutorials, so that it will be accessible for your course next week, and afterwards we can polish all the last things.
(We have been thinking for a while about renaming the metagenomics topic to "microbial analysis", so it could fit there as well.)
@cstritt You might be able to install it via conda (using the terminal tab in Rstudio). I'm testing it now and will add it to the instructions in the tutorial if it works :+1:
ok @cstritt, it appears to work if you install via conda :+1: It does give a warning that the package was built with R 4.1.2 while the Rstudio runs 4.1.0. It probably won't be a problem, but maybe good to test.
I will merge this now
@cstritt here are the links to your tutorials:
https://training.galaxyproject.org/training-material/topics/evolution/tutorials/mtb_transmission/tutorial.html
https://training.galaxyproject.org/training-material/topics/evolution/tutorials/mtb_phylogeny/tutorial.html
(I've also put them on the course program page)
Excellent, thanks a lot for the great support!
This is a second tutorial for the planned Galaxy workshop on WGS of M. tuberculosis (see request #3211). It covers the interpretation and inference of phylogenetic trees.