BDI-pathogens / phyloscanner

Phylogenetics between and within hosts at once, all along the genome.
GNU General Public License v3.0

Bootstrapping trees 1st part #60

Open evandiego83 opened 3 years ago

evandiego83 commented 3 years ago

Hello developers.

I am trying to use this software and am interested in how the bootstrapping procedure works for making the trees. When you specify the number of bootstraps, it seems that raxml doesn't produce 100 bootstrap trees from the same alignment but 100 trees from 100 different bootstrap alignments. So when you specify this option, it is actually bootstrapping the alignment, i.e. creating 100 different alignments from the same data? Is this correct, and if so, how does it do this? And why is this option preferred? Also, I find in my use that raxml can be quite slow and iqtree may be faster. Is there an easy way to use the same approach with iqtree?

My apologies for all the questions. Thank you. Evan

ChrisHIV commented 3 years ago

Hi Evan, the only method of bootstrapping with which I am familiar is to bootstrap the alignment, and then calculate a tree from each bootstrapped alignment in the usual way. I hope to add an iqtree option at some point, but it will likely be many months in the future. The easy way to use alternatives to raxml and/or the bootstrapping method provided is to run phyloscanner_make_trees.py with the --no-trees option, and then manually run whatever tree-inference steps you like on the alignment files produced. Note that if you also specify some coordinates to be excised, windows containing at least one such coordinate will have two alignment files: with and without said coordinates. By definition it's the one with the coordinates excluded that you want to use for your tree.
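A minimal sketch of that manual step, assuming an IQ-TREE executable named `iqtree` is on the PATH and that its `-s` (alignment) and `-b` (number of standard bootstrap replicates) options suit your version. The function name and the alignment file names below are hypothetical placeholders; substitute the files that phyloscanner_make_trees.py actually wrote for your windows. The script only builds and prints the commands, so nothing is run until you choose to run it.

```python
import shlex


def iqtree_commands(alignment_files, bootstraps=100, iqtree_exe="iqtree"):
    """Build one IQ-TREE command per window alignment.

    Assumptions: `iqtree_exe` names an installed IQ-TREE binary, and
    `-s` / `-b` are the alignment and bootstrap-count options.
    """
    commands = []
    for alignment in alignment_files:
        commands.append([iqtree_exe, "-s", str(alignment), "-b", str(bootstraps)])
    return commands


# Hypothetical file names, standing in for the per-window alignments.
commands = iqtree_commands(["window_1.fasta", "window_2.fasta"])
for command in commands:
    # shlex.join quotes arguments safely for pasting into a shell.
    print(shlex.join(command))
```

To run a command from Python rather than pasting it into a shell, each list can be passed to `subprocess.run(command, check=True)`. For windows with excised coordinates, pass only the alignment with those coordinates excluded.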

evandiego83 commented 3 years ago

Hi Chris,

Thank you for your reply and suggestions. When I use that option I only get one AlignedReadsInWindow.fasta file, rather than the files AlignedReadsInWindow.fasta.BS0 to AlignedReadsInWindow.fasta.BS100. It is these files that I weren't sure off? What exactly do these files correspond to and how are they formatted?

ChrisHIV commented 3 years ago

"When I use that option" - I assume you mean --no-trees? This option means RAxML is not run, i.e. phyloscanner_make_trees.py only produces an alignment file in each window. You then manually run iqtree or whatever you like on those alignment files.

"It is these files that I weren't sure off?" - I don't understand your question. To clarify, the alignment files .BSX, where X is an integer, are produced by RAxML when you ask it for bootstraps: it creates many different variations of the original alignment file, each obtained by sampling the alignment columns with replacement until the variation has as many columns as the original. Each one of these is referred to as a bootstrap. These bootstrapped alignment files are still in fasta format, I think (I can't test now); otherwise raxml has converted them to a different sequence alignment format. If this is unclear, it might help to read about how RAxML does bootstrapping.
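To make the column resampling concrete, here is a minimal sketch in plain Python. It illustrates the general technique only and is not RAxML's or phyloscanner's code; the function names and the toy alignment are made up for the example. Each replicate has the same sequences and the same length as the original, but its columns are drawn at random with replacement, so some original columns appear several times and others not at all.

```python
import random


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def bootstrap_alignment(records, rng):
    """Return one bootstrap replicate: columns drawn with replacement."""
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("all sequences in an alignment must have equal length")
    # Draw `length` column indices; the same index may be drawn repeatedly.
    columns = [rng.randrange(length) for _ in range(length)]
    return [(name, "".join(seq[i] for i in columns)) for name, seq in records]


# Toy alignment, invented for illustration.
example = """>read_A
ACGTAC
>read_B
ACGTTC
>read_C
AC-TTC
"""

records = read_fasta(example)
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(records, rng) for _ in range(3)]

for number, replicate in enumerate(replicates):
    print(f"replicate {number}")
    for name, seq in replicate:
        print(f">{name}\n{seq}")
```

A tree is then inferred from each replicate separately, and the support for a clade is the fraction of those trees in which it appears.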