Documentation for paleomix phylo pipeline

Dear Oscar,

I apologize for the state of the documentation for the phylogenetic pipeline. I do intend to remedy this, along with reworking the pipeline to make it easier to use overall.

To answer your question, there is currently no way to run the phylogenetic pipeline without a set of target regions. So if you want to analyse the whole genome, you simply need to create a BED file that covers the whole genome. A simple way to do this is to generate the BED file from the FASTA index file (.fai), like so:

$ awk '{print $1, 0, $2}' OFS='\t' rCRS.fasta.fai > whole_genome.bed
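
If you want to double-check the result, the total length covered by the BED file should equal the sum of the sequence lengths in the index. The following is only a sketch; bed_matches_index is a hypothetical helper name of my own choosing, not something that ships with the pipeline:

```shell
# Hypothetical helper (not part of PALEOMIX): compare the total sequence
# length in a .fai index with the total length covered by a BED file.
bed_matches_index() {
    fai="$1"   # e.g. rCRS.fasta.fai
    bed="$2"   # e.g. whole_genome.bed

    # Column 2 of the index holds the length of each sequence.
    genome_len="$(awk '{sum += $2} END {printf "%.0f\n", sum}' "$fai")"
    # BED intervals are half-open, so each covers (end - start) bases.
    bed_len="$(awk '{sum += $3 - $2} END {printf "%.0f\n", sum}' "$bed")"

    echo "index: $genome_len bp, BED: $bed_len bp"
    # Succeeds (exit status 0) only if the two totals agree.
    [ "$genome_len" = "$bed_len" ]
}
```

It would be called as bed_matches_index rCRS.fasta.fai whole_genome.bed, and it returns a non-zero exit status if the totals differ.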

You need to either place this file at data/regions/whole_genome.bed, relative to the folder in which you want to run the pipeline, or specify a different folder with --regions-root.

The Prefix is the name of your genome/FASTA file, but without the .fasta extension. In the above example, that would be rCRS. The FASTA file should be placed in ./data/prefixes/rCRS.fasta. This folder can be changed with --prefix-root.
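
Put together, the default layout could be prepared roughly as follows. This is a minimal sketch that assumes the FASTA file ends in .fasta and that its index already exists next to it (for example, created with samtools faidx); prepare_whole_genome is a hypothetical helper name, not part of the pipeline:

```shell
# Hypothetical helper (not part of PALEOMIX): set up the default folders
# for a whole-genome run in the current working directory.
prepare_whole_genome() {
    fasta="$1"                            # e.g. rCRS.fasta
    prefix="$(basename "$fasta" .fasta)"  # e.g. rCRS

    # The FASTA index is assumed to exist already.
    if [ ! -f "$fasta.fai" ]; then
        echo "missing index: $fasta.fai" >&2
        return 1
    fi

    mkdir -p data/regions data/prefixes

    # One BED line per sequence: name, start 0, end = sequence length.
    awk '{print $1, 0, $2}' OFS='\t' "$fasta.fai" > data/regions/whole_genome.bed

    # Default location for the reference (see --prefix-root).
    cp "$fasta" "data/prefixes/$prefix.fasta"
}
```

Running prepare_whole_genome rCRS.fasta in the folder where you want to run the pipeline would then give you data/regions/whole_genome.bed and data/prefixes/rCRS.fasta.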

If we ignore the other options, then the makefile would look like this:

  RegionsOfInterest:
    whole_genome:
      Prefix: rCRS

This is basically the same as the example project included with the pipeline. Looking at the example project might give you a better idea of how the pipeline should be set up.

Best regards, Mikkel
