MikkelSchubert / paleomix

Pipelines and tools for the processing of ancient and modern HTS data.
https://paleomix.readthedocs.io/en/stable/
MIT License

documentation for paleomix phylo pipeline #17

Open siriusb-nox opened 6 years ago

siriusb-nox commented 6 years ago

Dear Mikkel,

After successfully using your fantastic paleomix bam_pipeline to generate BAM files for my genomes, I am trying to use the phylo pipeline. However, I am struggling to make it run because I am unsure how to set up the makefile.yaml properly (e.g. I want a genome-wide analysis rather than one restricted to specific regions, so I don't know what to write for the prefix). Do you perhaps have extended documentation on the phylo pipeline (the readthedocs page says it is under construction), or a reference makefile.yaml for the phylo pipeline?

Many thanks in advance!

Oscar

MikkelSchubert commented 4 years ago

Dear Oscar,

I apologize for the state of the documentation for the phylogenetic pipeline. I do intend to remedy this, along with reworking the pipeline to make it easier to use overall.

To answer your question, there is currently no way to run the phylogenetic pipeline without a set of target regions. So if you want to analyse the whole genome, you simply need to create a BED file that covers the whole genome. A simple way to do this is to generate the BED file from the FASTA index file (.fai), like so:

$ awk '{print $1, 0, $2}' OFS='\t' rCRS.fasta.fai > whole_genome.bed
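As a rough illustration of what this command produces, the sketch below runs it on a small, made-up index. The sequence names, lengths, and the temporary directory are assumptions for the example only; a real index would normally be created with samtools faidx.

```shell
# Toy .fai index with two made-up sequences.
# Columns: name, length, byte offset, bases per line, bytes per line.
workdir=$(mktemp -d)
printf 'chr1\t1000\t6\t60\t61\nchrM\t16569\t1029\t60\t61\n' > "$workdir/toy.fasta.fai"

# Same awk command as above: column 1 is the sequence name, column 2 its length.
# BED intervals are 0-based and half-open, so 0..length spans the whole sequence.
awk '{print $1, 0, $2}' OFS='\t' "$workdir/toy.fasta.fai" > "$workdir/whole_genome.bed"

# One tab-separated line per sequence: name, 0, length.
cat "$workdir/whole_genome.bed"
```

Each sequence in the index ends up as a single BED interval covering it from start to end, which is what makes the resulting file usable as a "whole genome" set of regions.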

You need to either place this file at data/regions/whole_genome.bed in the folder where you want to run the pipeline, or specify a different folder with --regions-root.

The Prefix is the name of your genome/FASTA file, but without the .fasta. In the above example, that would be rCRS. The FASTA file should be placed in ./data/prefixes/rCRS.fasta. This folder can be changed with --prefix-root.
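If it helps, the Prefix can also be derived from the FASTA filename in the shell. A small sketch, using the rCRS path from above (the variable names are just for illustration):

```shell
# Path to the reference, as placed under the default prefix folder.
fasta="data/prefixes/rCRS.fasta"

# basename drops the directory and the given suffix, leaving the Prefix.
prefix=$(basename "$fasta" .fasta)

echo "$prefix"
```

The printed value is what goes into the Prefix field of the makefile.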

If we ignore the other options, then the makefile would look like this:

  RegionsOfInterest:
     whole_genome:
       Prefix: rCRS

This is basically the same as the example project included with the pipeline. Looking at the example project might give you a better idea of how the pipeline should be set up.

Best regards, Mikkel