ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

halPhyloPTrain question #157

Open RenzoTale88 opened 3 years ago

RenzoTale88 commented 3 years ago

Hello, sorry if this is a bit of a dumb question. I've got a series of HAL alignments generated with Cactus that I'd like to screen for genomic regions under selective pressure. I've been trying to follow the instructions on the main GitHub page, but I am missing the step that generates the neutralRegions.bed file:

halPhyloPTrain.py mammals.hal human neutralRegions.bed neutralModel.mod --numProc 12 

Maybe I'm just being a bit dumb, but is there a way to generate this BED file from the alignments themselves? Thanks in advance, Andrea

glennhickey commented 3 years ago

This is a legitimate issue: the documentation is incomplete. One way to get neutral sites is from 4-fold-degenerate codon (4D) positions. These can be obtained from an exon annotation of the genome of interest using hal4dExtract.
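A minimal sketch of what that call might look like, reusing the mammals.hal and human names from the question. The positional order (HAL file, reference genome, input BED, output BED) is an assumption to confirm with `hal4dExtract --help`, and exons.bed and fourfold_sites.bed are hypothetical file names. By default the script only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/usr/bin/env bash
# Sketch only: extract 4D sites from an exon annotation.
# ASSUMPTIONS: argument order of hal4dExtract; exons.bed and
# fourfold_sites.bed are placeholder names, not files from the thread.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = execute it

# Print a command line and, unless in dry-run mode, execute it.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

HAL_FILE="mammals.hal"         # alignment from the original question
REF_GENOME="human"             # genome the exon annotation refers to
EXONS_BED="exons.bed"          # hypothetical exon annotation
OUT_BED="fourfold_sites.bed"   # hypothetical output with candidate neutral sites

run hal4dExtract "$HAL_FILE" "$REF_GENOME" "$EXONS_BED" "$OUT_BED"
```

The dry-run default is a deliberate choice so the command line can be checked before a long job is started on a large alignment.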

RenzoTale88 commented 3 years ago

Hi @glennhickey, thanks for the reply! Unfortunately, the genomes I have are not annotated yet. Is there an "annotation-agnostic" method that uses only the Cactus alignments? If not, we can just wait for the annotation. Thanks, Andrea

glennhickey commented 3 years ago

That's an interesting question. I can't think of a way off the top of my head, but you may want to contact the PhyloP folks directly to see if they have any ideas.

RenzoTale88 commented 3 years ago

@glennhickey thank you very much, I'll write to them and see if there is a way. If I get an answer, I'll post it here too; perhaps it will be useful to others :)

RenzoTale88 commented 3 years ago

Hi @glennhickey, after raising an issue on the RPHAST page, I got the following solution:

  1. Extract the alignments of interest using hal2maf: hal2maf --hdf5InMemory --noAncestors --refGenome genome1 MAFS/myalignments.hal MAFS/genome1.maf
  2. Extract the tree of interest using halStats: halStats --tree MAFS/myalignments.hal > myalignments.txt
  3. Run phyloFit with that tree: phyloFit --tree "mytree" MAFS/genome1.maf
  4. Run halPhyloPMP.py as follows: halPhyloPMP.py --hdf5InMemory --numProc 4 MAFS/myalignments.hal genome1 phyloFit.mod genome1.wig
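The four steps above could be chained in one script, roughly as sketched here. The file and genome names are the placeholders from the list. Two things are assumptions rather than something stated in the thread: that the tree is read from the HAL file with `halStats --tree`, and that phyloFit writes its model to phyloFit.mod when no output root is given. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch only: annotation-free phyloP run on a Cactus HAL alignment.
# ASSUMPTIONS: the tree comes from `halStats --tree <hal file>`, and
# phyloFit's default output file is phyloFit.mod.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print a command line and, unless in dry-run mode, execute it.
run() {
  printf '%s\n' "$*"
  if [ "$DRY_RUN" != "1" ]; then
    "$@"
  fi
}

HAL="MAFS/myalignments.hal"   # placeholder names from the list above
REF="genome1"
MAF="MAFS/genome1.maf"

# Step 1: export the alignment as MAF, referenced on the genome of interest.
run hal2maf --hdf5InMemory --noAncestors --refGenome "$REF" "$HAL" "$MAF"

# Step 2: fetch the Newick tree (a stand-in string is used in dry-run mode).
if [ "$DRY_RUN" = "1" ]; then
  TREE="mytree"
else
  TREE="$(halStats --tree "$HAL")"
fi

# Step 3: fit a substitution model on the whole alignment with that tree.
run phyloFit --tree "$TREE" "$MAF"

# Step 4: compute phyloP scores across the reference genome.
run halPhyloPMP.py --hdf5InMemory --numProc 4 "$HAL" "$REF" phyloFit.mod "$REF.wig"
```

Note that step 3 fits the model on every aligned column rather than on a set of presumed neutral sites, which is the trade-off of going without an annotation.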

I have one further question though: what is the difference between halPhyloPMP.py and halTreePhyloP.py?

Thank you in advance, Andrea