ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

halPhyloP no output or error message? #296

Open astarr97 opened 9 months ago

astarr97 commented 9 months ago

Hello,

I've been experimenting with halPhyloP to compute PhyloP scores on the 447-way placental mammal alignment. Rather than computing PhyloP scores for entire genomes, we would like to compute them for an arbitrary subset of sites in any genome of interest. It seems like halPhyloP with the --refBed argument should be able to do this, and in some cases it does. In other cases, however, it outputs a blank wig file with no error message or other explanation.

For example, this command:

./cactus-bin-v2.6.13/bin/halPhyloP --refBed test_works.bed hg38.447way.hal Orcinus_orca fullTreeAnc239.100kb.mod test_worked.wig

successfully writes the PhyloP scores for the 5 sites to test_worked.wig (see attached; I added ".txt" so I could upload the files to GitHub).
test_works.bed.txt test_worked.wig.txt

However, when I do the same thing on a different set of sites with:

./cactus-bin-v2.6.13/bin/halPhyloP --refBed test_fails.bed hg38.447way.hal Orcinus_orca fullTreeAnc239.100kb.mod test_failed.wig

there is no output. It produces a blank .wig file and does not print any error message.
test_fails.bed.txt test_failed.wig.txt

Any ideas on what might be going on here or how to resolve this? I have other examples showing it is not specific to this species or contig. It doesn't seem like the input bed files are different in any discernible way either. Any help would be much appreciated!

Edit: I should also add that no matter how large the "test_fails.bed" file is, everything finishes running in a few seconds.
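One way to check whether the two BED files really are equivalent in structure is to summarize each and compare the summaries. The following is a minimal sketch, not a diagnosis of halPhyloP itself: the function names `summarize_bed` and `diff_summaries` are hypothetical, and the properties checked (field count, delimiter, line endings, zero-length intervals, sort order) are only guesses at what could matter.

```python
from collections import Counter


def summarize_bed(lines):
    """Collect simple structural facts about BED lines (an iterable of str)."""
    summary = {
        "n_intervals": 0,
        "malformed": 0,              # fewer than 3 fields or non-integer coords
        "field_counts": Counter(),   # how many lines have 3, 4, ... fields
        "chroms": Counter(),         # intervals per sequence name
        "empty_or_inverted": 0,      # start >= end
        "non_tab_delimited": 0,      # lines split by spaces rather than tabs
        "has_crlf": False,           # Windows-style line endings present
        "sorted_within_chrom": True, # starts never decrease within a sequence
    }
    last_start = {}
    for raw in lines:
        if "\r" in raw:
            summary["has_crlf"] = True
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            summary["malformed"] += 1
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            summary["malformed"] += 1
            continue
        chrom = fields[0]
        summary["n_intervals"] += 1
        summary["field_counts"][len(fields)] += 1
        summary["chroms"][chrom] += 1
        if "\t" not in line:
            summary["non_tab_delimited"] += 1
        if start >= end:
            summary["empty_or_inverted"] += 1
        if chrom in last_start and start < last_start[chrom]:
            summary["sorted_within_chrom"] = False
        last_start[chrom] = start
    return summary


def diff_summaries(a, b):
    """Return only the keys whose values differ, as (a_value, b_value) pairs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}
```

To use it on the attached files, open each with `open(path, newline="")` so that carriage returns are preserved, pass the file objects to `summarize_bed`, and print `diff_summaries(works, fails)`. An empty result would suggest the difference lies in the sites themselves rather than in the file format.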

glennhickey commented 9 months ago

That is strange. halPhyloP hasn't been maintained in a while, unfortunately. I recommend exporting to MAF and running regular phyloP directly on that.

Note that you can extract subregions of the MAF using taffy, which is included in cactus.
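A rough sketch of the export-then-score workflow follows. It swaps in a different approach from the taffy suggestion above: each BED interval is exported directly from the HAL file using hal2maf's region options, then scored with phyloP from PHAST. The helper names and output file naming are hypothetical, and the flags used (`--refGenome`, `--refSequence`, `--start`, `--length` for hal2maf; `--msa-format` and `--wig-scores` for phyloP) should be checked against each tool's `--help` for your installed versions. The code only builds the command lines; it does not run them.

```python
def hal2maf_cmd(hal_path, maf_path, ref_genome, chrom, start, end):
    """Build a hal2maf command exporting one reference interval to MAF."""
    return [
        "hal2maf", hal_path, maf_path,
        "--refGenome", ref_genome,
        "--refSequence", chrom,
        "--start", str(start),            # 0-based start, as in BED
        "--length", str(end - start),
    ]


def phylop_cmd(mod_path, maf_path):
    """Build a phyloP command that writes per-base scores in wig format to stdout."""
    return ["phyloP", "--msa-format", "MAF", "--wig-scores", mod_path, maf_path]


def commands_for_bed(bed_lines, hal_path, ref_genome, mod_path):
    """Yield (export_cmd, score_cmd) pairs, one per BED interval."""
    for raw in bed_lines:
        fields = raw.split()
        if len(fields) < 3 or raw.startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Hypothetical naming scheme: one small MAF per interval.
        maf_path = f"{chrom}_{start}_{end}.maf"
        yield (
            hal2maf_cmd(hal_path, maf_path, ref_genome, chrom, start, end),
            phylop_cmd(mod_path, maf_path),
        )
```

Each command list can be passed to `subprocess.run`, with phyloP's standard output redirected to a wig file. Exporting many small regions this way may be slow on a large HAL file, so it is probably only practical for modest numbers of sites.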

astarr97 commented 9 months ago

Thanks for the quick reply! halPhyloP is pretty fast, so it isn't too bad to compute all the PhyloP scores for a genome. I will likely stick to that, since halPhyloP seems to work well when computing scores for the whole alignment.

Sorry to hear about halPhyloP not being maintained. I've been computing PhyloP scores on a lot of different versions of the 447-way alignment (i.e. masking different subsets of species) using MAFs, and the rate-limiting step has actually been storage: I can only store 100 terabytes of data, and each masked MAF file is massive. A maintained halPhyloP would definitely be very helpful for me, but my use case is probably pretty rare. Thanks again!