Open JeffWeinell opened 5 months ago
I don't think HAL has any tools that allow you to modify the sequences. Your best bet is probably to export to MAF and then do the masking with your own script. The taffy Python API can parse MAF files and may be helpful for this.
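A minimal sketch of what such a script could look like, in plain Python with no taffy dependency. It assumes standard MAF `s` lines (`s src start size strand srcSize text`, zero-based start, with minus-strand starts counted from the end of the source sequence) and BED intervals that are zero-based, half-open, and on the forward strand. The function names `mask_row` and `mask_maf` and the `bed` dictionary keyed by MAF src name (for example `genome2.chr1`) are made up for illustration; check how your MAF actually names its rows before relying on this.

```python
def mask_row(text, start, strand, src_size, intervals):
    """Return aligned text with bases inside `intervals` replaced by N.

    intervals: list of (bed_start, bed_end), 0-based half-open,
    given in forward-strand coordinates of the ungapped sequence.
    """
    out = []
    offset = 0  # number of non-gap characters seen so far in this row
    for ch in text:
        if ch == "-":
            out.append(ch)  # gaps carry no genome coordinate
            continue
        pos = start + offset  # coordinate on the row's own strand
        if strand == "-":
            pos = src_size - 1 - pos  # convert to forward-strand coordinate
        if any(s <= pos < e for s, e in intervals):
            ch = "N"  # hard-mask, regardless of upper/lower case
        out.append(ch)
        offset += 1
    return "".join(out)


def mask_maf(lines, bed):
    """Yield MAF lines, hard-masking 's' rows whose src is a key of `bed`.

    bed: dict mapping a MAF src name to a list of (start, end) intervals.
    All other lines (headers, 'a' lines, other rows) pass through unchanged.
    """
    for line in lines:
        fields = line.split()
        if len(fields) == 7 and fields[0] == "s" and fields[1] in bed:
            _, src, start, size, strand, src_size, text = fields
            text = mask_row(text, int(start), strand, int(src_size), bed[src])
            line = " ".join(["s", src, start, size, strand, src_size, text]) + "\n"
        yield line
```

Usage would be along the lines of `out.writelines(mask_maf(open("in.maf"), bed))`. Soft-masked (lowercase) bases outside the BED intervals are left as they are, so the output keeps unmasked, soft-masked, and hard-masked sites. The linear scan over intervals is fine for a handful of regions per sequence but would need an indexed lookup for large BED files.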
Thanks!
I also have the alignment in a MAF file (converted using cactus-hal2maf), but I ran into the same problem (no obvious tool for the job) as when starting with the HAL file. The programs taffy, maf_parse (implemented in PHAST), and MafFilter seemed promising, but as far as I can tell they won't do what I need either.
I can't be the only person that has needed to do this. If I come across a solution elsewhere, I'll share it here.
Thanks again, -Jeff
I have an alignment of 58 snake genomes stored as a HAL file and generated using Progressive Cactus. For each genome in the alignment, I have a BED file specifying site positions in the ungapped genome that I want to be hard-masked (with Ns) in an updated alignment.
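For the BED side of this, here is a rough sketch of loading one genome's intervals and testing whether a given ungapped position should be masked. It assumes plain BED with at least three whitespace-separated columns (chrom, zero-based start, exclusive end); `read_bed` and `is_masked` are hypothetical helper names, not part of any existing tool.

```python
import bisect
from collections import defaultdict


def read_bed(lines):
    """Parse BED lines into {chrom: sorted, merged [(start, end), ...]}."""
    raw = defaultdict(list)
    for line in lines:
        # Skip blank lines and the optional BED header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        raw[chrom].append((int(start), int(end)))
    merged = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        out = [ivs[0]]
        for s, e in ivs[1:]:
            if s <= out[-1][1]:
                # Overlapping or touching the previous interval: extend it.
                out[-1] = (out[-1][0], max(out[-1][1], e))
            else:
                out.append((s, e))
        merged[chrom] = out
    return merged


def is_masked(intervals, pos):
    """True if 0-based `pos` falls inside one of the merged intervals."""
    # Index of the last interval whose start is <= pos.
    i = bisect.bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]
```

Merging the intervals first keeps the binary search valid, since after merging no two intervals overlap and only the nearest interval to the left of `pos` needs checking.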
The example below illustrates what I am trying to do.
Input files that I have:
(1) An alignment (portrayed here as an alignment block with dummy data for simplicity).
(2) BED file (dummy data) with regions of ungapped genome2 that I want to be hard-masked in the updated alignment.
Desired updated alignment
After hard-masking the target genome sites in the BED file, the updated alignment includes unmasked, soft-masked, and hard-masked sites:
I would greatly appreciate any help with how to solve this problem!
-Jeff