ComparativeGenomicsToolkit / hal

Hierarchical Alignment Format

hal2maf --length argument does not seem to work properly. Causes incomplete maf output #290

Closed tli71193 closed 10 months ago

tli71193 commented 10 months ago

Hi there,

I've been trying to extract a region of a chromosome from a .hal file, with the mouse genome as the reference. This is the command I ran:

hal2maf ${HAL_FILE} output.maf --refGenome Mus_musculus --refSequence chr14 --start 63231154 --length 320 --noDupes --noAncestors

As the command shows, I am using the 241 cactus v2 .hal file and trying to extract 320 bp from chr14, starting at position 63231154.

After converting the MAF output to FASTA, this is what I get:

>Mus_musculus.chr14 63231154 6 + 124902244
GGCGCG
>Mus_spretus.CM004108.1 53557762 6 + 117542938
GGCACG
>Mus_caroli.LT608238.1 54385947 6 + 113551381
GGCGCG
>Mus_musculus.chr14 63231160 1 + 124902244
T---
>Mus_spretus.CM004108.1 53557768 1 + 117542938
T---

I'm only getting 6 bp out of the 320 bp. It looks like it's starting at the correct place but not extending.

Any solutions would be helpful. Thank you!

glennhickey commented 10 months ago

MAF files are composed of many alignment blocks. Looks like you're just converting the first (of 53) to FASTA.

The MAF format is described here.

Some ideas for producing less fragmented MAF files are described here.
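For anyone hitting the same thing: since the region is spread over many alignment blocks, a FASTA for the whole region needs the rows from every block stitched together, not just the first block. Below is a minimal Python sketch of that idea, not part of the HAL tools. The function names `read_maf_blocks` and `stitch_blocks` are made up for illustration, and the sketch assumes the genome name is everything before the first `.` in the MAF source field and that each genome appears at most once per block (as with --noDupes). The demo rows are the ones pasted in the question, so the second block is probably incomplete.

```python
def read_maf_blocks(lines):
    """Yield each MAF alignment block as a list of parsed 's' rows."""
    block = []
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        if fields[0] == "a":
            # An 'a' line starts a new block; emit the previous one if any.
            if block:
                yield block
            block = []
        elif fields[0] == "s" and len(fields) == 7:
            # s <src> <start> <size> <strand> <srcSize> <text>
            _, src, start, size, strand, src_size, text = fields
            block.append((src, int(start), int(size), strand, int(src_size), text))
    if block:
        yield block


def stitch_blocks(blocks):
    """Concatenate block texts per genome, gap-padding genomes missing from a block."""
    merged = {}
    total_cols = 0
    for block in blocks:
        ncols = len(block[0][5])  # every row in a block has the same number of columns
        seen = set()
        for src, _start, _size, _strand, _src_size, text in block:
            genome = src.split(".", 1)[0]  # assumption: no '.' inside genome names
            if genome in seen:
                continue  # keep only the first row per genome in a block
            seen.add(genome)
            merged.setdefault(genome, "-" * total_cols)
            merged[genome] += text
        for genome in merged:
            if genome not in seen:
                merged[genome] += "-" * ncols
        total_cols += ncols
    return merged


demo = """\
a
s Mus_musculus.chr14 63231154 6 + 124902244 GGCGCG
s Mus_spretus.CM004108.1 53557762 6 + 117542938 GGCACG
s Mus_caroli.LT608238.1 54385947 6 + 113551381 GGCGCG

a
s Mus_musculus.chr14 63231160 1 + 124902244 T---
s Mus_spretus.CM004108.1 53557768 1 + 117542938 T---
""".splitlines()

blocks = list(read_maf_blocks(demo))
merged = stitch_blocks(blocks)

# Reference bases covered so far; over all blocks of the real file this
# should add up to the requested --length if the region is fully covered.
covered = sum(row[2] for block in blocks for row in block
              if row[0] == "Mus_musculus.chr14")

for genome, seq in merged.items():
    print(">" + genome)
    print(seq)
print("reference bases covered:", covered)
```

To run it on a real file, something like `blocks = list(read_maf_blocks(open("output.maf")))` should work in place of the demo string. Summing the reference row sizes is a quick way to check that the blocks together cover the requested length.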