Issue: In the source code as of 16MAR24, the output of `hmmix decode` does not match the genomic coordinates found in a VCF. Specifically:
All start and end coordinates are 0-indexed (which I assume is due to handling the data as BED files, which are 0-indexed), while the VCF files themselves are 1-indexed.
The `length` column from the `hmmix decode` output is always 1 kb longer than `end - start`, and there is always a 1 kb gap between the end of $tract_{i}$ and the start of $tract_{i+1}$.
Solution: I have made the necessary modifications to the `DecodeModel()` function in `hmm_functions.py` such that:
The start and end coordinates are 1-indexed, which will correspond to the positions in the VCF file.
The coordinates are on the half-open interval `[start, end)`, where now `end - start = length`.
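To make the difference concrete, here is a minimal sketch of the two coordinate conventions. It is not the actual `DecodeModel()` code: the helper names, the 1 kb window size, and the idea that a tract is described by its first and last window index are all assumptions made for illustration.

```python
WINDOW_SIZE = 1000  # assumed 1 kb windows


def old_style_tract(first_window, last_window, window_size=WINDOW_SIZE):
    """Hypothetical reconstruction of the reported behaviour:
    0-indexed start, end set to the start of the last window."""
    start = first_window * window_size
    end = last_window * window_size
    length = (last_window - first_window + 1) * window_size
    return start, end, length


def new_style_tract(first_window, last_window, window_size=WINDOW_SIZE):
    """Hypothetical version of the proposed behaviour:
    1-indexed, half-open [start, end), so end - start == length."""
    start = first_window * window_size + 1
    end = (last_window + 1) * window_size + 1
    return start, end, end - start


# Two adjacent tracts: windows 0-2 followed by windows 3-4.
old_a = old_style_tract(0, 2)  # (0, 2000, 3000): length exceeds end - start by 1 kb
old_b = old_style_tract(3, 4)  # (3000, 4000, 2000): starts 1 kb after old_a ends
new_a = new_style_tract(0, 2)  # (1, 3001, 3000)
new_b = new_style_tract(3, 4)  # (3001, 5001, 2000): starts exactly where new_a ends
```

Under these assumptions, adjacent tracts share a boundary (the `end` of one equals the `start` of the next) and a VCF position `p` falls in a tract exactly when `start <= p < end`.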
I have also made changes to the `Quick tutorial` section of the README, but the code in the `Example with 1000 genomes data` section would need to be rerun, and the README section subsequently updated, after this pull request is accepted.