jonathanBieler closed this issue 6 months ago.
Just to make sure this doesn't get lost in the slackhole, @jakobnissen said:
You need to encode the sequences differently. BAM uses the same sequence encoding as `LongDNA{4}` from BioSequences.jl. But yes, I've been wanting to do an overhaul of XAM.jl for some time now. It's a little tricky to use correctly.
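Below is a minimal sketch of BAM's packed SEQ layout, written in Python for illustration and following the SAM/BAM format specification rather than any XAM.jl API. Each base maps to its index in `=ACMGRSVTWYHKDBN`, and two bases are stored per byte with the first base in the high nibble. The code values for A, C, G, T and N (1, 2, 4, 8, 15) are the same ones BioSequences.jl uses for its 4-bit DNA alphabet, which is presumably what the comment above refers to. Which nibble holds the first base is a separate question worth checking when copying data across. `pack_seq` is a hypothetical helper name.

```python
# Index of each character = its 4-bit BAM code (=:0, A:1, C:2, G:4, T:8, N:15).
BAM_NT16 = "=ACMGRSVTWYHKDBN"


def pack_seq(seq: str) -> bytes:
    """Pack a SAM SEQ string into BAM's 4-bit-per-base layout.

    Two bases per byte, the first base in the high nibble. For an odd-length
    sequence the trailing low nibble is left as 0. SEQ '*' packs to nothing.
    Raises ValueError for characters outside the BAM alphabet.
    """
    if seq == "*":
        return b""
    codes = [BAM_NT16.index(base) for base in seq.upper()]
    if len(codes) % 2:
        codes.append(0)  # pad the final low nibble
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))
```

For example, `pack_seq("ACGTN").hex()` gives `'1248f0'`. A and C share the first byte (`0x12`), G and T share the second (`0x48`), and the lone N sits in the high nibble of the last byte.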
A few more points:
- `l_read_name` includes a +1 because the read name must be NUL-terminated in the BAM record.
- Your CIGAR function does not work: `1M1X1M` is different from `2M1X`.
- `block_size` also includes the length of the fixed fields of the BAM record itself, not only the variable-length vector.
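The three points above can be illustrated with a Python sketch that follows the SAM/BAM specification. It is not XAM.jl's API, and `encode_cigar`, `encode_read_name` and `block_size` are hypothetical helper names. Each CIGAR operation becomes one little-endian uint32, `(len << 4) | op`, with op codes `MIDNSHP=X` mapped to 0 through 8 and kept in their original order. The read name gets a trailing NUL. `block_size` counts everything after the `block_size` field itself: the 32 bytes of fixed fields (`refID` through `tlen`) plus the variable-length data.

```python
import re
import struct

# BAM CIGAR op codes: position in this string is the op code (M=0 ... X=8).
CIGAR_OPS = "MIDNSHP=X"

# Bytes of fixed fields after block_size: refID, pos, l_read_name, mapq,
# bin, n_cigar_op, flag, l_seq, next_refID, next_pos, tlen.
FIXED_FIELDS_LEN = 32


def encode_cigar(cigar: str) -> bytes:
    """Encode a SAM CIGAR as little-endian uint32s, one per operation, in order."""
    if cigar == "*":
        return b""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return b"".join(
        struct.pack("<I", (int(n) << 4) | CIGAR_OPS.index(op)) for n, op in ops
    )


def encode_read_name(name: str) -> bytes:
    """Read name plus NUL terminator; its length is l_read_name."""
    return name.encode("ascii") + b"\x00"


def block_size(read_name: str, cigar: str, l_seq: int, aux_len: int = 0) -> int:
    """Length of the record after the block_size field itself."""
    return (
        FIXED_FIELDS_LEN
        + len(encode_read_name(read_name))  # l_read_name, NUL included
        + len(encode_cigar(cigar))          # 4 bytes per CIGAR op
        + (l_seq + 1) // 2                  # packed SEQ, two bases per byte
        + l_seq                             # QUAL, one byte per base (0xFF-filled if '*')
        + aux_len                           # optional tags
    )
```

With this sketch, `encode_cigar("1M1X1M")` produces three uint32s (12 bytes) and `encode_cigar("2M1X")` produces two (8 bytes), so the number and order of operations must be preserved exactly. `block_size("r1", "4M", 4)` is 45: 32 fixed bytes, 3 for `r1` plus the NUL, 4 for one CIGAR op, 2 for the packed sequence and 4 for the qualities.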
I'm trying to convert a SAM record to a BAM record. I think I got the data layout roughly right, but the CIGAR and sequence fields apparently need to be encoded differently. For context, I now have paired-read alignment working in BurrowsWheelerAligner.jl, so it would be possible to go from FASTQ to BAM, and on to results, directly in Julia. More generally, it would be nice to have better support for creating BAM alignments from scratch, modifying them, etc. (https://discourse.julialang.org/t/change-record-sequence-in-a-bam-file-using-xam/98141/2), and I think converting SAM to BAM is a good starting point.
Any help would be appreciated.