[WIP] convert SAM to BAM

I'm trying to convert SAM record to a BAM one. I think I got the data layout roughly correct but the cigar and sequences need to be encoded differently or something. For context I now have paired read alignement working in BurrowsWheelerAligner.jl, so it would be possible to do a FASTQ to BAM & results directly in Julia. In general it would be nice to have more support for creating BAM alignment from scratch, modify them, etc. (https://discourse.julialang.org/t/change-record-sequence-in-a-bam-file-using-xam/98141/2) and I think converting SAM to BAM is a good starting point.

Any help would be appreciated.

sam = "seq1\t81\tPhiX\t1051\t60\t70M\t=\t1821\t702\tTCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC\t*\tNM:i:0\tMD:Z:70\tMC:Z:70M\tAS:i:70\tXS:i:0"

r = XAM.SAM.Record(sam)

cigar_ops = ('M','I','D','N','S','H','P','=','X')

br = BAM.Record()

# fixed-length fields (see BMA specs for the details)
br.refid = 1 #TODO : get from header
br.pos = SAM.position(r)
br.l_read_name = SAM.tempname(r) |> x -> length(x)+1 #why do I need + 1 ?
br.mapq = SAM.mappingquality(r)
#br.bin
br.n_cigar_op = sum(x ∈ cigar_ops for x in SAM.cigar(r))
br.flag = SAM.flag(r)
br.l_seq = SAM.seqlength(r)
br.next_refid = 1 #TODO : get from header
br.next_pos = SAM.nextposition(r)
br.tlen = SAM.templength(r)

# variable length data

t = x -> unsafe_wrap(Vector{UInt8}, x)

br.data = vcat(
    SAM.tempname(r) |> t,
    SAM.cigar(r) |> t,
    SAM.sequence(String, r) |> t,
    fill(0x41, length(SAM.sequence(r))),
    #SAM.quality(r),
    reduce(vcat, r.data[f] for f in r.fields) # aux data
)
br.block_size = length(br.data)

br

XAM.BAM.Record:
      template name: seq1
               flag: 81
       reference ID: 2
           position: 1052
    mapping quality: 60
              CIGAR: 70599891M
  next reference ID: 2
      next position: 1822
    template length: 702
           sequence: RGRGGVGVGMRGRGGMGMRGRGGVGMRGGVGVRGGMGAGVGARGRGGVGVRGGMGVRGGMRGRGGARGRG
       base quality: UInt8[0x41, 0x43, 0x43, 0x41, 0x54, 0x54, 0x54, 0x43, 0x41, 0x41  …  0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41, 0x41]
     auxiliary data: AA=A AA=A AA=A AA=A AA=A AA=A AA=A AA=AERROR: invalid type tag: 'M'
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] auxtype
    @ ~/.julia/packages/XAM/l6LT2/src/bam/auxdata.jl:63 [inlined]
  [3] loadauxtype
    @ ~/.julia/packages/XAM/l6LT2/src/bam/auxdata.jl:82 [inlined]

BioJulia / XAM.jl

[WIP] convert SAM to BAM #61