BioJulia / Bio.jl

[DEPRECATED] Bioinformatics and Computational Biology Infrastructure for Julia
http://biojulia.dev
MIT License
261 stars 63 forks source link

RFC: More user-friendly flags #427

Closed jonathanBieler closed 7 years ago

jonathanBieler commented 7 years ago

Hi, I've started to use Bio.jl and I'm quite impressed so far. I've made some changes to make working with SAMFlags a bit more practical. It would need deprecating the current flags though, so I'm not sure it's worth it.

import Bio.Align: SAMRecord, SAMFlags, isflag, decodeflag, flag

record = SAMRecord("r001\t99\tchr1\t7\t30\t8M2I4M1D3M\t=\t37\t39\tTTAGATAAAGGATACTG\t*")

julia>isflag(record,SAMFlags.READ1)
true

julia>decodeflag(record)

the mate is mapped to the reverse strand
the read is paired in sequencing, no matter whether it is mapped in a pair
the read is mapped in a proper pair
this is read1 (first in pair)

Checking flags is also a bit faster on my computer.

codecov-io commented 7 years ago

Codecov Report

Merging #427 into master will decrease coverage by 0.06%. The diff coverage is 27.27%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #427      +/-   ##
=========================================
- Coverage   76.17%   76.1%   -0.07%     
=========================================
  Files         134     135       +1     
  Lines        9812    9821       +9     
=========================================
  Hits         7474    7474              
- Misses       2338    2347       +9
Impacted Files Coverage Δ
src/align/hts/hts.jl 90.47% <ø> (ø) :arrow_up:
src/align/hts/sam/sam.jl 25% <ø> (ø) :arrow_up:
src/align/hts/bam/record.jl 93.33% <100%> (ø) :arrow_up:
src/align/hts/sam/record.jl 65.5% <100%> (ø) :arrow_up:
src/align/hts/flags.jl 11.11% <11.11%> (ø)
src/align/alignment.jl 86.13% <0%> (-1%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4b9dbb1...f23b29d. Read the comment docs.

bicycle1885 commented 7 years ago

Hi, thank you for your contribution.

The idea is good. However, I cannot accept this change as is. Bit flag is a common technique in programming and most users would know how hot do that. Also, removing exported constants (SAM_FLAG_UNMAP, etc.) is not a good manner.

Instead, having a function (hasflag(record, flag), maybe) that checks the existence of a flag may be useful.

jonathanBieler commented 7 years ago

There's plenty of people that work with sequencing data and that don't have a CS background, there's really no good reason users should do bitwise arithmetic once we provided the proper methods to work with flags.

People are using this tool https://broadinstitute.github.io/picard/explain-flags.html to decode flags, but again there's no reason not to have this included, it's actually quite useful to have.

I guess one method to check flags against each other and record (hasflag), one to decompose a flag and one to explain it would cover all the use cases. I'm not sure you can do that in an elegant manner without having some sort of collection of flags though.

bicycle1885 commented 7 years ago

Why don't you modify the Base.show method of BAMRecord and SAMRecord to explain flags?

jonathanBieler commented 7 years ago

I made a version that doesn't use a baremodule, it's quite ugly, but it works I guess.

Showing the description would be nice but maybe a bit busy, I'm not sure how the formating would work:

Bio.Align.SAMRecord:
    sequence name: r001
             flag: 99
        reference: chr1
         position: 7
  mapping quality: 30
            CIGAR: 8M2I4M1D3M
   next reference: =
    next position: 37
  template length: 39
         sequence: TTAGATAAAGGATACTG
   base qualities: *
  optional fields:
 flag descirption:
the read is paired in sequencing, no matter whether it is mapped in a pair
the read is mapped in a proper pair
the mate is mapped to the reverse strand
this is read1
bicycle1885 commented 7 years ago

I meant something like:

Bio.Align.SAMRecord:
    sequence name: r001
             flag: SAM_FLAG_PAIRED | SAM_FLAG_REVERSE | SAM_FLAG_READ1
        reference: chr1
         position: 7
  mapping quality: 30
            CIGAR: 8M2I4M1D3M
   next reference: =
    next position: 37
  template length: 39
         sequence: TTAGATAAAGGATACTG
   base qualities: *
  optional fields:

SAM_FLAG_ may be redundant though.

jonathanBieler commented 7 years ago

I'm closing this, the last version is just too ugly imo.