Illumina / REViewer

A tool for visualizing alignments of reads in regions containing tandem repeats
GNU General Public License v3.0

Report metrics and phasing information #40

Closed egor-dolzhenko closed 2 years ago

egor-dolzhenko commented 2 years ago

This pull request will:

Example of a metrics file:

VariantId  Genotype  AlleleDepth
DMPK       13/15     28.38/27.84
ATXN3      20/20     20.42/22.72
HTT        14/17     21.00/16.35
HTT_CCG    12/9      17.89/20.74
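
For anyone consuming this file downstream, here is a minimal sketch of how one data row might be parsed. It assumes the columns are whitespace-separated and that Genotype and AlleleDepth hold one slash-separated value per allele, as in the example above. `MetricsRecord`, `split`, and `parseMetricsLine` are hypothetical names used only for illustration; they are not part of REViewer.

```cpp
#include <cassert>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; the fields mirror the column headers of the example.
struct MetricsRecord {
    std::string variantId;
    std::vector<int> genotype;        // repeat count per allele, e.g. {13, 15}
    std::vector<double> alleleDepth;  // depth per allele, e.g. {28.38, 27.84}
};

// Splits a string on a single-character delimiter.
std::vector<std::string> split(const std::string& text, char delimiter) {
    std::vector<std::string> parts;
    std::istringstream stream(text);
    std::string part;
    while (std::getline(stream, part, delimiter)) {
        parts.push_back(part);
    }
    return parts;
}

// Parses one data row. Returns false if the row does not have exactly three
// fields, if a value is not numeric (this also rejects the header row), or if
// the two slash-separated columns list a different number of alleles.
bool parseMetricsLine(const std::string& line, MetricsRecord& record) {
    std::istringstream fields(line);
    std::string genotype, depth, extra;
    if (!(fields >> record.variantId >> genotype >> depth)) return false;
    if (fields >> extra) return false;

    record.genotype.clear();
    record.alleleDepth.clear();
    try {
        for (const std::string& value : split(genotype, '/')) {
            record.genotype.push_back(std::stoi(value));
        }
        for (const std::string& value : split(depth, '/')) {
            record.alleleDepth.push_back(std::stod(value));
        }
    } catch (const std::exception&) {
        return false;  // std::stoi / std::stod throw on non-numeric input
    }
    return !record.genotype.empty() &&
           record.genotype.size() == record.alleleDepth.size();
}
```

Calling `parseMetricsLine("DMPK 13/15 28.38/27.84", record)` would fill `record.genotype` with 13 and 15 and `record.alleleDepth` with the two matching depths.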

Example of a file with phasing information:

LocusId  Diplotype                                                             Score
DMPK     (LF)(CAG){13}(RF)/(LF)(CAG){15}(RF)                                   700671
ATXN3    (LF)(GCT){20}(RF)/(LF)(GCT){20}(RF)                                   716594
HTT      (LF)(CAG){14}(CAACAG)(CCG){12}(RF)/(LF)(CAG){17}(CAACAG)(CCG){9}(RF)  646181
HTT      (LF)(CAG){14}(CAACAG)(CCG){9}(RF)/(LF)(CAG){17}(CAACAG)(CCG){12}(RF)  645285

Here (LF) and (RF) correspond to the left and the right flanks respectively; (CAG){14} corresponds to 14 copies of CAG; Score is the cumulative alignment score of all reads to the corresponding diplotype.
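
To make the notation concrete, here is a minimal sketch of a parser for the diplotype strings shown above. It assumes that every element is written in parentheses, that an optional `{n}` suffix gives the copy number (defaulting to 1 when absent, as for `(CAACAG)`), and that the two haplotypes are separated by a single `/`. The names `HaplotypeElement`, `parseHaplotype`, and `parseDiplotype` are hypothetical and are not taken from the REViewer code base.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical type: one parenthesized element and its copy number.
struct HaplotypeElement {
    std::string label;  // "LF", "RF", or a repeat unit such as "CAG"
    int copies;         // 1 when no {n} suffix is present
};

// Parses a haplotype such as "(LF)(CAG){14}(CAACAG)(CCG){12}(RF)".
// Returns false on malformed input.
bool parseHaplotype(const std::string& text, std::vector<HaplotypeElement>& elements) {
    elements.clear();
    std::size_t pos = 0;
    while (pos < text.size()) {
        if (text[pos] != '(') return false;
        const std::size_t close = text.find(')', pos);
        if (close == std::string::npos || close == pos + 1) return false;
        HaplotypeElement element{text.substr(pos + 1, close - pos - 1), 1};
        pos = close + 1;

        // Optional copy number in braces, e.g. {14}.
        if (pos < text.size() && text[pos] == '{') {
            const std::size_t end = text.find('}', pos);
            if (end == std::string::npos || end == pos + 1) return false;
            int copies = 0;
            for (std::size_t i = pos + 1; i < end; ++i) {
                if (!std::isdigit(static_cast<unsigned char>(text[i]))) return false;
                copies = copies * 10 + (text[i] - '0');
            }
            element.copies = copies;
            pos = end + 1;
        }
        elements.push_back(element);
    }
    return !elements.empty();
}

// Splits "haplotype1/haplotype2" at the slash and parses both halves.
bool parseDiplotype(const std::string& text,
                    std::vector<HaplotypeElement>& first,
                    std::vector<HaplotypeElement>& second) {
    const std::size_t slash = text.find('/');
    if (slash == std::string::npos) return false;
    return parseHaplotype(text.substr(0, slash), first) &&
           parseHaplotype(text.substr(slash + 1), second);
}
```

Under these assumptions, the first HTT row would parse into two haplotypes of five elements each, with `CAG` at 14 and 17 copies and `CCG` at 12 and 9 copies respectively.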

Notes

egor-dolzhenko commented 2 years ago

A minor code comment or two. I have more thoughts at the design level:

  1. Can the (LF) and (RF) elements in the phasing information be implicit? Are there cases where these wouldn't be part of the haplotype?
  2. The concept of turning the output off by setting a bit on the ostream has a bad smell to it. A simple and transparent alternative is to have this controlled by a bool. What is the advantage of the ostream approach?

Thank you for the thorough comments!

ctsa commented 2 years ago

Making it non-optional is even better. Maybe take this approach for now and add the option to turn it off once there's a good use case for this.

egor-dolzhenko commented 2 years ago

> Making it non-optional is even better. Maybe take this approach for now and add the option to turn it off once there's a good use case for this.

Sounds good! I'll make this change and merge?

ctsa commented 2 years ago

Sure, or merge now; these are all just items to consider.