Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

Adjust size of bins in q-score Histogram #270

Closed CodingKaiser closed 3 years ago

CodingKaiser commented 3 years ago

This may be buried deep in the documentation somewhere, but how would one go about increasing or decreasing the number of bins displayed in the q-score histogram?

I am going through the Python notebook provided to generate the q-score histogram, and the bar chart it produces gives a nice fine-grained view of the distribution of q-scores.

[Image: q-score histogram from the example notebook]

However, running this on my own sequencing run results in a much coarser view of the distribution: only 3 bins in total, with one bar directly overlapping another, which is not ideal.

[Image: sampleRun, q-score histogram from my own run]

I presume setting some variable in the object returned by run_metrics.q_metric_set() should do the trick, but so far I have been unable to find the corresponding setting. Any help would be greatly appreciated!

ezralanglois commented 3 years ago

The number of bins is chosen by RTA. On some very early platforms you do get both full resolution and binned data. On later platforms we moved to binned data (~7 bins). On the most recent platforms, e.g. NovaSeq, you only get 3 bins.

We made this change because 1) it saves disk space on the instrument and 2) variant callers don't significantly benefit from additional bins.
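
To illustrate why the bin count cannot be raised after the fact, here is a minimal sketch in plain Python. It does not use the InterOp API, and the q-score counts and bin edges are hypothetical values chosen only for illustration; the real bins are whatever RTA wrote to the InterOp files for the run.

```python
# Hypothetical full-resolution histogram: q-score -> number of base calls.
# These numbers are made up for illustration only.
full_resolution = {2: 50, 12: 120, 23: 300, 27: 450,
                   32: 900, 36: 2500, 37: 4000, 40: 1680}

# Hypothetical bin edges as (lower, upper), both inclusive.
# Real edges are chosen by RTA and differ between platforms.
BINS = [(0, 14), (15, 29), (30, 50)]


def collapse(hist, bins):
    """Sum full-resolution counts into the given inclusive bins."""
    binned = {b: 0 for b in bins}
    for q, count in hist.items():
        for lower, upper in bins:
            if lower <= q <= upper:
                binned[(lower, upper)] += count
                break
        else:
            raise ValueError(f"q-score {q} does not fall in any bin")
    return binned


binned = collapse(full_resolution, BINS)
for (lower, upper), count in binned.items():
    print(f"Q{lower}-Q{upper}: {count}")
```

Once only the three summed counts are stored, the eight original values cannot be recovered from them. A plot built from binned data can merge bins into fewer bars, but it cannot split them into more.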

ezralanglois commented 3 years ago

That last picture, with the green bar on top of the dark green bar, is not showing overlapping bins. The skinny green bar marks the Q30 threshold (which is more useful if you have more than 3 bins).
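
As a rough sketch of how a Q30 threshold relates to binned data, the plain-Python snippet below computes the percentage of base calls at or above Q30 from a binned histogram. It does not use the InterOp API; the bin edges and counts are hypothetical, and the function name `percent_at_or_above` is made up for this example.

```python
# Hypothetical binned histogram: (lower, upper) inclusive -> base-call count.
binned_counts = {(0, 14): 170, (15, 29): 750, (30, 50): 9080}


def percent_at_or_above(binned, threshold=30):
    """Percentage of calls in bins whose lower edge is >= threshold.

    This is exact only when a bin edge coincides with the threshold;
    a bin that straddles the threshold cannot be split.
    """
    total = sum(binned.values())
    if total == 0:
        return 0.0
    passing = sum(count for (lower, _upper), count in binned.items()
                  if lower >= threshold)
    return 100.0 * passing / total


pct_q30 = percent_at_or_above(binned_counts)
print(f"%>=Q30: {pct_q30:.1f}")
```

With only 3 bins the threshold marker carries little extra information, since it sits at a bin edge; with more bins it shows at a glance which bars count toward the Q30 fraction.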