Illumina / interop

C++ Library to parse Illumina InterOp files
http://illumina.github.io/interop/index.html
GNU General Public License v3.0

Tiny difference between number of reads in `Stats.json/Fastq` and `InterOp/index_summary` #268

Closed sklages closed 8 months ago

sklages commented 3 years ago

Comparing "number of reads"

InterOp

ar = iop.index_summary(run_metrics, level='Barcode')

results in:

sample1/TTAGGC = 38423040.0 (ClusterCount)
sample2/GCCAAT = 71338704.0 (ClusterCount)

Stats.json (bcl2fastq)

sample1/TTAGGC = 38423042 (NumberReads)
sample2/GCCAAT = 71338705 (NumberReads)

Fastq files (bcl2fastq)

sample1/TTAGGC = 38,423,042
sample2/GCCAAT = 71,338,705

The differences are small, but I'd expect identical values. Otherwise, for any reporting function I would have to parse Stats.json to stay "in sync" with the resulting fastq files.
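For reference, reading the counts straight from Stats.json could look roughly like the sketch below. The key names (`ConversionResults`, `DemuxResults`, `SampleId`, `IndexMetrics`, `IndexSequence`, `NumberReads`) are assumed from the usual bcl2fastq output layout, and the helper names `reads_per_barcode` and `load_stats` are made up for illustration; check them against your own file.

```python
import json
from collections import defaultdict


def reads_per_barcode(stats):
    """Sum NumberReads per (SampleId, IndexSequence) across all lanes.

    `stats` is the parsed Stats.json content as a dict. The key layout
    is an assumption based on typical bcl2fastq output.
    """
    totals = defaultdict(int)
    for lane in stats.get("ConversionResults", []):
        for demux in lane.get("DemuxResults", []):
            # Use the first index entry as the barcode label, if present.
            index_metrics = demux.get("IndexMetrics") or [{}]
            barcode = index_metrics[0].get("IndexSequence", "")
            # Keep the counts as Python ints so no precision is lost.
            totals[(demux["SampleId"], barcode)] += int(demux["NumberReads"])
    return dict(totals)


def load_stats(path):
    """Load a Stats.json file from disk (hypothetical helper)."""
    with open(path) as handle:
        return json.load(handle)
```

Keeping the values as integers end to end avoids any float conversion on the reporting side.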

I haven't tested this systematically, so I can't tell how big the differences are in other datasets/runs.

Any idea where this little inaccuracy comes from and how to fix it?

nudpa commented 3 years ago

Hello @sklages, our best guess is that these are rounding errors caused by converting higher-precision numbers to floating point in InterOp. For instance, if the values come from the index_summary app, the error probably arises from this print statement: https://github.com/Illumina/interop/blob/master/src/apps/index_summary.cpp#L250

We'll try to fix this there and anywhere else we may have done it.
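A quick way to sanity-check the rounding hypothesis is to push the Stats.json counts through a 32-bit float and see what comes back. This is only a sketch: it assumes the count passes through single precision somewhere, which this thread does not confirm, and `to_float32` is a helper name made up for illustration. A 32-bit float can only represent every 4th integer between 2^25 and 2^26, and every 8th between 2^26 and 2^27, so counts of this size cannot all survive exactly.

```python
import struct


def to_float32(value):
    """Round a number to the nearest IEEE 754 single-precision value."""
    # Packing as 'f' forces the value through a 32-bit float.
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# NumberReads values reported from Stats.json in this issue.
for sample, reads in [("sample1/TTAGGC", 38423042), ("sample2/GCCAAT", 71338705)]:
    print(sample, reads, "->", to_float32(reads))
# sample1/TTAGGC 38423042 -> 38423040.0
# sample2/GCCAAT 71338705 -> 71338704.0
```

Both results match the ClusterCount values reported from InterOp above, which is consistent with a single-precision conversion rather than a genuine difference in read counts.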

sklages commented 3 years ago

Hi @nudpa - okay, makes sense. But just to be clear, I did not use the app but the Python bindings to InterOp, in this case index_summary (http://illumina.github.io/interop/namespacecore.html#aa2d77c36f24e55d0d9f31de38e9cb55e)

ezralanglois commented 8 months ago

I suspect this is a bug in bcl2fastq