Open holmeso opened 2 years ago
I should add that the latest released version (78 I think) of the code was used.
Where is the output file? Could I have a look? I doubt they are just different in output order.
output is here: /mnt/lustre/working/genomeinfo/share/qprofiler2/threading_bug
I checked these outputs; the difference appears in "tags:SA:Z". The value (annotation information) seems to differ in almost every BAM record, so only the top 100 tally elements are reported, and which ones are listed is random or FIFO. Even if you run single-threaded mode twice, the output in this section is often slightly different.
<sequenceMetrics name="tags:XS:i" readCount="14443165">
<variableGroup name="XS" tallyCount="100+">
<!-- here only list top 100 tally element -->
<tally count="39804" value="101"/>
...
<tally count="7072316" value="others"/>
</variableGroup>
</sequenceMetrics>
In this example, the 100 listed tallies may differ slightly, but the total counts are the same. The last row
<tally count="7072316" value="others"/>
is the same as in the other XML file.
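To illustrate why a capped tally can vary between runs, here is a minimal Python sketch. It is not qprofiler2's code: the function names, the cap of 3 (standing in for 100) and the sample values are all made up, and the "first distinct values seen" behaviour is only an assumption about how a FIFO cap might work. It contrasts that approach with a cap that ranks by count and breaks ties on the value, which gives the same listing whatever order the records arrive in.

```python
from collections import Counter

LIMIT = 3  # hypothetical cap; stands in for the 100-element limit


def capped_fifo(values, limit=LIMIT):
    """Keep the first `limit` distinct values seen; count the rest as 'others'."""
    kept, others = Counter(), 0
    for v in values:
        if v in kept or len(kept) < limit:
            kept[v] += 1
        else:
            others += 1
    return dict(kept), others


def capped_top(values, limit=LIMIT):
    """Keep the `limit` most frequent values, ties broken by value."""
    full = Counter(values)
    ranked = sorted(full.items(), key=lambda kv: (-kv[1], kv[0]))
    kept = dict(ranked[:limit])
    others = sum(full.values()) - sum(kept.values())
    return kept, others


# The same six made-up records in two different arrival orders.
records = ["a", "b", "c", "d", "d", "a"]
shuffled = ["d", "d", "a", "b", "a", "c"]

print(capped_fifo(records), capped_fifo(shuffled))  # listings depend on order
print(capped_top(records), capped_top(shuffled))    # listings are identical
```

The ranked version has to count every distinct value before truncating, so it trades memory for reproducibility; whether that is acceptable for a tag with a near-unique value per record is a separate question.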
Even if you run single-threaded mode twice, the output in this section is often slightly different.
Really?
I don't think we should be capturing information that can't be reliably reproduced when running the same code against the same bam many times.
When running qprofiler2 with consumer thread count set to 2 and producer thread count set to 1, different results are obtained when compared with running in single threaded mode.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I would expect qprofiler2 to produce the same results regardless of the threading options used.
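One way to check whether two runs differ in content rather than only in element order is to compare the tallies as a map. The following is a rough Python sketch under stated assumptions: it assumes the output uses the `variableGroup` and `tally` elements shown in the snippet above with no XML namespace, and the helper names and the tiny sample documents (with invented counts) are hypothetical.

```python
import xml.etree.ElementTree as ET


def tally_map(xml_text):
    """Map (group name, tally value) -> count, ignoring element order."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.iter("variableGroup"):
        for tally in group.iter("tally"):
            key = (group.get("name"), tally.get("value"))
            result[key] = int(tally.get("count"))
    return result


def diff_tallies(xml_a, xml_b):
    """Return the entries whose counts differ between the two documents."""
    a, b = tally_map(xml_a), tally_map(xml_b)
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up miniature outputs; the counts are illustrative only.
TEMPLATE = (
    '<sequenceMetrics name="tags:XS:i" readCount="10">'
    '<variableGroup name="XS" tallyCount="100+">{}</variableGroup>'
    "</sequenceMetrics>"
)
run_single = TEMPLATE.format(
    '<tally count="4" value="101"/><tally count="6" value="others"/>'
)
run_reordered = TEMPLATE.format(
    '<tally count="6" value="others"/><tally count="4" value="101"/>'
)
run_changed = TEMPLATE.format(
    '<tally count="5" value="101"/><tally count="5" value="others"/>'
)

print(diff_tallies(run_single, run_reordered))  # empty: only the order differs
print(diff_tallies(run_single, run_changed))    # lists the differing counts
```

An empty result means the two files carry the same tallies; any entry in the result points to a count that really changed between runs.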