AdamaJava / adamajava

qp2 produces different output with different threading options #289

Open holmeso opened 2 years ago

holmeso commented 2 years ago

When running qprofiler2 with the consumer thread count set to 2 and the producer thread count set to 1, the output differs from that of a single-threaded run.

To Reproduce

Steps to reproduce the behavior:

  1. Run qprofiler2 against a bam with --threads-consumer 2 --threads-producer 1
  2. Run qprofiler2 against the same bam with --threads-consumer 1 --threads-producer 1
  3. Diff the output
>             <tally count="4826" value="20"/>
38098d38098
<             <tally count="7202" value="34"/>
38099a38100
>             <tally count="7403" value="37"/>
38102d38102
<             <tally count="9680" value="44"/>
38110a38111
>             <tally count="11621" value="55"/>
38113d38113
<             <tally count="12596" value="63"/>
38114a38115,38116
>             <tally count="14944" value="66"/>
>             <tally count="14361" value="67"/>
38118a38121
>             <tally count="13764" value="73"/>
38124d38126
<             <tally count="14886" value="79"/>
38134d38135
<             <tally count="16490" value="90"/>
38140d38140
<             <tally count="17215" value="97"/>
38143c38143
<             <tally count="309919" value="others"/>
---
>             <tally count="321069" value="others"/>
38248a38249
>             <tally count="1" value="chr12,100162604,+,102S32M16S,0,0;"/>
38255d38255
<             <tally count="1" value="chr15,51004848,+,89S31M31S,21,0;"/>
38263a38264
>             <tally count="1" value="chr2,33141309,+,107S44M,55,1;"/>
38266a38268
>             <tally count="1" value="chr2,33141311,-,24S50M77S,0,3;"/>
38269a38272
>             <tally count="1" value="chr2,33141315,+,92S59M,0,3;"/>
38275d38277
<             <tally count="1" value="chr2,33141346,-,64M87S,0,2;"/>
38289d38290
<             <tally count="1" value="chr2,33141439,-,49M102S,0,3;"/>
38294d38294
<             <tally count="1" value="chr2,33141462,+,115S36M,0,1;"/>
38305d38304
<             <tally count="1" value="chr2,33141515,+,87S29M1I34M,0,6;"/>
38315d38313
<             <tally count="1" value="chr2,33141597,-,44M107S,0,1;"/>
38323c38321
<             <tally count="1" value="chr3,185349234,+,77S10M3D7M1I45M11S,2,6;"/>
---
>             <tally count="1" value="chr2,901083,+,75M76S,60,0;"/>
38326a38325
>             <tally count="1" value="chr4,42198875,-,39M112S,0,2;"/>
38328a38328
>             <tally count="1" value="chr5,42971954,+,17S38M96S,0,0;"/>
38344d38343
<             <tally count="1" value="GL000232.1,-23434,151M,5;chrY,-9980398,151M,6;GL000234.1,-10396,151M,7;chr1,-5715547,112M1I38M,9;"/>
38349d38347
<             <tally count="1" value="GL000234.1,+10408,10S141M,8;"/>
38370d38367
<             <tally count="1" value="GL000234.1,-10396,150M,11;"/>
38383a38381,38387
>             <tally count="1" value="chr13,+115099244,112M1I38M,5;"/>
>             <tally count="1" value="chr13,+115099245,111M1I39M,5;"/>
>             <tally count="1" value="chr13,-115099252,104M1I46M,4;"/>
>             <tally count="1" value="chr13,-115099255,101M1I49M,4;"/>
>             <tally count="1" value="chr13,-115099262,94M1I50M6S,4;"/>
>             <tally count="1" value="chr5,-30949,3M1I147M,6;"/>
>             <tally count="1" value="chr5,-30951,151M,7;"/>
38422,38423d38425
<             <tally count="1" value="chrY,+9980386,151M,6;"/>
<             <tally count="1" value="chrY,+9980400,151M,9;"/>
38439,38440d38440
<             <tally count="1" value="chrY,-9980391,148M3S,8;"/>
<             <tally count="1" value="chrY,-9980401,151M,8;"/>

Expected behavior

I would expect qprofiler2 to produce the same results regardless of the threading options used.

holmeso commented 2 years ago

I should add that the latest released version (78 I think) of the code was used.

ChristinaXu2017 commented 2 years ago

Where is the output file? Could I have a look? I suspect the two outputs differ only in output order.

holmeso commented 2 years ago

output is here: /mnt/lustre/working/genomeinfo/share/qprofiler2/threading_bug

ChristinaXu2017 commented 2 years ago

I checked these outputs; the differences appear in the "tags:SA:Z" section. The value (annotation information) seems to be different in almost every BAM record, so only the top 100 tally elements are reported, and which ones end up listed is random or FIFO. Even if you run single-threaded mode twice, the output in this section is often slightly different.
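To make the FIFO case concrete, here is a minimal Python sketch. It assumes the tally keeps only the first `cap` distinct values it encounters and folds everything else into "others". That is an assumption for illustration only, not confirmed qprofiler2 behaviour, and `capped_tally` is a hypothetical helper, not part of the tool.

```python
def capped_tally(values, cap):
    """Count values, tracking only the first `cap` distinct ones seen.

    Assumption for illustration: values arriving after the cap is reached
    are lumped into a single 'others' counter (FIFO-style capping).
    """
    tracked = {}
    others = 0
    for v in values:
        if v in tracked:
            tracked[v] += 1
        elif len(tracked) < cap:
            tracked[v] = 1
        else:
            others += 1
    return tracked, others


# The same six toy records, delivered in two different arrival orders,
# as could happen when several consumer threads race.
records = ["a", "b", "c", "a", "d", "b"]
tracked_1, others_1 = capped_tally(records, cap=2)
tracked_2, others_2 = capped_tally(list(reversed(records)), cap=2)

total_1 = sum(tracked_1.values()) + others_1
total_2 = sum(tracked_2.values()) + others_2
print(tracked_1, others_1, total_1)
print(tracked_2, others_2, total_2)
```

In this toy run the listed values and the "others" count both change with arrival order, while the grand total stays the same.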

ChristinaXu2017 commented 2 years ago

<sequenceMetrics name="tags:XS:i" readCount="14443165">
  <variableGroup name="XS" tallyCount="100+">
    <!-- here only list top 100 tally element -->
    <tally count="39804" value="101"/>
    ...
    <tally count="7072316" value="others"/>
  </variableGroup>
</sequenceMetrics>

In this example, the 100 listed tallies may differ slightly, but the total counts are the same. The last row

<tally count="7072316" value="others"/>

is the same as in the other XML file.
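One way to check the "total counts are the same" claim without caring which tallies are listed is to sum the counts per group in each file. The sketch below uses only the element and attribute names visible in the snippet above (`sequenceMetrics`, `variableGroup`, `tally`, `name`, `count`). It assumes the report uses no XML namespaces, and the two inline reports are made-up toy data, not real qprofiler2 output.

```python
import xml.etree.ElementTree as ET


def group_totals(xml_text):
    """Return {(sequenceMetrics name, variableGroup name): summed tally count}."""
    root = ET.fromstring(xml_text)
    totals = {}
    # iter() also yields the root itself, so a bare <sequenceMetrics> works.
    for metrics in root.iter("sequenceMetrics"):
        for group in metrics.iter("variableGroup"):
            key = (metrics.get("name"), group.get("name"))
            totals[key] = sum(int(t.get("count")) for t in group.iter("tally"))
    return totals


# Toy reports: different listed tallies, same overall total.
report_a = """
<sequenceMetrics name="tags:XS:i" readCount="10">
  <variableGroup name="XS" tallyCount="100+">
    <tally count="4" value="101"/>
    <tally count="6" value="others"/>
  </variableGroup>
</sequenceMetrics>
"""
report_b = """
<sequenceMetrics name="tags:XS:i" readCount="10">
  <variableGroup name="XS" tallyCount="100+">
    <tally count="3" value="99"/>
    <tally count="7" value="others"/>
  </variableGroup>
</sequenceMetrics>
"""

totals_a = group_totals(report_a)
totals_b = group_totals(report_b)
print(totals_a == totals_b)
```

For real files, the same function could be fed the contents of each report read from disk. Groups whose totals differ would then point to a genuine counting discrepancy, as opposed to a difference in which values were listed.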

holmeso commented 2 years ago

> Even you run single thread mode twice, the output on this section is often slightly different.

Really?

holmeso commented 2 years ago

I don't think we should be capturing information that can't be reliably reproduced when running the same code against the same bam many times.
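For illustration, one possible way to make a top-N listing reproducible is to count everything first and then rank with an explicit tie-break, so the result does not depend on arrival order. This is a sketch of that general technique in Python, not the project's code or an agreed fix. `top_n_tallies` is a hypothetical helper, and counting every distinct value could be costly in memory when almost every record carries a unique value.

```python
from collections import Counter


def top_n_tallies(values, n):
    """Return the n most frequent values plus the summed count of the rest.

    Ties on count are broken by the value itself, so the listing is the
    same for any ordering of the input.
    """
    counts = Counter(values)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    listed = ranked[:n]
    others = sum(count for _, count in ranked[n:])
    return listed, others


records = ["b", "a", "c", "a", "b", "d"]
listed_fwd, others_fwd = top_n_tallies(records, 2)
listed_rev, others_rev = top_n_tallies(list(reversed(records)), 2)
print(listed_fwd, others_fwd)
print(listed_rev, others_rev)
```

Both orderings give the same listing and the same "others" count here, which is the property being asked for: identical input, identical report.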