BinPro / CONCOCT

Clustering cONtigs with COverage and ComposiTion

Different BIC outputs depending on the max_number_processors argument. #33

Closed: alneberg closed this issue 11 years ago

alneberg commented 11 years ago

I found a strange bug. If I run:

CONCOCT tests/test_data/coverage tests/test_data/composition.fa -c 3,5,1 -i 100 -b test_out1/ -m 1

I get

3,35245.9938983
4,36842.5739125
5,38241.0545765

while if I run

CONCOCT tests/test_data/coverage tests/test_data/composition.fa -c 3,5,1 -i 100 -b test_out2/ -m 2

I get

3,35245.9938983
4,36858.7784806
5,38348.0270248

The result is not random: it is reproducible, and it is the same on my laptop and on the Uppmax server.
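To see at a glance which cluster counts disagree, the two outputs above can be compared line by line. This is a minimal sketch with the values pasted in from the two runs; reading each line as `number_of_clusters,BIC` is an assumption based on the `-c 3,5,1` range (3 to 5 clusters in steps of 1).

```python
def parse_bic(text):
    """Parse lines of the form 'k,bic' into a {k: bic} dict."""
    out = {}
    for line in text.strip().splitlines():
        k, bic = line.split(",")
        out[int(k)] = float(bic)
    return out

# Values copied from the -m 1 and -m 2 runs above.
run_m1 = parse_bic("""
3,35245.9938983
4,36842.5739125
5,38241.0545765
""")
run_m2 = parse_bic("""
3,35245.9938983
4,36858.7784806
5,38348.0270248
""")

# Cluster counts whose BIC differs between the two runs.
differing = [k for k in sorted(run_m1) if run_m1[k] != run_m2[k]]
print("cluster counts with different BIC:", differing)
```

Only the first clustering (3 clusters) agrees between the two runs; every later one differs.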

chrisquince commented 11 years ago

This may have something to do with the random number generator. We set the seed only once, at the start of the program. Imagine we are doing just two clusterings, on either one or two processors. With one processor, the effective seed for the second clustering is the generator state left at the end of the first clustering. With two processors, however, each clustering starts from the same initial seed, so only the first clustering gives identical results in both cases. If we want to address this, we may need to switch to passing the random seed directly to the gmm function.
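The mechanism described above can be illustrated with a small, self-contained Python sketch. It does not use CONCOCT or its gmm function: `cluster_score` is a hypothetical stand-in for one randomly initialised fit, and the multi-processor case is emulated by giving every clustering a fresh copy of the start-of-program seed (assuming each clustering lands on its own worker). The last function sketches the proposed fix of passing an explicit seed to each clustering; deriving it as `seed + k` is an illustrative choice, not CONCOCT's.

```python
import random

SEED = 11
KS = [3, 4, 5]

def cluster_score(k, rng):
    """Hypothetical stand-in for one GMM fit with k components.

    It draws k random initial values from `rng` and sums them, so the
    result depends on the RNG state it receives, as a randomly
    initialised fit would.
    """
    return sum(rng.random() for _ in range(k))

def run_shared_state(ks, seed):
    # One processor: seed once, then each clustering continues from
    # whatever state the previous clustering left behind.
    rng = random.Random(seed)
    return {k: cluster_score(k, rng) for k in ks}

def run_copied_state(ks, seed):
    # Emulated workers: each clustering starts from a copy of the
    # state seeded at program start (assumption: one clustering per worker).
    return {k: cluster_score(k, random.Random(seed)) for k in ks}

def run_explicit_seed(ks, seed):
    # Proposed fix: pass each clustering its own seed, so a result
    # depends only on (seed, k), not on scheduling or processor count.
    return {k: cluster_score(k, random.Random(seed + k)) for k in ks}

shared = run_shared_state(KS, SEED)
copied = run_copied_state(KS, SEED)
for k in KS:
    # Only the first clustering (k=3) agrees between the two schedules.
    print(k, shared[k] == copied[k])
```

With explicit per-clustering seeds, running the clusterings in a different order, or spread over a different number of processes, gives the same results, which is the property the BIC output should have.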