Open kain88-de opened 7 years ago
Thanks for this. Since noise is added to remove degeneracies from the similarity matrix, it shouldn't be a problem if it's slightly correlated. A potentially problematic case would be when the same noise is added to elements of the similarity matrix which are identical, which I think is unlikely. However, since the change you're proposing is not very time-consuming and still adds to the robustness of the code, I'm going to implement it. The only problem is that I'm about to leave for holidays; I'll be back on the 26th of July and am unlikely to be able to work on it before then.
A temporary solution would be the ability to pass a different random seed to each independent process to ensure that they are uncorrelated, as suggested in #40. The quality of the RNG is not very critical for this application.
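As a rough illustration of the per-process seeding idea, here is a minimal sketch of one way to derive a distinct seed for each worker from a single base seed. The helper name `worker_seeds` is hypothetical and not part of this project; it assumes NumPy 1.17 or newer for `numpy.random.SeedSequence`.

```python
import numpy as np


def worker_seeds(base_seed, n_workers):
    """Derive one 32-bit integer seed per worker from a single base seed.

    Hypothetical helper for illustration only. SeedSequence.spawn creates
    child sequences that are designed to yield independent streams.
    """
    children = np.random.SeedSequence(base_seed).spawn(n_workers)
    # generate_state(1) returns a one-element uint32 array; convert to int
    return [int(child.generate_state(1)[0]) for child in children]


# Each process would then build its own generator from its own seed.
seeds = worker_seeds(base_seed=42, n_workers=4)
generators = [np.random.RandomState(seed) for seed in seeds]
```

Each seed would be passed to one process, so that no two processes start from the same generator state.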
In `ap.c` the normal `rand` function of the C standard library is called. In general this is a weak linear congruential generator that is not thread safe. By weak I mean that it has a short period and the resulting numbers are correlated. The lack of thread safety is a problem when the `CAffinityPropagation` function is called from separate Python processes, because we can't independently seed the RNG for each call of the function. I'm not sure how often the addition of noise is used in practice, but I would recommend removing the addition of noise from `ap.c` and doing it in `affinityprop.pyx` instead, with its own `numpy.random.RandomState`. To allow using unique random states I would suggest the following code from scikit-learn.

The noise addition in `AffinityPropagation.run` should then be

This will allow passing a separate seed for each Python process, ensuring that the random numbers are not correlated. Additionally, numpy uses the Mersenne Twister, a widely accepted RNG for scientific applications.
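A minimal sketch of the approach described above, not the project's actual code. The `check_random_state` function is a simplified version modelled on the scikit-learn helper of the same name (here `None` yields a freshly seeded `RandomState` rather than NumPy's global one). The `add_noise` function, its `scale` parameter, and the uniform noise model are assumptions for illustration; the noise formula used in `ap.c` may differ.

```python
import numbers

import numpy as np


def check_random_state(seed):
    """Turn `seed` into a numpy.random.RandomState instance.

    Simplified sketch modelled on scikit-learn's helper of the same name.
    """
    if seed is None:
        # Assumption: use a freshly seeded generator, not the global one
        return np.random.RandomState()
    if isinstance(seed, (numbers.Integral, np.integer)):
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        return seed
    raise ValueError(
        "%r cannot be used to seed a numpy.random.RandomState instance" % (seed,)
    )


def add_noise(similarity, random_state=None, scale=1e-9):
    """Return a copy of `similarity` with small uniform noise added.

    Hypothetical helper: `scale` and the noise model are illustrative only.
    """
    rng = check_random_state(random_state)
    similarity = np.asarray(similarity, dtype=float)
    return similarity + scale * rng.random_sample(similarity.shape)


# Two processes given different seeds draw different noise.
s = np.zeros((3, 3))
noisy_a = add_noise(s, random_state=1)
noisy_b = add_noise(s, random_state=2)
```

Accepting either an integer or an existing `RandomState` lets the caller choose between a reproducible run with a fixed seed and sharing one generator across several calls.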
@mtiberti, if it doesn't matter that the added noise might be correlated, it would be nice if you could still add a comment in the C code.