cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License

Fixes random number generator on Windows. Fixes #103 #140

Open smarie opened 5 years ago

smarie commented 5 years ago

Fixes #103 (This is the same fix that was proposed for liblinear.)

On Windows, liblinear and libsvm have serious convergence issues because of the way random numbers are generated: RAND_MAX in the Windows C runtime is only 32767 (15 bits), even on 64-bit Windows, whereas with GCC/glibc on Linux it is 2147483647 (31 bits), on both 32-bit and 64-bit systems.
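To make the range difference concrete, here is a minimal Python sketch. It emulates a 15-bit rand() with the linear congruential constants widely reported for the Microsoft C runtime (214013 and 2531011); those constants are an assumption, not something taken from this PR. The hypothetical rand31 helper shows one common way to widen such a generator, by concatenating the bits of several draws. It is an illustration only, not necessarily the code in this patch.

```python
# Assumed emulation of a 15-bit C rand(), using the LCG constants commonly
# attributed to the Microsoft C runtime. Not the actual CRT source.
MSVC_RAND_MAX = 32767          # 2**15 - 1, RAND_MAX on Windows
GLIBC_RAND_MAX = 2147483647    # 2**31 - 1, RAND_MAX with glibc


class Rand15:
    """Emulates a rand() whose results lie in [0, 32767]."""

    def __init__(self, seed: int = 1) -> None:
        self.state = seed & 0xFFFFFFFF

    def rand(self) -> int:
        # state = state * 214013 + 2531011 (mod 2**32); output bits 16..30
        self.state = (self.state * 214013 + 2531011) & 0xFFFFFFFF
        return (self.state >> 16) & 0x7FFF


def rand31(gen: Rand15) -> int:
    """Hypothetical helper: glue 15 + 15 + 1 random bits into a 31-bit
    value so the result spans [0, 2**31 - 1], like glibc's rand()."""
    hi = gen.rand()            # bits 16..30
    mid = gen.rand()           # bits 1..15
    lo = gen.rand() & 1        # bit 0
    return (hi << 16) | (mid << 1) | lo


if __name__ == "__main__":
    gen = Rand15(seed=1)
    narrow = [gen.rand() for _ in range(100_000)]
    print("largest 15-bit draw:", max(narrow))
    wide = [rand31(gen) for _ in range(100_000)]
    print("largest widened draw:", max(wide))
```

However many samples are drawn, the 15-bit generator never exceeds 32767, while the widened values cover the same range as glibc's rand().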

If I understand correctly, these random numbers are used by the coordinate descent solvers to choose the next coordinate to update. When the number of coordinates (e.g. the number of samples) is large, the Windows generator cannot explore all of them: once there are more than 32768 coordinates, an expression such as rand() % n can never return a value above 32767, so some coordinates can never be selected at a given step.
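The effect on coordinate selection can be sketched in Python, assuming the solver randomizes its visiting order with a C-style loop of the form j = i + rand() % (n - i). The names rand15, rand31, shuffle_order and first_coordinate are hypothetical, and the two generators are uniform stand-ins built on Python's random module rather than the real C library functions.

```python
import random
from collections.abc import Callable

RAND15_MAX = 32767  # RAND_MAX on Windows


def rand15(rng: random.Random) -> int:
    """Stand-in for a 15-bit rand(): uniform on [0, 32767]."""
    return rng.randrange(RAND15_MAX + 1)


def rand31(rng: random.Random) -> int:
    """Stand-in for a 31-bit rand(): uniform on [0, 2**31 - 1]."""
    return rng.randrange(2**31)


def shuffle_order(n: int, rand: Callable[[random.Random], int],
                  rng: random.Random) -> list[int]:
    """Randomize the visiting order of coordinates 0..n-1 with the
    assumed C-style loop j = i + rand() % (n - i)."""
    index = list(range(n))
    for i in range(n):
        j = i + rand(rng) % (n - i)
        index[i], index[j] = index[j], index[i]
    return index


def first_coordinate(n: int, rand: Callable[[random.Random], int],
                     rng: random.Random) -> int:
    """Coordinate visited first: at i = 0 the loop swaps in index
    rand() % n, and later steps (j >= i >= 1) never touch slot 0."""
    return rand(rng) % n


if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000  # more coordinates than a 15-bit rand() can address
    picks15 = [first_coordinate(n, rand15, rng) for _ in range(20_000)]
    picks31 = [first_coordinate(n, rand31, rng) for _ in range(20_000)]
    print("15-bit: largest first coordinate =", max(picks15))
    print("31-bit: largest first coordinate =", max(picks31))
```

With the 15-bit stand-in, the first coordinate is always one of the first 32768; more generally, step i can only swap in a coordinate from the window [i, i + 32767]. Even when n is at most 32768, rand() % n is slightly biased unless n divides 32768.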

This is a known issue, documented in the liblinear FAQ (though, strangely, not in the libsvm FAQ), but the workaround proposed there is incorrect.

Years ago I submitted a patch for this to liblinear; several users approved it, but it was never merged: https://github.com/cjlin1/liblinear/pull/28

Since another user reported the same problem on libsvm as #103, here is the corresponding PR. I am also proposing this fix to the scikit-learn project (Python), which bundles libsvm and liblinear and has observed some convergence issues; some of them might be caused by this platform-related bug.