cjlin1 / libsvm

LIBSVM -- A Library for Support Vector Machines
https://www.csie.ntu.edu.tw/~cjlin/libsvm/
BSD 3-Clause "New" or "Revised" License

Loss of precision of parameters in a model file #119

Open hisakatha opened 6 years ago

hisakatha commented 6 years ago

Double-precision floating-point parameters in svm-train, such as gamma, are written to the model file with the %g format by svm_save_model in svm.cpp. Since %g prints only 6 significant digits by default, this loses precision, so the kernel function used by svm-predict differs slightly from the one used by svm-train. The effect on prediction accuracy may be small, especially when Qfloat is defined as float, but I think the parameters should be written in a format like %.16g.

tavianator commented 6 years ago

The number of digits required to round-trip an IEEE double is 17, not 16.

hisakatha commented 6 years ago

@tavianator Thank you for your helpful comment. Actually, I've found %.16g in the output format of SV coefficients in svm_save_model. If my proposal is effective, I think the format of SV coefficients also should be replaced with %.17g.

cjlin1 commented 6 years ago

%.16 should be enough due to the normalized representation (i.e., 17 digits in total)

cjlin1 commented 6 years ago

We agree that parameters should be written in a higher precision (as they don't cost too much memory or file space). We will try to change that in the next release.

tavianator commented 6 years ago

"%.16 should be enough due to the normalized representation (i.e., 17 digits in total)"

%.16g is only 16 digits total. Here's an example (borrowed from here) that shows it's not enough:

#include <stdio.h>

int main() {
    double a = 18014398509481982.0;
    double b = 18014398509481980.0;
    printf("%.16g\n%.16g\n%s\n", a, b, a == b ? "true" : "false");
    return 0;
}

Output:

1.801439850948198e+16
1.801439850948198e+16
false

From the famous "What Every Computer Scientist Should Know About Floating-Point Arithmetic" paper, Theorem 15:

... The same argument applied to double precision shows that 17 decimal digits are required to recover a double precision number.

cjlin1 commented 6 years ago

I think you are right. We will change the code later.

cjlin1 commented 5 years ago

FYI, in the libsvm 3.23 released 2 days ago this has been corrected.
