OpenMS / OpenMS

The codebase of the OpenMS project
https://www.openms.de
Other
478 stars 318 forks source link

Debian Eigen does not seem to work with OpenMS #861

Closed hroest closed 10 years ago

hroest commented 10 years ago

When trying out the debian system-library Eigen3 (libeigen3-dev 3.2.1-2) I get the following test failure (I assume that this is related to eigen since the fitter uses eigen): 203 - GammaDistributionFitter_test (Failed)

the difference is quite substantial

203:  -  line 119:  TEST_REAL_SIMILAR(result.b,7.25): got -25.3161638403746, expected 7.25 (absolute: 32.5661638403746 [0.01], relative: -3.49188466763788 [1.00001], message: "numbers have different signs and difference is not small"
203:  -  line 121:  TEST_REAL_SIMILAR(result.p,3.11): got -48.1519900789467, expected 3.11 (absolute: 51.2619900789467 [0.01], relative: -15.4829550093076 [1.00001], message: "numbers have different signs and difference is not small"
203: : failed

therefore I was wondering which version of Eigen exactly we are using, it seems from here http://sourceforge.net/projects/open-ms/files/contrib/ that we are using 3.2.0 and debian is using 3.2.1. In addition there is a very interesting bugfix-patch that debian applies to Eigen which is also not present in our release (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=736985)

Description: Fix Spline Module
Author: Philipp Büttgenbach philipp.buettgenbach@gmail.com>
Reviewed-By: Anton Gladky <gladk@debian.org>
Bug-Debian: http://bugs.debian.org/736985
Last-Update: 2014-01-30

--- a/unsupported/Eigen/src/Splines/SplineFwd.h
+++ b/unsupported/Eigen/src/Splines/SplineFwd.h
@@ -67,7 +67,7 @@
       typedef Array<_Scalar,Dynamic,Dynamic,RowMajor,NumOfDerivativesAtCompileTime,OrderAtCompileTime> BasisDerivativeType;

       /** \brief The data type used to store the spline's derivative values. */      
-      typedef Array<_Scalar,_Dim,Dynamic,ColMajor,_Dim,NumOfDerivativesAtCompileTime> DerivativeType;
+      typedef Array<_Scalar,_Dim,Dynamic,RowMajor,_Dim,NumOfDerivativesAtCompileTime> DerivativeType;
     };

     /** \brief 2D float B-spline with dynamic degree. */

Next I tried a fresh download from http://eigen.tuxfamily.org/index.php?title=News:Eigen_3.2.1_released! and used all of our own contrib libraries with the same result as the debian system library. So it does not seem to be the above debian fix but rather a difference between Eigen 3.2.1 and Eigen 3.2.0

Also I wonder why none of the tests for the decoy distribution failed since they seem to use the Gamma distribution fitter internally to fit the false distribution

It seems that the result of the fit was added in 2009 to the test 2d77b9b5e5afdeb86ce47ba69dcd804968a8b895 and not changed since - however we have no way to verify which result is correct except that the previous Eigen and GSL agreed and suddenly the new, Eigen 3.2.1 do not seem to agree any more.

Can somebody

aiche commented 10 years ago

mac osx shows the same problem with the GammaDistributionFitter_test (eigen 3.2.0 contrib vs eigen 3.2.1 homebrew)

aiche commented 10 years ago

After some debugging it seems that some changes between eigen 3.2.0 and 3.2.1 have affected parts of the the LM used in the GammaDistributionFitter without actually touching it. The weird thing is, that the GammaDistributionFitter actually produces a lot of NaNs while fitting the test data. So not sure if this is even valid what we are trying to do. I will try to get some more details ..

hroest commented 10 years ago

Did.we consider that our test.data may be unsuitable? Maybe also check our testdata and whether it really is testing what we think it is. I don't really understand why all other tests involving the fitting work except the unit test. specifically what happens when a larger dataset is used?

3.2.1 was supposed to be bugfix only, right?

Hannes

sent from my smartphone -- you may keep any typos you find. On Jun 11, 2014 4:39 PM, "Stephan Aiche" notifications@github.com wrote:

After some debugging it seems that some changes between eigen 3.2.0 and 3.2.1 have affected parts of the the LM used in the GammaDistributionFitter without actually touching it. The weird thing is, that the GammaDistributionFitter actually produces a lot of NaNs while fitting the test data. So not sure if this is even valid what we are trying to do. I will try to get some more details ..

— Reply to this email directly or view it on GitHub.

aiche commented 10 years ago

gammadistfit_testdata test data isn't optimal but should work, the problem is that the gamma distribution seems to be only defined for positive parameters but the optimisation temporarily switches to negative values which leads to a more or less undefined state

hroest commented 10 years ago

@aiche: why are there so many points at zero? the fit does not seem to be perfect, is it due to these extra values?

aiche commented 10 years ago

why are there so many points at zero?

don't know

the fit does not seem to be perfect, is it due to these extra values?

possible, given the zero values it could still be the best (in the least-squares sense) fit

can we prevent negative values in the optimization step, e.g. during evaluation of the function or computation of the gradients ?

not really, we could only abort the optimization if the values get negative. the LM implementation in eigen (and also gsl) don't support constrains on the parameters

hroest commented 10 years ago

does the fit work (at least not fail) if you delete all the points at zero?

On 12 June 2014 20:55, Stephan Aiche notifications@github.com wrote:

why are there so many points at zero?

don't know

the fit does not seem to be perfect, is it due to these extra values?

possible, given the zero values it could still be the best (in the least-squares sense) fit

can we prevent negative values in the optimization step, e.g. during evaluation of the function or computation of the gradients ?

not really, we could only abort the optimization if the values get negative. the LM implementation in eigen (and also gsl) don't support constrains on the parameters

— Reply to this email directly or view it on GitHub https://github.com/OpenMS/OpenMS/issues/861#issuecomment-45933261.

aiche commented 10 years ago

removing all the zeros still gives negative parameters .. so this won't work. Changing the initial parameters actually avoids going into the negative values, but this is only a fix for the test and not for the underlying problem