GenABEL-Project / ProbABEL

Tool for genome-wide association analysis of imputed genetic data.

make check fails at test_bt.sh and eigen 3.3.1 #37

Open maarten-k opened 7 years ago

maarten-k commented 7 years ago

make check fails when I try to compile ProbABEL against EIGEN 3.3.1.

It looks like a simple rounding error, but since the data are the same (only the file format changes), it is at least a bit odd.

2c2
< rs7247199 GAC A 0.5847 0.415 0.9299 0.8666 200 0.333517 19 204938 544.518 437.583 -567.388 435.04 1.75239
---
> rs7247199 GAC A 0.5847 0.415 0.9299 0.8666 200 0.333517 19 204938 544.518 437.583 -567.387 435.04 1.75239
BT check (2df model): prob vs. prob_fv                                 FAILED

To reproduce, run the following from within the ProbABEL folder:

wget https://bitbucket.org/eigen/eigen/get/3.3.1.tar.bz2
tar -xf 3.3.1.tar.bz2
./configure --with-eigen-include-path=$(pwd)/eigen-eigen-f562a193118d/ --disable-latex-doc && make clean && make -j 4 && make check
./checks/test_bt.sh verbose

lckarssen commented 7 years ago

Just to be clear, this is in the "internal validation" checks, not the ones where the ProbABEL output is compared to R's output, right? (The case of comparing against R is handled in issue #11.)

And I assume it's with v0.5.0, correct?

maarten-k commented 7 years ago

I checked out the master branch. It works fine with EIGEN 3.2.X but fails with 3.3.1. Issue #11 does not seem to occur with EIGEN 3.3.1.

lckarssen commented 7 years ago

In the past we have regularly run into this kind of issue with rounding differences between plain text files and filevector files. The main problem has to do with the conversion of either strings to doubles (in plain text files) or floats to doubles (stored in binary form in filevector files). Internally all calculations are performed on doubles, but imputed data is only accurate up to a few (two or three) significant digits and can therefore easily be stored in floats.

I'm not sure why these rounding errors show up now and then in the sixth decimal. Is it a real rounding error in the calculation or a problem caused by reading or printing the number from/to file (using C++'s iostream library)?

For the output we currently hard-code the precision to 6, because that's the limit of a float. Maybe using std::numeric_limits<T>::digits10 is more appropriate, but I'm not sure if it would change things.

maarten-k commented 7 years ago

In terms of statistics or biology this kind of deviation is negligible. Since the tests in test_bt.sh are also compared against R (@lckarssen, are they?), it is safe to assume everything works fine. The question is how to stay confident in our tests without making them less sensitive.

lckarssen commented 7 years ago

About biology/statistics: that's true (although ideally I would like to have the numbers equal because we calculate at a much higher precision internally).

The R comparison tests are done in separate scripts (see the checks/R-tests/ directory).

lckarssen commented 7 years ago

Just a quick note to correct @maarten-k's statement: Issue #11 does also occur with Eigen v3.3.1.