Intermittent failures in test_config_gsobject.py on a 32 bit linux machine

barnabytprowe commented 11 years ago

Hi all,

After a thorough clean (rm -f .scon*, scons -c, etc.) followed by a fresh scons build, I get intermittent "high-precision failures" (if that makes sense) in the test comparisons in test_config_gsobject.py on an older, 32-bit Linux machine. I volunteer to take this issue and sort it out: it looks to me like one of those cases where we need to fix a random number seed to make results strictly repeatable.

Here is some example output from running scons tests six times on the offending 32-bit system:

kishar1% scons tests; scons tests; scons tests; scons tests; scons tests; scons tests
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
/usr/bin/g++ -o tests/.obj/test_main.o -c -O2 -fno-strict-aliasing -Wall -Werror -Iinclude -I/home/browe/local/32/include tests/test_main.cpp
/usr/bin/g++ -o tests/.obj/test_Image.o -c -O2 -fno-strict-aliasing -Wall -Werror -Iinclude -I/home/browe/local/32/include tests/test_Image.cpp
/usr/bin/g++ -o tests/.obj/test_integ.o -c -O2 -fno-strict-aliasing -Wall -Werror -Iinclude -I/home/browe/local/32/include tests/test_integ.cpp
/usr/bin/g++ -o bin/test_main -fopenmp tests/.obj/test_main.o tests/.obj/test_Image.o tests/.obj/test_integ.o -Llib -L/home/browe/local/32/lib -lgalsim -ltmv_symband -lfftw3 -lpthread -ltmv -lpthread
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
.....................................................................................................................................................................................
----------------------------------------------------------------------
Ran 181 tests in 221.992s

OK
Nosetests finished successfully.

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
.....................................................................................................................................................................................
----------------------------------------------------------------------
Ran 181 tests in 225.751s

OK
Nosetests finished successfully.

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
.........................................................................................FF..........................................................................................
======================================================================
FAIL: Test various ways to build a Convolve
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 1031, in test_convolve
    gsobject_compare(gal5a, gal5b)
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 46, in gsobject_compare
    np.testing.assert_array_almost_equal(im1.array, im2.array, 10)
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 10 decimals

(mismatch 100.0%)
 x: array([[ 1.56958869,  1.67352826,  1.77114509,  1.85923352,  1.93456855,
         1.9941418 ,  2.03541117,  2.05653134,  2.05653134,  2.03541117,
         1.9941418 ,  1.93456855,  1.85923352,  1.77114509,  1.67352826,...
 y: array([[ 1.56959898,  1.6735382 ,  1.77115473,  1.85924292,  1.93457776,
         1.99415088,  2.03542015,  2.05654028,  2.05654028,  2.03542015,
         1.99415088,  1.93457776,  1.85924292,  1.77115473,  1.6735382 ,...

======================================================================
FAIL: Test building a GSObject from a list:
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 1115, in test_list
    gsobject_compare(gal5a, gal5b, conv=galsim.Gaussian(sigma=1))
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 46, in gsobject_compare
    np.testing.assert_array_almost_equal(im1.array, im2.array, 10)
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 10 decimals

(mismatch 100.0%)
 x: array([[ 1.56958869,  1.67352826,  1.77114509,  1.85923352,  1.93456855,
         1.9941418 ,  2.03541117,  2.05653134,  2.05653134,  2.03541117,
         1.9941418 ,  1.93456855,  1.85923352,  1.77114509,  1.67352826,...
 y: array([[ 1.56959898,  1.6735382 ,  1.77115473,  1.85924292,  1.93457776,
         1.99415088,  2.03542015,  2.05654028,  2.05654028,  2.03542015,
         1.99415088,  1.93457776,  1.85924292,  1.77115473,  1.6735382 ,...

----------------------------------------------------------------------
Ran 181 tests in 223.949s

FAILED (failures=2)
Nosetests returned error code  1

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
.....................................................................................................................................................................................
----------------------------------------------------------------------
Ran 181 tests in 226.969s

OK
Nosetests finished successfully.

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
.....................................................................................................................................................................................
----------------------------------------------------------------------
Ran 181 tests in 225.114s

OK
Nosetests finished successfully.

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.
scons: Reading SConscript files ...
SCons is version 2.0.1 using python version 2.7.2
Python is from /usr/local/EPD/epd-7.1.2/include/python2.7
Using the following (non-default) scons options:
   CXX = /usr/bin/g++
   TMV_DIR = /home/browe/local/32
   BOOST_DIR = /home/browe/local/32
These can be edited directly in the file gs_scons.conf.
Type scons -h for a full list of available options.
Using python =  /usr/bin/env python
Using default PYPREFIX =  /usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages
Using compiler: /usr/bin/g++
compiler version: 4.4.6
Determined that a good number of jobs = 2
Checking for C++ header file fftw3.h... (cached) yes
Checking for correct FFTW linkage... (cached) yes
Checking for C++ header file boost/shared_ptr.hpp... (cached) yes
Checking for C++ header file TMV.h... (cached) yes
Using TMV_LINK file: /home/browe/local/32/share/tmv/tmv-link
     -L/home/browe/local/32/lib -ltmv -lpthread -Wl,-rpath=/home/browe/local/32/lib -fopenmp
Checking for correct TMV linkage... (this may take a little while)
Checking for correct TMV linkage... (cached) yes
Checking if we can build against Python... (cached) yes
Checking if we can build module using TMV... (cached) yes
Checking if we can build against NumPy... (cached) yes
Checking for PyFITS... (cached) yes
Checking if we can build against Boost.Python... (cached) yes
Checking if C++ exceptions are propagated up to python... (cached) yes
nosetests version: 1.0.0
Found static library:  /usr/local/lib/libfftw3.a
scons: done reading SConscript files.
scons: Building targets ...
run_tests(["tests/tests.log"], ["bin/test_main"])
Using nosetests from:  /usr/local/EPD/epd-7.1.2/bin/nosetests
nosetests is version 1.0.0

Starting python tests...
................................................................................F....................................................................................................
======================================================================
FAIL: Test various ways to build a Kolmogorov
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/nose/case.py", line 187, in runTest
    self.test(*self.arg)
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 287, in test_kolmogorov
    gsobject_compare(gal5a, gal5b)
  File "/home/browe/great3/32/GalSim/tests/test_config_gsobject.py", line 46, in gsobject_compare
    np.testing.assert_array_almost_equal(im1.array, im2.array, 10)
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 800, in assert_array_almost_equal
    header=('Arrays are not almost equal to %d decimals' % decimal))
  File "/usr/local/EPD/epd-7.1.2/lib/python2.7/site-packages/numpy/testing/utils.py", line 636, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Arrays are not almost equal to 10 decimals

(mismatch 57.8125%)
 x: array([[  0.10430761,   0.13948009,   0.18636556,   0.24644364,
          0.31819374,   0.3944321 ,   0.46112018,   0.50076595,
          0.50076595,   0.46112018,   0.3944321 ,   0.31819374,...
 y: array([[  0.10430761,   0.13948009,   0.18636556,   0.24644364,
          0.31819374,   0.3944321 ,   0.46112018,   0.50076595,
          0.50076595,   0.46112018,   0.3944321 ,   0.31819374,...

----------------------------------------------------------------------
Ran 181 tests in 226.477s

FAILED (failures=1)
Nosetests returned error code  1

Starting cpp tests...
bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file: No such file or directory
test_main returned error code  127

scons: done building targets.

One final thing: I plan to continue to ignore the "bin/test_main: error while loading shared libraries: libgalsim.so.0: cannot open shared object file" error. This has been happening on this system for a long time and I don't know why, nor do I think it is right to prioritise fixing it right now (since it doesn't affect actual GalSim tasks)...
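For a sense of scale, the first Convolve failure above can be quantified directly. The sketch below is a minimal illustration, not part of the test suite: it copies the first three printed values of x and y from the test_convolve traceback and compares their largest difference with the tolerance that numpy documents for assert_array_almost_equal, namely abs(desired - actual) < 1.5 * 10**(-decimal). The variable names are made up for this example.

```python
# First three values of x and y, copied from the test_convolve traceback above.
x = [1.56958869, 1.67352826, 1.77114509]
y = [1.56959898, 1.67353820, 1.77115473]

decimal = 10  # precision requested in gsobject_compare
tolerance = 1.5 * 10 ** (-decimal)  # numpy's documented comparison criterion

# Largest absolute element-wise difference between the two rows.
max_diff = max(abs(a - b) for a, b in zip(x, y))

print("tolerance   = %.3g" % tolerance)
print("max |x - y| = %.3g" % max_diff)
print("ratio       = %.3g" % (max_diff / tolerance))
```

The Convolve differences are around 1e-5, far above double-precision round-off. The Kolmogorov failure, by contrast, prints x and y identically to eight decimal places, so its mismatch must lie below that level while still exceeding the 1.5e-10 tolerance.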

rmjarvis commented 11 years ago

This is puzzling. There is no random number usage in that function. My guess is that it might be from the parallel nosetests somehow, although I don't have a specific idea of what the bug might be. But you could check whether you ever get an error when doing scons tests -j1.

barnabytprowe commented 11 years ago

This issue does indeed appear to be fixed by the code on #426. I ran scons tests 13 times with no failures. Assuming that failure is described by a Binomial distribution with probability p (and a flat prior on p), this gives Prob(p <= 0.1) = 0.77, with the following posterior distribution:

[Figure "pfail": posterior pdf(p) plotted against p]

(Don't worry, I didn't waste any actual time on this overanalysis: I have a little piece of code, written for another project and reproduced below, that produces this output:

#!/usr/bin/env python

from __future__ import print_function  # lets the script run under Python 2.7 and 3

import sys
import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

NTABLE = 10000  # Number of entries for tabulated p likelihood in range [0., 1.)

print("Welcome to pFail.py (Binomial p inference tool)")
print("usage: pFail.py nfails ntrials ptolerance")
print("")
if len(sys.argv) != 4:
    sys.exit(1)

nfails = int(sys.argv[1])
ntrials = int(sys.argv[2])
ptolerance = float(sys.argv[3])

p = np.arange(NTABLE, dtype=float) / float(NTABLE)
prior = np.ones(NTABLE) / float(NTABLE)  # uniform prior, could use another if desired

# Now tabulate likelihood function (starts as a list, then converted to array)...
# Handle first element (p = 0.) without scipy function, it complains!
if nfails == 0:
    likelihood = [1.]
else:
    likelihood = [0.]
# Calculate rest of likelihood using a list comprehension to be slightly quicker
restoflike = [scipy.stats.binom.pmf(nfails, ntrials, p[i]) for i in range(1, NTABLE)]
likelihood.extend(restoflike)
likelihood = np.array(likelihood)

# Normalize and get posterior from prior and likelihood
posterior = prior * likelihood
posterior /= posterior.sum()

# Calculate and print results
print("Maximum-Likelihood p estimate = " + str(float(nfails) / float(ntrials)))
print("Bayesian expectation E(p) = " + str((posterior * p).sum()))
print("Prob(p <= ptolerance) = " + str((posterior[p <= ptolerance]).sum()))
print("")

# Scale by NTABLE so the tabulated probabilities plot as a density in p
plt.plot(p, posterior * float(NTABLE))
plt.xlabel('p')
plt.ylabel('pdf(p)')
plt.show()

Anyway, looks good to me Mike, we should close this when #426 is merged!)
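As a cross-check on the 0.77 quoted above: with a flat prior, nfails failures in ntrials trials give a Beta(nfails + 1, ntrials - nfails + 1) posterior for p, so Prob(p <= ptolerance) has a closed form and needs no tabulation. The following is a minimal sketch under that assumption; the function names are hypothetical and it is not the pFail.py script itself. It needs Python 3.8 or later for math.comb.

```python
from math import comb


def prob_p_below(nfails, ntrials, ptolerance):
    """Posterior Prob(p <= ptolerance) under a flat prior on p.

    The posterior is Beta(nfails + 1, ntrials - nfails + 1). For integer
    parameters its CDF equals a Binomial upper tail over ntrials + 1 trials.
    """
    n = ntrials + 1
    return sum(
        comb(n, j) * ptolerance ** j * (1.0 - ptolerance) ** (n - j)
        for j in range(nfails + 1, n + 1)
    )


def expected_p(nfails, ntrials):
    """Posterior mean of p under a flat prior (the mean of the Beta posterior)."""
    return (nfails + 1.0) / (ntrials + 2.0)


# 13 runs of scons tests with no failures, tolerance p <= 0.1
print("Prob(p <= 0.1) = %.2f" % prob_p_below(0, 13, 0.1))
print("E(p) = %.4f" % expected_p(0, 13))
```

For zero failures the sum collapses to 1 - (1 - ptolerance)**(ntrials + 1), which is 1 - 0.9**14 here and agrees with the quoted value to two decimals. The tabulated script approximates the same quantity on a grid of NTABLE points.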

rmjarvis commented 11 years ago

lol. Thanks Barney.

GalSim-developers / GalSim

Intermittent failures in test_config_gsobject.py on a 32 bit linux machine #422