PolyChord / PolyChordLite

Public version of PolyChord: See polychord.co.uk for PolyChordPro
https://polychord.io/

Error linking with C++ code #37

Closed qminor closed 4 years ago

qminor commented 4 years ago

I installed PolyChord on a new cluster and, after linking with my C++ code, I get a "floating point exception" when run_polychord(...) is called. It generates the live points, and after it says "started sampling", it crashes with the following output:

[login-0-0:08504] Process received signal
[login-0-0:08504] Signal: Floating point exception (8)
[login-0-0:08504] Signal code: Integer divide-by-zero (1)
[login-0-0:08504] Failing at address: 0x7f2212d0837b
[login-0-0:08504] [ 0] /lib64/libpthread.so.0(+0xf5e0)[0x7f2211fdb5e0]
[login-0-0:08504] [ 1] /scratch/quinn.minor/PolyChord/lib/libchord.so(random_module_mp_random_orthonormalbases+0xcb)[0x7f2212d0837b]
[login-0-0:08504] [ 2] /scratch/quinn.minor/PolyChord/lib/libchord.so(chordal_module_mp_generatenhats+0x70d)[0x7f2212d0784d]
[login-0-0:08504] [ 3] /scratch/quinn.minor/PolyChord/lib/libchord.so(chordal_module_mpslicesampling+0x784)[0x7f2212d04d14]
[login-0-0:08504] [ 4] /scratch/quinn.minor/PolyChord/lib/libchord.so(nested_sampling_module_mpnestedsampling+0x3a34)[0x7f2212cee634]
[login-0-0:08504] [ 5] /scratch/quinn.minor/PolyChord/lib/libchord.so(interfaces_module_mp_run_polychordfull+0x1f5)[0x7f2212d4c065]
[login-0-0:08504] [ 6] /scratch/quinn.minor/PolyChord/lib/libchord.so(polychord_c_interface+0xa49)[0x7f2212d495d9]
[login-0-0:08504] [ 7] /scratch/quinn.minor/PolyChord/lib/libchord.so(_Z13run_polychordPFdPdiS_iEPFvS_S_iEPFviiiS_S_S_ddE8SettingsRP19ompi_communicator_t+0x26e)[0x7f2212d48b5e]
[login-0-0:08504] [ 8] /scratch/quinn.minor/PolyChord/lib/libchord.so(_Z13run_polychordPFdPdiS_iEPFvS_S_iEPFviiiS_S_S_ddE8Settings+0x167)[0x7f2212d4de07]
[login-0-0:08504] [ 9] qlens[0x5688aa]
[login-0-0:08504] [10] qlens[0x48baab]
[login-0-0:08504] [11] qlens[0x40b2b4]
[login-0-0:08504] [12] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f2211c2ac05]
[login-0-0:08504] [13] qlens[0x40a36f]
[login-0-0:08504] End of error message
Floating point exception

In case it helps, the modules I have loaded (and used when compiling PolyChord) are as follows:

1) libblas/3.7.0_gnu
2) intel/17.0.2(default)
3) openmpi/1.8.4_intel
4) readline/6.3_gnu(default)

If you have any idea as to what the problem might be, I'd love to hear it. I should mention that my code works great with PolyChord on another computing cluster, so there's something going wrong with the libraries being used (either when compiling/linking or at runtime). Thanks in advance for your help.

williamjameshandley commented 4 years ago

Hi @qminor. Sorry to hear this. Are you able to recompile with debugging flags on:

make veryclean
make DEBUG=1 <myprogram>

and see if that error message is more helpful? This will slow things down a little, so if your likelihood is very slow you might want to lower the number of live points (to ~10).

qminor commented 4 years ago

Ok, using the debugger helped me pinpoint the problem. The C++ vectors grade_frac, grade_dims, loglikes and nlives weren't being converted correctly into their Fortran counterparts. The issue was with the way they were initialized in c_interface.cpp. I fixed this by adding the following to the main body of the constructor for Settings:

{
    grade_frac.resize(1);
    grade_frac[0] = 1;
    grade_dims.resize(1);
    grade_dims[0] = nDims;
    loglikes.resize(0);
    nlives.resize(0);
}

I also commented out the lines before the braces where the vectors had previously been initialized. This seems to work just fine now. Apparently there was a version issue where the compiler had trouble with the way they were being initialized.

Thanks for the tip, it seems like all is well now! If you see any potential problems ahead with the way I've initialized them above, please let me know. I don't have any real understanding of what those vectors are for, or if there would ever be a reason to initialize them differently.

Thanks, Quinn
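
For readers who want to try the workaround in isolation, here is a minimal sketch of the pattern described in the comment above: the vectors are sized and filled explicitly in the constructor body. SettingsSketch and check_sketch are hypothetical names, not part of PolyChord. The member names follow the comment, but the element types and the invariants being checked are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the Settings class discussed in the thread.
// Member names follow the comment above; element types are assumed.
struct SettingsSketch {
    int nDims;
    std::vector<double> grade_frac;
    std::vector<int> grade_dims;
    std::vector<double> loglikes;
    std::vector<int> nlives;

    explicit SettingsSketch(int nDims_) : nDims(nDims_)
    {
        // Size and fill each vector explicitly in the constructor body,
        // mirroring the workaround quoted in the thread.
        grade_frac.resize(1);
        grade_frac[0] = 1;
        grade_dims.resize(1);
        grade_dims[0] = nDims;
        loglikes.resize(0);
        nlives.resize(0);
    }
};

// Assumed invariants for this sketch (not confirmed by the thread):
// one fraction per grade, at least one grade, grade sizes summing to nDims,
// and loglikes/nlives of equal length.
void check_sketch(const SettingsSketch& s)
{
    assert(s.grade_frac.size() == s.grade_dims.size());
    assert(!s.grade_dims.empty());
    int total = 0;
    for (int d : s.grade_dims) total += d;
    assert(total == s.nDims);
    assert(s.loglikes.size() == s.nlives.size());
}
```

Running check_sketch on a freshly constructed SettingsSketch before the arrays are passed to non-C++ code is one cheap way to catch an empty or mis-sized vector early, rather than as a crash inside the sampler.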

williamjameshandley commented 4 years ago

Good to hear this is solved, although it does seem a little odd that this extra initialisation is required. Which compilers were you using?

qminor commented 4 years ago

The compilers were Intel version 17.0.2. I compiled my code with the C++ compiler (mpicxx) and linked it with PolyChord (using the -lchord argument). This is definitely a version-specific issue because it compiled successfully on another cluster without having to change the initialization of the vector objects.

williamjameshandley commented 4 years ago

Good to know. Many thanks!