BhallaLab / moose-core

C++ basecode and python scripting interface
https://moose.ncbs.res.in
GNU General Public License v3.0

Segfault on multithreaded branch (reported by rahul g, on issue #124) #126

Closed dilawar closed 8 years ago

dilawar commented 8 years ago

Reported by @rahulgayatri23

On nargis, moose does not compile unless the MPI flag is set. Otherwise it gives the following error.

In file included from /cluster/share/software/hdf51813/include/hdf5.h:24,
from /home/rahulg/Work/moose-core-boost/moose-core/builtins/InputVariable.cpp:50:
/cluster/share/software/hdf51813/include/H5public.h:61:20: error: mpi.h: No such file or directory
/cluster/share/software/hdf51813/include/H5public.h:63:21: error: mpio.h: No such file or directory
In file included from /cluster/share/software/hdf51813/include/H5FDmpi.h:59,
from /cluster/share/software/hdf51813/include/hdf5.h:46,
from /home/rahulg/Work/moose-core-boost/moose-core/builtins/InputVariable.cpp:50:
/cluster/share/software/hdf51813/include/H5FDmpio.h:51: error: 'MPI_Comm' has not been declared
/cluster/share/software/hdf51813/include/H5FDmpio.h:51: error: 'MPI_Info' has not been declared
/cluster/share/software/hdf51813/include/H5FDmpio.h:52: error: 'MPI_Comm' has not been declared
/cluster/share/software/hdf51813/include/H5FDmpio.h:53: error: 'MPI_Info' has not been declared
make[2]: *** [builtins/CMakeFiles/moose_builtins.dir/InputVariable.cpp.o] Error 1
make[1]: *** [builtins/CMakeFiles/moose_builtins.dir/all] Error 2
make: *** [all] Error 2

Also, when I set the MPI flag, I get the following segmentation fault when I run this script.

[nargis:114519] Signal: Segmentation fault (11)
[nargis:114519] Signal code: Address not mapped (1)
[nargis:114519] Failing at address: 0x88
[nargis:114519] [ 0] /lib64/libpthread.so.0() [0x3d8640f500]
[nargis:114519] [ 1] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN2mu17ParserTokenReader6ReInitEv+0xd) [0x7ff0efe87e3d]
[nargis:114519] [ 2] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZNK2mu10ParserBase6ReInitEv+0x9b) [0x7ff0efe8cf3b]
[nargis:114519] [ 3] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN2mu10ParserBase9DefineVarERKSsPd+0x253) [0x7ff0efe913f3]
[nargis:114519] [ 4] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN8FuncTerm16setReactantIndexERKSt6vectorIjSaIjEE+0x1dc) [0x7ff0f01a959c]
[nargis:114519] [ 5] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN8FuncTermaSERKS_+0x59) [0x7ff0f01a9d39]
[nargis:114519] [ 6] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZNK8FuncReac18copyWithVolScalingEddd+0x212) [0x7ff0f01c1de2]
[nargis:114519] [ 7] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN14GssaVoxelPools18updateAllRateTermsERKSt6vectorIP8RateTermSaIS2_EEj+0xf8) [0x7ff0f01a6958]
[nargis:114519] [ 8] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN6Gsolve15updateRateTermsEj+0x145) [0x7ff0f01d0b05]
[nargis:114519] [ 9] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN6Stoich8setElistERK4ErefRKSt6vectorI5ObjIdSaIS4_EE+0x2e1) [0x7ff0f01bda71]
[nargis:114519] [10] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN6Stoich7setPathERK4ErefSs+0xe1) [0x7ff0f01bdca1]
[nargis:114519] [11] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZNK7EpFunc1I6StoichSsE2opERK4ErefSs+0x68) [0x7ff0f01becc8]
[nargis:114519] [12] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_ZN7SetGet1ISsE3setERK5ObjIdRKSsSs+0x1d1) [0x7ff0efd3c531]
[nargis:114519] [13] /home/rahulg/Work/moose-core-boost/moose-core/_build/python/moose/_moose.so(_Z20moose_ObjId_setattroP6_ObjIdP7_objectS2_+0x13a9) [0x7ff0efd2de79]
[nargis:114519] [14] /usr/local/lib/libpython2.7.so.1.0(PyObject_SetAttr+0x87) [0x7ff0f9914077]
[nargis:114519] [15] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x292d) [0x7ff0f997592d]
[nargis:114519] [16] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x88e) [0x7ff0f997a4ae]
[nargis:114519] [17] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x565a) [0x7ff0f997865a]
[nargis:114519] [18] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x88e) [0x7ff0f997a4ae]
[nargis:114519] [19] /usr/local/lib/libpython2.7.so.1.0(+0x77778) [0x7ff0f98f8778]
[nargis:114519] [20] /usr/local/lib/libpython2.7.so.1.0(PyObject_Call+0x53) [0x7ff0f98c91a3]
[nargis:114519] [21] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x414a) [0x7ff0f997714a]
[nargis:114519] [22] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCodeEx+0x88e) [0x7ff0f997a4ae]
[nargis:114519] [23] /usr/local/lib/libpython2.7.so.1.0(PyEval_EvalCode+0x32) [0x7ff0f997a5c2]
[nargis:114519] [24] /usr/local/lib/libpython2.7.so.1.0(PyRun_FileExFlags+0xb0) [0x7ff0f999a2a0]
[nargis:114519] [25] /usr/local/lib/libpython2.7.so.1.0(PyRun_SimpleFileExFlags+0xef) [0x7ff0f999a47f]
[nargis:114519] [26] /usr/local/lib/libpython2.7.so.1.0(Py_Main+0xc74) [0x7ff0f99afc44]
[nargis:114519] [27] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3d8581ecdd]
[nargis:114519] [28] python() [0x400649]
[nargis:114519] *** End of error message **
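
The frames above carry mangled C++ names, which are easier to read once demangled. A minimal sketch, assuming the trace has been saved to a file (the name trace.log is hypothetical) and that c++filt from GNU binutils is installed; the helper name extract_symbols is made up for illustration:

```shell
# Pull the mangled C++ symbol out of frames shaped like
#   ... _moose.so(_ZN6Stoich7setPathERK4ErefSs+0xe1) [0x7ff0f01bdca1]
# Frames without a _Z-prefixed symbol (libc, libpython) are skipped.
extract_symbols() {
    sed -n 's/.*(\(_Z[A-Za-z0-9_]*\)+0x[0-9a-f]*).*/\1/p'
}

# Typical use (not executed here; trace.log is a hypothetical file name):
#   extract_symbols < trace.log | c++filt
```

Read this way, the innermost frames are mu::ParserTokenReader::ReInit() reached through FuncTerm::setReactantIndex, which is called while Gsolve updates its rate terms during Stoich::setPath.
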
dilawar commented 8 years ago

Unfortunately, the default HDF5 library installed on the server is libhdf5-openmpi, which will not work without -DWITH_MPI=ON. To bypass this, you need to point to a serial version of HDF5 before you run cmake. You can set the environment variable HDF5_ROOT to point to your version of HDF5.

HDF5_ROOT=/cluster/share/software/hdf51813 cmake .. 

Re-open the ticket, if problem continues.
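
Before pointing HDF5_ROOT at an install, it can help to check whether that install is a serial or a parallel (MPI) build. A minimal sketch, assuming the usual layout in which the install prefix contains include/H5pubconf.h; the function name hdf5_build_kind is made up for illustration. Parallel builds define H5_HAVE_PARALLEL in that header, which is what makes H5public.h pull in mpi.h:

```shell
# Print "parallel" or "serial" for the HDF5 install under the given prefix.
# Assumption: the prefix follows the standard layout with include/H5pubconf.h.
hdf5_build_kind() {
    conf="$1/include/H5pubconf.h"
    if [ ! -f "$conf" ]; then
        echo "no H5pubconf.h under $1" >&2
        return 2
    fi
    # Serial builds leave the macro commented out as "/* #undef ... */".
    if grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+H5_HAVE_PARALLEL' "$conf"; then
        echo "parallel"
    else
        echo "serial"
    fi
}

# Typical use (not executed here):
#   hdf5_build_kind /cluster/share/software/hdf51813
```

Only a prefix that reports "serial" is expected to build without -DWITH_MPI=ON.
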

dilawar commented 8 years ago

I see. The version in /cluster/share/software/hdf51813 is also compiled with OpenMPI support. You need to compile a serial version yourself. I've put one in my $HOME. Try the following:

HDF5_ROOT=/home/dilawar/bin cmake ..
rgayatri23 commented 8 years ago

Hi Dilawar, I do not have permission to access your directory. Instead, I installed a version of HDF5 in my own directory and passed that to cmake. But when I build it (run make), I get the following errors:

libhdf5.so.10: undefined reference to `SZ_BufftoBuffDecompress'
libhdf5.so.10: undefined reference to `SZ_BufftoBuffCompress'
libhdf5.so.10: undefined reference to `SZ_encoder_enabled'

It seems that this can happen if the library is not linked correctly with the -lhdf5 flag. I have added that too, but then the linker reports "cannot find -lhdf5". Any idea where I may be going wrong?

dilawar commented 8 years ago

It's hard to say anything without the cmake/make logs. I managed to get it working with the hdf5-1.8.16 source code.

rgayatri23 commented 8 years ago

Fixed the linking issue by adding the HDF5_root/lib path to rpath.
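
The exact commands used for this fix are not given in the thread. One way to add a lib directory to the rpath at configure time is sketched below; it assumes a GNU toolchain, and the prefix $HOME/local/hdf5 and the helper name rpath_flag are hypothetical. The SZ_* symbols belong to the szip library that libhdf5.so was linked against, so this only helps if libsz lives in the same lib directory:

```shell
# Build the linker flag that embeds <prefix>/lib as a runtime search path.
rpath_flag() {
    printf '%s' "-Wl,-rpath,$1/lib"
}

# Typical use (not executed here; the prefix is a hypothetical example):
#   HDF5_ROOT="$HOME/local/hdf5"
#   HDF5_ROOT="$HDF5_ROOT" cmake \
#       -DCMAKE_EXE_LINKER_FLAGS="$(rpath_flag "$HDF5_ROOT")" \
#       -DCMAKE_SHARED_LINKER_FLAGS="$(rpath_flag "$HDF5_ROOT")" ..
```

CMAKE_EXE_LINKER_FLAGS and CMAKE_SHARED_LINKER_FLAGS are standard CMake variables; whether moose-core's build needs both is an assumption.
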

rgayatri23 commented 8 years ago

Fixed the segfault issue with the script too. It needed a newer version of gcc than the one installed on Nargis. I installed gcc-5.2 in my local user account, and now it works!
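
A rough sketch of how a locally installed compiler can be selected when configuring. The thread does not state a minimum gcc version, so 5.2 is used only because it is the version reported to work; the install path $HOME/gcc-5.2 and both function names are hypothetical, and sort -V assumes GNU coreutils:

```shell
# Succeeds when version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Configure with a user-local gcc when the system one is older than 5.2.
# CMake reads CC and CXX from the environment on the first configure
# of a fresh build directory.
configure_with_local_gcc() {
    if version_at_least "$(gcc -dumpversion)" 5.2; then
        cmake ..
    else
        CC="$HOME/gcc-5.2/bin/gcc" CXX="$HOME/gcc-5.2/bin/g++" cmake ..
    fi
}
```

Because CMake caches the compiler choice, switching compilers generally needs a clean build directory.
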

On Wed, Jun 1, 2016 at 5:11 PM, Rahul Gayatri rahulgayatri84@gmail.com wrote:

Hi, sorry. Fixed the issue by adding the HDF5_root/lib path to rpath.

On Wed, Jun 1, 2016 at 5:03 PM, Dilawar Singh notifications@github.com wrote:

Please add some information about what fixed the issue, even if it is trivial, when reporting on issues. Otherwise the update is useless to anyone else.

On Wed, Jun 1, 2016 at 4:38 PM rahulgayatri23 notifications@github.com wrote:

Hi Dilawar, I fixed the linking issue; there were some problems with the library. But the segfault with gsolve is still not solved.


On Wed, Jun 1, 2016 at 11:28 AM, Dilawar Singh notifications@github.com wrote:

Reopened #126 (https://github.com/BhallaLab/moose-core/issues/126).
