OpenClovis / SAFplus-Availability-Scalability-Platform

Middleware that provides libraries, GUI, and code generator to design multi-node (clustered) applications that are highly available, redundant, and scalable. Provides sub-second node and application fault detection and failover, and useful application libraries including distributed hash tables (checkpoint), event, logging, and communications. Implements SA-Forum APIs where applicable. Used anywhere reliability is a must -- like telecom, wireless, defense and enterprise computing. Download stable release with installer from: ftp.openclovis.com
www.openclovis.com
GNU General Public License v2.0

Core dump is generated after calling saCkptFinalize #114

Closed: hungta closed this issue 10 years ago.

hungta commented 10 years ago

I initialize a checkpoint handle using saCkptInitialize().

After using the handle, I finalize it with saCkptFinalize(). After this call, the checkpoint server (safplus_ckpt) crashed and dumped core:

gdb bin/safplus_ckpt var/run/core
GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /home/c3po/test/asp/bin/safplus_ckpt...done.
[New LWP 12234] [New LWP 12237] [New LWP 12213] [New LWP 12230] [New LWP 12223] [New LWP 12217]
[New LWP 12222] [New LWP 12233] [New LWP 12257] [New LWP 12232] [New LWP 12219] [New LWP 12231]
[New LWP 12225] [New LWP 12238] [New LWP 12215] [New LWP 12258] [New LWP 12241] [New LWP 12229]

warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffb4ade000
Core was generated by `/home/c3po/test/asp/bin/safplus_ckpt'.
Program terminated with signal 11, Segmentation fault.

#0  0x000000000041a693 in ckptSvrHdlDeleteCallback (userKey=, userData=0x7f0854000a08) at clCkptUtils.c:253
253	if(secMutex[i] != CL_HANDLE_INVALID_VALUE)
(gdb) thr a a bt

Thread 1 (Thread 0x7f0876811700 (LWP 12234)):
#0  0x000000000041a693 in ckptSvrHdlDeleteCallback (userKey=, userData=0x7f0854000a08) at clCkptUtils.c:253


#1  0x00007f0879f6c7c5 in cclLinkedListContainerNodeDelete (containerHandle=0x150f338, nodeHandle=) at clovisLinkedList.c:586
#2  0x00007f0879f6386d in clCntNodeDelete (containerHandle=containerHandle@entry=0x150f338, nodeHandle=) at clovisContainer.c:249
#3  0x00007f0879f670f1 in clCntNonUniqueKeyDelete (container=0x150f338, key=0x26756687, givenData=givenData@entry=0x7f0854001088, cmp=0x41b628 <ckptHdlNonUniqueKeyCompare>) at clovisContainer.c:698
#4  0x0000000000418b61 in clCkptSvrReplicaDelete (pCkpt=0x7f0854001088, ckptHdl=ckptHdl@entry=4516841011544071, isActive=<optimized out>, isActive@entry=1) at clCkptSaf.c:4884
#5  0x0000000000419041 in clCkptActiveCkptDelete_4_0_0 (version=..., ckptHdl=4516841011544071) at clCkptSaf.c:411
#6  0x000000000043cc7b in clCkptActiveCkptDeleteServer_4_0_0 (eoData=, inMsgHdl=, outMsgHdl=0x1395f20) at /home/c3po/git-tests/6.1/safplus/src/SAFplus/../SAFplus/components/ckpt/idl/ckptEo/server/ckptEockptServerMasterActiveServer.c:1256
#7  0x00007f087a5c90f1 in clRmdInvoke (func=0x43cbd1 , eoArg=0x0, inMsgHdl=inMsgHdl@entry=0x7f0868002c70, outMsgHdl=outMsgHdl@entry=0x1395f20) at clRmdHandle.c:140
#8  0x00007f087a7e20c9 in clEoWalkWithVersion (pThis=pThis@entry=0x14d7d38, func=27, version=version@entry=0x7f0876810c0d, pFuncCallout=0x7f087a5c8ff0 <clRmdInvoke>, inMsgHdl=inMsgHdl@entry=0x7f0868002c70, outMsgHdl=0x1395f20) at eo.c:2578
#9  0x00007f087a5cade4 in rmdHandleAsyncRequest (pThis=pThis@entry=0x14d7d38, pReq=pReq@entry=0x7f0876810d18, srcAddr=srcAddr@entry=0x7f0876810d00, priority=priority@entry=0 '\000', inMsgHdl=0x7f0868002c70, protoType=<optimized out>) at clRmdRecv.c:424
#10 0x00007f087a5cb78f in clRmdReceiveAsyncRequest (pThis=0x14d7d38, rmdRecvMsg=0x7f0868002c70, priority=, protoType=<optimized out>, length=<optimized out>, srcAddr=...) at clRmdRecv.c:244
#11 0x00007f087a7dd863 in clEoJobHandler (pJob=0x7f08680008c8) at eo.c:4148
#12 0x00007f087b28076b in clTaskPoolEntry (pArg=) at clTaskPool.c:282
#13 0x00007f0879b3691f in cosPosixTaskWrapper (pArgument=) at posix/clCommonCos.c:953
#14 0x00007f0877d46f8e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#15 0x00007f0877868a0d in clone () from /lib/x86_64-linux-gnu/libc.so.6

hungta commented 10 years ago

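The root cause is the decrementing for loop in ckptSvrHdlDeleteCallback():

for(i = numMutex-1; i >= 0; --i)
    if(secMutex[i] != CL_HANDLE_INVALID_VALUE)
        clOsalMutexUnlock(secMutex[i]);

where i is declared as an unsigned integer (ClUint32T i;). An unsigned value is never negative, so the condition i >= 0 is always true: once i reaches 0, --i wraps it around to 0xFFFFFFFF, and the next check reads secMutex[i] far outside the array. This matches the segmentation fault at clCkptUtils.c:253 in frame #0 of the backtrace.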
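To illustrate the wraparound and one common way to avoid it, here is a minimal self-contained sketch. The handle_t type, INVALID_HANDLE, and mutex_unlock() are hypothetical stand-ins for ClHandleT, CL_HANDLE_INVALID_VALUE, and clOsalMutexUnlock(), not the SAFplus definitions, and the `i-- > 0` loop is a general C idiom, not necessarily the fix that was applied for this issue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the SAFplus definitions:
 *   handle_t       ~ ClHandleT
 *   INVALID_HANDLE ~ CL_HANDLE_INVALID_VALUE (value assumed here)
 *   mutex_unlock() ~ clOsalMutexUnlock()                          */
typedef uint64_t handle_t;
#define INVALID_HANDLE ((handle_t)0)

/* Records which handles were "unlocked", in order, so the loop can be checked. */
static handle_t unlocked[16];
static size_t num_unlocked;

static void mutex_unlock(handle_t h)
{
    assert(num_unlocked < sizeof unlocked / sizeof unlocked[0]);
    unlocked[num_unlocked++] = h;
}

/* What the original loop's --i does once i is 0: unsigned arithmetic wraps
 * to UINT32_MAX, so "i >= 0" never becomes false and the following
 * secMutex[i] access lands far past the end of the array. */
static uint32_t wrapped_decrement(uint32_t i)
{
    return --i;
}

/* Reverse-order release with an unsigned index that never wraps inside the
 * loop body: "i-- > 0" tests i before decrementing it, so the body sees
 * numMutex-1 down to 0, and numMutex == 0 runs the body zero times. */
static size_t release_section_mutexes(const handle_t *secMutex, uint32_t numMutex)
{
    size_t released = 0;
    for (uint32_t i = numMutex; i-- > 0; ) {
        if (secMutex[i] != INVALID_HANDLE) {
            mutex_unlock(secMutex[i]);
            ++released;
        }
    }
    return released;
}
```

Declaring i as a signed type would also end the loop correctly, provided numMutex always fits in that type; the `i-- > 0` form keeps the index unsigned and needs no separate check for numMutex == 0.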