OpenClovis / SAFplus-Availability-Scalability-Platform

Middleware that provides libraries, GUI, and code generator to design multi-node (clustered) applications that are highly available, redundant, and scalable. Provides sub-second node and application fault detection and failover, and useful application libraries including distributed hash tables (checkpoint), event, logging, and communications. Implements SA-Forum APIs where applicable. Used anywhere reliability is a must -- like telecom, wireless, defense and enterprise computing. Download stable release with installer from: ftp.openclovis.com
www.openclovis.com
GNU General Public License v2.0

Msg server core dump because of queue group shared memory screw up #53

Closed by karthick18 11 years ago

karthick18 commented 11 years ago

Here is the backtrace reported by the customer.

Also, we observed a core dump in SYS_CTRL with the backtrace below.

(gdb) bt
#0  0x00007f7763f38425 in __GI_raise (sig=) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f7763f3bb8b in __GI_abort () at abort.c:91
#2  0x00007f7763f310ee in __assert_fail_base (fmt=, assertion=0x7f77658c262d "(nameIn->length < 256)", file=0x7f77658c2617 "clCommon.c", line=, function=)
    at assert.c:94
#3  0x00007f7763f31192 in __GI___assert_fail (assertion=assertion@entry=0x7f77658c262d "(nameIn->length < 256)", file=file@entry=0x7f77658c2617 "clCommon.c", line=line@entry=67,
    function=function@entry=0x7f77658c26d2 "clNameCopy") at assert.c:103
#4  0x00007f7765836b46 in clNameCopy (nameOut=nameOut@entry=0x7f7766575340, nameIn=nameIn@entry=0x7f7747f362a0) at clCommon.c:67
#5  0x00007f77662701e7 in clMsgQGroupCkptDataUnmarshal (qCkptData=qCkptData@entry=0x7f7766575340, inData=inData@entry=0x7f7747f362a0)
    at /home/sayan/Documents/dev/thirdparty/openclovis-6.0/_build/sdk-6.0/src/ASP/components/msg/common/clMsgCkptData.c:161
#6  0x00007f7766266a0a in clMsgQueueGroupsRemove (pQName=pQName@entry=0x7f7762982be8) at clMsgCkptServer.c:268
#7  0x00007f7766267fe5 in clMsgQCkptCompDown (pAddr=pAddr@entry=0x7f77665757a0) at clMsgCkptServer.c:666
#8  0x00007f776626b9ef in clMsgCompLeftCleanup (pAddr=pAddr@entry=0x7f77665757a0) at clMsgGeneral.c:245
#9  0x00007f776626a370 in clMsgNotificationReceiveCallback (pAddr=0x7f77665757a0, event=CL_IOC_COMP_DEATH_NOTIFICATION, pArg=) at clMsgEo.c:196
#10 clMsgNotificationReceiveCallback (event=CL_IOC_COMP_DEATH_NOTIFICATION, pArg=, pAddr=0x7f77665757a0) at clMsgEo.c:174
#11 0x00007f77656f2c6e in clEoClientNotification (notification=0x498a, notification@entry=0x7f7766575930) at clEoLibs.c:515
#12 0x00007f77656f39ea in clEoProcessIocRecvPortNotification (pThis=0x2512818, eoRecvMsg=0x0, priority=, protoType=, length=, srcAddr=...) at clEoLibs.c:259
#13 0x00007f77656f670c in clEoJobHandler (pJob=pJob@entry=0x7f77540039f8) at eo.c:3861
#14 0x00007f7765855392 in clTaskPoolEntry (pArg=) at clTaskPool.c:277
#15 0x00007f77657d71e5 in cosPosixTaskWrapper (pArgument=) at posix/clCommonCos.c:951
#16 0x00007f7765351e9a in start_thread (arg=0x7f7766576700) at pthread_create.c:308
#17 0x00007f7763ff5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#18 0x0000000000000000 in ?? ()

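The shared memory segment for queue groups is corrupted. When a component dies, the msg server tries to delete that component's queue entry from the queue group before syncing the group back to the shared segment. Unmarshalling the corrupted group data then fails the "(nameIn->length < 256)" assertion in clNameCopy, and the server dumps core.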
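The failure mode in frames 4 and 5 can be illustrated with a small sketch. It assumes the name structure holds a length plus a 256-byte buffer (inferred only from the asserted condition in the backtrace), and every identifier in it (NameT, MAX_NAME_LENGTH, name_copy_checked, NameRc) is hypothetical rather than taken from the SAFplus sources. It shows one possible way to treat an out-of-range length read from shared memory as a recoverable data error instead of aborting the process. It is not the fix that was applied for this issue.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed limit, taken from the "(nameIn->length < 256)" assertion text. */
#define MAX_NAME_LENGTH 256

/* Hypothetical stand-in for the name type passed to clNameCopy. */
typedef struct {
    uint16_t length;
    char     value[MAX_NAME_LENGTH];
} NameT;

typedef enum { NAME_OK = 0, NAME_ERR_CORRUPT = 1 } NameRc;

/*
 * Copy a name that was read from untrusted storage such as a shared segment.
 * Returns NAME_ERR_CORRUPT when the stored length is out of range, so the
 * caller can skip or rebuild the entry instead of taking the whole server down.
 */
static NameRc name_copy_checked(NameT *out, const NameT *in)
{
    /* NULL arguments are a programming error, so an assert still fits here. */
    assert(out != NULL && in != NULL);

    /* A bad length is a data error, not a programming error: report it. */
    if (in->length >= MAX_NAME_LENGTH) {
        return NAME_ERR_CORRUPT;
    }

    memset(out, 0, sizeof(*out));
    memcpy(out->value, in->value, in->length);
    out->length = in->length;
    return NAME_OK;
}
```

With a check like this, an unmarshal routine could propagate the error and the cleanup path could log the corrupted queue group and continue, at the cost of having to decide what to do with the bad entry.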