RestComm / jain-slee

The World's #1 Open Source JAIN-SLEE (JSLEE) 1.1 Implementation
http://www.restcomm.com/
GNU Affero General Public License v3.0
24 stars 50 forks source link

NullPointerException when getActivityFlags() #103

Closed vetss closed 6 years ago

vetss commented 7 years ago

I faced with an Exception at SMSC GW:

2017-04-10 05:53:18,164 ERROR [org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl] (pool-27-thread-1) Failed to load AC. java.lang.NullPointerException at org.mobicents.slee.runtime.activity.ActivityContextImpl.getActivityFlags(ActivityContextImpl.java:166) at org.mobicents.slee.runtime.activity.ActivityContextImpl.(ActivityContextImpl.java:133) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:186) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:178) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.queryLiveness(ActivityManagementMBeanImpl.java:497) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.run(ActivityManagementMBeanImpl.java:521) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

We need to understand a reason and provide a workaroud for SLEE does not throw an Exception and process ot more accurate.

duyanh030 commented 7 years ago

Hi @vetss . I think the Exception is generated at https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L163.

Currently I am don't know why cacheData gets null here. It would be helpful if you could provide more log for further investigation.:) if you agree with this, We could check "if cacheData is null then return null before invoking getObject fucntion" as a workaround solution for this issue. If it is ok. I could create a workaround fix for it. The root cause should be investigated later

Regards Anh Dao

nhanth87 commented 7 years ago

hi @vetss do you have any idea why this problem happen, can you please attach server.log for investigating?

duyanh030 commented 7 years ago

Hi . I would like to update status on this issue. Please ignore my previous comments. I used incorrect version of ActivityContextImpl.java then it took me go with wrong analysis.

The exception must be generated by https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L167. Not line 163 as I mentioned before. I did revert to commit "d42bdd6" to make line number as the same as called stack put from @vetss and realize that I am wrong.

156 public int getActivityFlags() { 157 if (flags == null) { 158 // instance has no flags stored, check local ac 159 if (localActivityContext != null) { 160 flags = localActivityContext.getActivityFlags(); 161 } else { 162 // local ac does not exists, get from cache 163 flags = (Integer) cacheData 164 .getObject(NODE_MAP_KEY_ACTIVITY_FLAGS); 165 } 166 } 167 return flags; 168 }

Some conclusions:

  1. "flags" is defined as Integer type and not a "primitive java types". and getActivityFlags method return "int" type. So whenever flags is null and reach line "return flags" NullException is generated.
  2. "flags" is null at Line 167 and must be assigned from https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L163. There is no change for flags is assigned NULL by "flags = localActivityContext.getActivityFlags();" because "localActivityContext.getActivityFlags();" return "int" and will be converted to Interger type.
  3. The code flow try to getActivityFlags from DataCache at Node "ac/{ActivityContextHandle}". 100% this Node is existing in MobicentsCache because code is already checked at https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextFactoryImpl.java#L190 . But the key "NODE_MAP_KEY_ACTIVITY_FLAGS" is not put in this Node.
  4. I checked all source code and make sure that above Node is created and put key "NODE_MAP_KEY_ACTIVITY_FLAGS" at https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L107. Whenever Node is created then they put "NODE_MAP_KEY_ACTIVITY_FLAGS" in this Node immediately.
  5. The NullException is generated because we are try to get "NODE_MAP_KEY_ACTIVITY_FLAGS" from DataCache at Node "ac/{ActivityContextHandle}". But the key "NODE_MAP_KEY_ACTIVITY_FLAGS" is not there. I still can't make the conclusion here. It could be: ++ we try to put key at https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L108 failed?. ++ Jboss DataCache synchronization is failed? I see all restcomm products could be setup and run on cluster but still not try it. ++ I don't see any line code try to unset key "NODE_MAP_KEY_ACTIVITY_FLAGS". They just remove the Node completely only. It should not cause this exception.

Next I will try setup cluster to see if I can find more thing on this issue.

duyanh030 commented 7 years ago

Status update:.

I am successful to preproduce this problem by using google-talk-bot example, Jconsole, and add some sleep (delay) function in slee and xmpp RA

File modified:

xmpp.patch file I added some delay to avoid xmpp send some init handshake message. Then I have 20 seconds to trigger queryActivityContextLiveness from Jconsole core.patch file -> I add some sleep after create activitycontexthandle in Datacache and pause 20 seconds before add NODE_MAP_KEY_ACTIVITY_FLAGS keys in Cache.

Steps Step 1: Deploy google-talk-bot example. Wait until log message "duyanh debug ActivityContextImpl sleeping 20000" Step2: Quickly invoke queryActivityContextLiveness operation from Jconsole. Then this issue will be 100% reproducible

log file nullPointerException.txt.tar.gz

Called Stack

2017-05-06 11:28:21,336 ERROR [org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl] (RMI TCP Connection(2)-127.0.0.1) Failed to load AC.: java.lang.NullPointerException at org.mobicents.slee.runtime.activity.ActivityContextImpl.getActivityFlags(ActivityContextImpl.java:178) at org.mobicents.slee.runtime.activity.ActivityContextImpl.(ActivityContextImpl.java:145) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:192) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:184) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.queryLiveness(ActivityManagementMBeanImpl.java:496) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.run(ActivityManagementMBeanImpl.java:520) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl.queryActivityContextLiveness(ActivityManagementMBeanImpl.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.invoke(PluggableMBeanServerImpl.java:1503) at org.jboss.as.jmx.PluggableMBeanServerImpl.invoke(PluggableMBeanServerImpl.java:724) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

The called stack I preproduced have a little bit difference compared with the Author's one. Because I trigger "queryActivityContextLiveness" manually from jconson while It's called by scheduler from called stack of author.

Conclusion The null pointer is generated by https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L167. When they try to get NODE_MAP_KEY_ACTIVITY_FLAGS from Cache:

flags = (Integer) cacheData.getObject(NODE_MAP_KEY_ACTIVITY_FLAGS);

and null pointer returned.... The reason is that: we have another thread just create New Node in Cache by https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L107 and still not invoke line 108 to add "NODE_MAP_KEY_ACTIVITY_FLAGS key" into Cache Data. This issue is rarely occured. Solution Discussion: I suggest We could:

  1. Assign flags = 0 whenever "(Integer) cacheData.getObject(NODE_MAP_KEY_ACTIVITY_FLAGS)" return null to avoid null exception.
  2. we need to provide some blocking mechanism to prevent the share resource between threads. But I think it should be a big change if we follow the solution 2 Next I will prepare a PR with solution 1 and look forward to hearing some ideas from Expert.
satanatoly commented 6 years ago

Closing. Added return value check