RestComm / jain-slee

The World's #1 Open Source JAIN-SLEE (JSLEE) 1.1 Implementation
http://www.restcomm.com/
GNU Affero General Public License v3.0
24 stars 50 forks source link

Fix issue #103: NullPointerException when getActivityFlags() #112

Closed duyanh030 closed 6 years ago

duyanh030 commented 7 years ago

++ Add synchronized block to prevent resource access conflict

jaimecasero commented 7 years ago

apparently this is trying to synchronize a constructor with getter method. I dont think we really need this kind of synchronization at all, since the object lifecycle is naturally protecting the get access.

Assuming the reported NPE was during getter method (where is the stack trace by the way @duyanh030 ), i can only think about the client code passing null as cacheData in constructor and then causing NPE iwth a subsequent call to "getActivityFlags". If that is the case,the sync section will not prevent the NPE.

Have you tested the code somehow? can you provide a unit test to reproduce the original issue, and show how the patch fixes it?

duyanh030 commented 7 years ago

Hi Jaime. Thanks for your quickly response. I am trying to synchronize an variable named "NODE_MAP_KEY_ACTIVITY_FLAGS". I copy the comment from the issue to here. They contains some information like: how could I preproduce it? I also tested the commit by preproducing the problem to see if my changes could pass the NPE. Please see below for more information.

Below this line are copied from comment.

Status update:.

I am successful to preproduce this problem by using google-talk-bot example, Jconsole, and add some sleep (delay) function in slee and xmpp RA

File modified:

xmpp.patch file I added some delay to avoid xmpp send some init handshake message. Then I have 20 seconds to trigger queryActivityContextLiveness from Jconsole core.patch file -> I add some sleep after create activitycontexthandle in Datacache and pause 20 seconds before add NODE_MAP_KEY_ACTIVITY_FLAGS keys in Cache.

Steps Step 1: Deploy google-talk-bot example. Wait until log message "duyanh debug ActivityContextImpl sleeping 20000" Step2: Quickly invoke queryActivityContextLiveness operation from Jconsole. Then this issue will be 100% reproducible

log file nullPointerException.txt.tar.gz

Called Stack

2017-05-06 11:28:21,336 ERROR [org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl] (RMI TCP Connection(2)-127.0.0.1) Failed to load AC.: java.lang.NullPointerException at org.mobicents.slee.runtime.activity.ActivityContextImpl.getActivityFlags(ActivityContextImpl.java:178) at org.mobicents.slee.runtime.activity.ActivityContextImpl.(ActivityContextImpl.java:145) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:192) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:184) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.queryLiveness(ActivityManagementMBeanImpl.java:496) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.run(ActivityManagementMBeanImpl.java:520) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl.queryActivityContextLiveness(ActivityManagementMBeanImpl.java:162) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71) at sun.reflect.GeneratedMethodAccessor15.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112) at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801) at org.jboss.as.jmx.PluggableMBeanServerImpl$TcclMBeanServer.invoke(PluggableMBeanServerImpl.java:1503) at org.jboss.as.jmx.PluggableMBeanServerImpl.invoke(PluggableMBeanServerImpl.java:724) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1468) at javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1309) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1401) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:829) at sun.reflect.GeneratedMethodAccessor25.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:346) at sun.rmi.transport.Transport$1.run(Transport.java:200) at sun.rmi.transport.Transport$1.run(Transport.java:197) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:196) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:683) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745)

The called stack I preproduced have a little bit difference compared with the Author's one. Because I trigger "queryActivityContextLiveness" manually from jconson while It's called by scheduler from called stack of author.

Conclusion The null pointer is generated by https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L167. When they try to get NODE_MAP_KEY_ACTIVITY_FLAGS from Cache:

flags = (Integer) cacheData.getObject(NODE_MAP_KEY_ACTIVITY_FLAGS);

and null pointer returned.... The reason is that: we have another thread just create New Node in Cache by https://github.com/RestComm/jain-slee/blob/master/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L107 and still not invoke line 108 to add "NODE_MAP_KEY_ACTIVITY_FLAGS key" into Cache Data. This issue is rarely occured. Solution Discussion: we need to provide some blocking mechanism to prevent the share resource between threads.

jaimecasero commented 7 years ago

Ummm how can thread2 invoke ActivityContextImpl.getActivityFlags(), when thread1 is still running the very same object constructor?? How does thread2 come to know the reference to an object that is still not constructed?. can you point to client code that shares this reference?

Even if possible, how is the sync section ensuring the constructor is run before invoking the getter (so the value is properly placed in the map before any attempt to call getActivityFlags)? if thread2 takes the monitor before thread1, it will still call getActivityFlags() when the map is still empty (value not present)...

Can you attach stack trace again?, i dont see where the NPE is caused..., returning a null value from the get method is not necessarily bad, its a problem of the client code invoking the method...

duyanh030 commented 7 years ago

@jaimecasero

Ummm how can thread2 invoke ActivityContextImpl.getActivityFlags(), when thread1 is still running the very same object constructor?? How does thread2 come to know the reference to an object that is still not constructed?. can you point to client code that shares this reference? ->> They are two separated objects.

Thread 1: The scenario is deploying xmpp RA. and trying to create activityContext. Then put it in cache. SleeEndpointImpl.java:289 _startActivity ActivityContextFactoryImpl.java:152 createActivityContext ActivityContextImpl.java:99 ActivityContextImpl ->>> Invoke the constructor to create new AC. Also put them to DataCache.

Thread 2: The scenario is invoking "queryActivityContextLiveness" operation from Jconsole. ActivityManagementMBeanImpl.java:157 queryActivityContextLiveness ActivityManagementMBeanImpl.java:517 run ActivityManagementMBeanImpl.java:496 getActivityContext ActivityContextFactoryImpl.java:182 getActivityContext ActivityContextFactoryImpl.java:191 activityContextCacheData -->> try to create newly ActivityContext Object. It's difference with thread 1. ActivityContextImpl.java:156 getActivityFlags ->> NPE at line 167. "return flags;"

Even if possible, how is the sync section ensuring the constructor is run before invoking the getter (so the value is properly placed in the map before any attempt to call getActivityFlags)? ->> The synchronized block could ensure that whenever creating a Node in DataCache . the key "NODE_MAP_KEY_ACTIVITY_FLAGS" must be there. Then we never get null returned at "flags = (Integer) cacheData.getObject(NODE_MAP_KEY_ACTIVITY_FLAGS);". I think It is an expected behavior?

if thread2 takes the monitor before thread1, it will still call getActivityFlags() when the map is still empty (value not present)...

public ActivityContextImpl getActivityContext(ActivityContextHandle ach, boolean updateLastAccessTime) { ActivityContextCacheData activityContextCacheData = new ActivityContextCacheData(ach, sleeContainer.getCluster()); if (activityContextCacheData.exists()) { try { return new ActivityContextImpl(ach,activityContextCacheData,tracksIdleTime(ach, updateLastAccessTime),this); } ............................... ................................... } else { return null; } }

-->> In this case, the called stack will never reach getActivityFlags. Because It is checked if activityContextCacheData.exist at ActivityContextFactoryImpl.java:189 before trying to create new ActivityContextImpl object.

duyanh030 commented 7 years ago

Can you attach stack trace again?, i dont see where the NPE is caused..., returning a null value from the get method is not necessarily bad, its a problem of the client code invoking the method... -> 2

017-04-10 05:53:18,164 ERROR [org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl] (pool-27-thread-1) Failed to load AC. java.lang.NullPointerException at org.mobicents.slee.runtime.activity.ActivityContextImpl.getActivityFlags(ActivityContextImpl.java:166) at org.mobicents.slee.runtime.activity.ActivityContextImpl.(ActivityContextImpl.java:133) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:186) at org.mobicents.slee.runtime.activity.ActivityContextFactoryImpl.getActivityContext(ActivityContextFactoryImpl.java:178) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.queryLiveness(ActivityManagementMBeanImpl.java:497) at org.mobicents.slee.container.management.jmx.ActivityManagementMBeanImpl$PeriodicLivelinessScanner.run(ActivityManagementMBeanImpl.java:521) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)

@jaimecasero . You need to revert jain-slee core to "d42bdd6" to make the lines number are matched. Currently when comparing to master branch latest version. They have a little difference. With the version "d42bdd6" NPE is generated at "return flags;" line. And I could reproduce the issue with jainslee code only (add some delay only, no client code)

jaimecasero commented 7 years ago

I think the suggested sync section is only covering the scenario where google-talk-bot is deployed, and then we access jconsole. What happens if we access jconsole first? I think the NPE will be still there (potentially)....

Please excuse of being so picky, but honestly i dont see the point of adding the sync section yet...

jaimecasero commented 7 years ago

Now, going back to the previous version were NPE is generated (https://github.com/duyanh030/jain-slee/blob/d42bdd67828877a0ded02273662fc5e1a9d381d6/container/activities/src/main/java/org/mobicents/slee/runtime/activity/ActivityContextImpl.java#L166), I noticed the getActivityFlags method is returning "null" for a primitive return type "int". I think that is the ultimate cause of NPE.

We could arguably change the type to Integer, but probably this will cause an NPE later in the chain at https://github.com/RestComm/jain-slee/blob/master/api/jar/src/main/java/javax/slee/resource/ActivityFlags.java#L97

So, i recommend to return a "safe"/default value when flags have not been initiated yet (actually we have such constant already defined at ActivityFlags.NO_FLAGS). Since the flags checking logic is using bitwise operator (&), I think is safe to return "0" when the flags are still null. Something like this... ` public int getActivityFlags() { ... ... .. return (flags != null) ? flags : ActivityFlags.NO_FLAGS;

} `

@duyanh030 wdyt? i think this will prevent NPE in any case, and preventing to use unncessary sync sections. I would also be great if we can create a unit test for this scenario...

duyanh030 commented 7 years ago

Hi @jaimecasero Thanks for your useful comments :) The solution returning default flags is fine for all usecase. I will implement and push it again together with UnitTest.

Thanks alot.

jaimecasero commented 7 years ago

sure, and please remember to put a comment with the issue link,and a brief explanation

duyanh030 commented 7 years ago

Hi @jaimecasero . I pushed new commit to fix as agreed solution and add 1 UT case. Please help me to review it again.

duyanh030 commented 7 years ago

Sorry @jaimecasero . My mistake. I missed to add test file on the commit. Please see new one.

jaimecasero commented 7 years ago

have you tried to use any Mock library like Mockito to prevent creating fake stubs?

duyanh030 commented 7 years ago

I was thinking about it. But I don't see any Mock function in jain-slee. Then I decided to follow current jain-slee TCs. (I see some fake class) @jaimecasero Do you think that should we use Mockito to create Mock function or we should follow the current way?

jaimecasero commented 7 years ago

I think Mockito lib is well established in Restcomm projects. Even if slee is not using it, there is no reason why we can start using it. Just define a test dependency in the parem pom, and start using it (use the most recent version that works with the existing junit version). Recently i created some junit test using mockito, you may use it as reference https://github.com/jaimecasero/jain-slee.smpp/blob/feat/utest/resources/smpp/ra/src/test/java/org/restcomm/slee/resource/smpp/SenderThreadTest.java

jaimecasero commented 7 years ago

yes, it sounds good, i would certainly keep the interface, it doesnt hurt anybody to have it.

Sorry for so many comments on this simple ticket. I hope you notice the added value of all these changes, and hopefully you learn something new on the way...

duyanh030 commented 7 years ago

No worry. I am very happy to receive all your comments. I learnt a lot from it. :)

duyanh030 commented 7 years ago

@jaimecasero the UT code is absolutely pretty by using Mockito lib :)

satanatoly commented 6 years ago

fix applied, closing

gsaslis commented 6 years ago

@satanatoly since this was closed, can we add a reference here to a commit where the fix was applied ?

That way if anyone else stumbles upon this they'll have another place to continue looking ; )