basho / riak-java-client

The Riak client for Java.
Apache License 2.0
267 stars 158 forks source link

shutdown() causes four second delays [JIRA: CLIENTS-1050] #689

Closed TJC closed 8 years ago

TJC commented 8 years ago

Summary

After calling client.shutdown(), there is always a delay of 4 seconds before the app actually exits.

Why is this, and what can I do about it?

Expected Behavior

During a clean shutdown, an app should exit nearly immediately.

Actual Behavior

There is always a four second delay.

Steps to Reproduce

To demonstrate the issue, use these files: https://gist.github.com/TJC/9a6a174cb1419a7c32e8018c5a495e3d

If you put both of them in a fresh directory and then run "sbt", it should grab various dependencies and stuff, and then you can use "compile" and "run" commands. (You'll need to do "export RIAK_SERVER=my.riak.cluster.net" in the shell before you run sbt)

If you do "run" a few times, you'll see it always takes four seconds to get back to the sbt prompt. If you comment out the two riak statements in the source code (the connection and shutdown), then "run" a few times, it takes zero seconds.

I've tested this outside of sbt and the same issue exists.. it's just easier to make a quick demo that works inside sbt.

Context

When writing small helper scripts that only do brief operations, this causes them to take a long time instead of being instantaneous.

Your Environment

lukebakken commented 8 years ago

Thanks for the report @TJC

@alexmoore - is there debug logging we can enable to see what's happening?

alexmoore commented 8 years ago

@TJC A cursory investigation reveals that this might be some sort of Scala / SBT related issue.

I added SLF4J's simple logger and reran the test:

5:10:25:riak-scala $ sbt run
[info] Set current project to riak-slow (in build file:/Users/alex/src/other/riak-scala/)
[info] Running riak.bug.RiakDemo
starting up
2016-11-28 15:10:34:416 -0500 [run-main-0] WARN com.basho.riak.client.core.RiakNode - The Java Security "networkaddress.cache.ttl" property may be set to cache DNS lookups forever. Using domain names for Riak nodes or an intermediate load balancer could result in stale IP addresses being used for new connections, causing connection errors. If you use domain names for Riak nodes, please set this property to a value greater than zero.
2016-11-28 15:10:34:425 -0500 [run-main-0] DEBUG io.netty.util.internal.logging.InternalLoggerFactory - Using SLF4J as the default logging framework

/**  Netty startup messages omitted **/

2016-11-28 15:10:34:616 -0500 [run-main-0] INFO com.basho.riak.client.core.RiakNode - RiakNode started; 127.0.0.1:10017
2016-11-28 15:10:34:617 -0500 [run-main-0] INFO com.basho.riak.client.core.RiakCluster - RiakCluster is starting.
2016-11-28 15:10:34:617 -0500 [run-main-0] INFO com.basho.riak.client.core.RiakCluster - RiakCluster is shutting down.
2016-11-28 15:10:34:617 -0500 [run-main-0] INFO com.basho.riak.client.core.RiakCluster - Creating Shutdown future
2016-11-28 15:10:35:121 -0500 [pool-3-thread-2] INFO com.basho.riak.client.core.RiakCluster - All operations have completed
2016-11-28 15:10:35:121 -0500 [pool-3-thread-1] INFO com.basho.riak.client.core.RiakCluster - Retrier shutting down.
2016-11-28 15:10:35:121 -0500 [pool-3-thread-2] DEBUG com.basho.riak.client.core.RiakCluster - calling shutdown on node 127.0.0.1:10017
2016-11-28 15:10:35:121 -0500 [pool-3-thread-2] INFO com.basho.riak.client.core.RiakNode - RiakNode shutting down; 127.0.0.1:10017
2016-11-28 15:10:35:122 -0500 [pool-3-thread-2] INFO com.basho.riak.client.core.DefaultNodeManager - NodeManager removed node due to it shutting down; 127.0.0.1:10017
2016-11-28 15:10:35:123 -0500 [pool-3-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - Node state changed to shutdown; 127.0.0.1:10017
2016-11-28 15:10:35:124 -0500 [pool-3-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - Active nodes remaining: 0
2016-11-28 15:10:35:126 -0500 [pool-3-thread-1] DEBUG com.basho.riak.client.core.RiakCluster - RiakCluster shut down bootstrap
2016-11-28 15:10:35:126 -0500 [pool-3-thread-1] INFO com.basho.riak.client.core.RiakCluster - RiakCluster has shut down
2016-11-28 15:10:35:126 -0500 [run-main-0] INFO com.basho.riak.client.core.RiakCluster - Shutdown Await done!
done, exiting2016-11-28 15:10:35:126 -0500 [pool-3-thread-1] DEBUG com.basho.riak.client.core.RiakNode - RiakNode shut down 127.0.0.1:10017

Exception: sbt.TrapExitSecurityException thrown from the UncaughtExceptionHandler in thread "run-main-0"
2016-11-28 15:10:35:133 -0500 [nioEventLoopGroup-2-9] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxCapacityPerThread: 32768
2016-11-28 15:10:35:133 -0500 [nioEventLoopGroup-2-9] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.maxSharedCapacityFactor: 2
2016-11-28 15:10:35:133 -0500 [nioEventLoopGroup-2-9] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.linkCapacity: 16
2016-11-28 15:10:35:133 -0500 [nioEventLoopGroup-2-9] DEBUG io.netty.util.Recycler - -Dio.netty.recycler.ratio: 8
[success] Total time: 4 s, completed Nov 28, 2016 3:10:38 PM

Riak reports that it's resources are shut down at 15:10:35:126, and Netty reports it's Recycler is done at 15:10:35:133, but SBT reports that it's done at 3:10:38, a full 3 seconds after we've completed shutdown.

If you fork the JVM process for sbt by adding fork := true to your built.sbt file, the test will complete in a second or so.

I'm wondering if the SBT class loader is being slow for single processes, or if it's waiting for something that Netty's doing (and not logging) that it doesn't need to wait for.

A workaround would be to fork the process as described above, or get a PHD in how SBT works :)

TJC commented 8 years ago

Hi Alex, Thanks for investigating this.

In my OP I swore I'd tested this outside of sbt, and was seeing the same problem, but now I've gone back, the external test is working OK. Damn.. I must have screwed something up when doing that.

If it's only present when inside sbt then of course I'm happy to close this issue as a peculiarity that doesn't apply to production use-cases.

Basho-JIRA commented 7 years ago

Fixed, or closed via GitHub issues.

[posted via JIRA by Alexander Moore]