brettwooldridge / HikariCP

光 HikariCP・A solid, high-performance, JDBC connection pool at last.
Apache License 2.0

Relationship between idleTimeout and maxLifetime wrt open connections. #706

Closed godenji closed 7 years ago

godenji commented 8 years ago

Hi, thanks for Hikari, quite impressive on the performance front, very nice ;-)

Question about interplay between idleTimeout and maxLifetime; with the following I'm seeing 116 connections open (with max used at 9) in MySQL:

minimumIdle="5"
maximumPoolSize="10"
idleTimeout="0" # does this negate maxLifetime setting?
maxLifetime="27000000" # 7.5 hours (mysql timeout is 8 hours)
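For reference, the same settings expressed programmatically might look like the sketch below (JDBC URL and class name are placeholders). Note that idleTimeout="0" disables idle eviction but does not negate maxLifetime, which HikariCP enforces independently.

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PoolSetup {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:mysql://localhost:3306/mydb"); // placeholder URL
        config.setMinimumIdle(5);
        config.setMaximumPoolSize(10);
        config.setIdleTimeout(0);           // 0 disables idle eviction entirely
        config.setMaxLifetime(27_000_000L); // 7.5 hours in ms, below MySQL's 8h wait_timeout
        try (HikariDataSource ds = new HikariDataSource(config)) {
            // use ds.getConnection() ...
        }
    }
}
```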

In local dev I noticed that there were 6546 connections open, heh O_o. This may be due to a bug in Play Framework where stop hooks are not called on application reload.

Just trying to narrow down what may be causing the large number of open connections, suspect it's that the idleTimeout setting is causing maxLifetime to be ignored.

godenji commented 8 years ago

Re-opening, as the suggested MySQL configuration of maxLifetime 7.5 hours (i.e. slightly less than the default server connection timeout of 8 hours), combined with caching prepared statements server-side (via cachePrepStmts), seems to prevent the underlying prepared statements from closing in a timely manner.

Presumably this is due to statements being cached on a per-connection basis -- on a heavily loaded server where connections have a long TTL it's quite possible to reach the max prepared statement limit (@16K). Not sure what Hikari should do here, but even on a moderately loaded server the open statement count will quickly exceed 2K, despite setting prepStmtCacheSize="500".

By leaving maxLifetime at default 30 minutes I see that the open statements are cleaned up in short order. Perhaps this is a non-issue/expected behavior, but relying on a stop-the-world GC event to hopefully clean up open statements before max statement limit is reached seems like risky business ;-)

Feedback appreciated.

P.S. Run this query to view open statements:

SELECT
  sp.thread_id,
  t.processlist_user user,
  t.processlist_host host,
  sp.count_star - sd.count_star open_com_query_ps,
  cp.count_star - cc.count_star open_com_prepare_ps
FROM (
  SELECT COUNT_STAR, THREAD_ID
  FROM events_statements_summary_by_thread_by_event_name
  WHERE event_name = 'statement/sql/prepare_sql'
) sp
JOIN (
  SELECT COUNT_STAR, THREAD_ID
  FROM events_statements_summary_by_thread_by_event_name
  WHERE event_name = 'statement/com/Prepare'
) cp ON (cp.THREAD_ID = sp.THREAD_ID)
JOIN (
  SELECT COUNT_STAR, THREAD_ID
  FROM events_statements_summary_by_thread_by_event_name
  WHERE event_name = 'statement/sql/dealloc_sql'
) sd ON (sd.THREAD_ID = sp.THREAD_ID)
JOIN (
  SELECT COUNT_STAR, THREAD_ID
  FROM events_statements_summary_by_thread_by_event_name
  WHERE event_name = 'statement/com/Close stmt'
) cc ON (cc.THREAD_ID = sp.THREAD_ID)
JOIN threads t ON (t.thread_id = sp.thread_id)
ORDER BY GREATEST(open_com_query_ps, open_com_prepare_ps) DESC
LIMIT

brettwooldridge commented 7 years ago

@godenji Sorry for the delayed response, completely swamped at work...

It is important to understand that cachePrepStmts is not the same as useServerPrepStmts. I am assuming that you have useServerPrepStmts enabled as well?

Quoting from this answer:

... if the statement can be a server-side prepared statement (and they are enabled), those are cached per-connection, with the statement string as the key, and the actual instance of the class that is responsible for carrying out communications with the server to execute the server-side prepared statement.

If the SQL is only suitable for client-side ... then it is stored per-connection, with the statement string as the key, and the parsed form of the prepared statement as the cached value. Further instances of prepared statements for the same SQL skip the parsing/tokenizing stage and share the parsed form.
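A sketch of enabling both behaviors on a HikariCP-managed Connector/J data source (property names are Connector/J's; the cache size value is illustrative):

```java
import com.zaxxer.hikari.HikariConfig;

public class StatementCacheSetup {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        // Passed through to Connector/J; both flags must be enabled for
        // *server-side* prepared statements to be cached.
        config.addDataSourceProperty("cachePrepStmts", "true");     // client-side statement cache
        config.addDataSourceProperty("useServerPrepStmts", "true"); // real server-side prepares
        config.addDataSourceProperty("prepStmtCacheSize", "250");   // entries per connection
    }
}
```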

It is unclear to me how you can accumulate unclosed PreparedStatements.

  1. You obtain a connection from HikariCP (getConnection())
  2. You prepare a statement (prepareStatement())
  3. You set values and execute it some number of times
  4. You either close the statement or you close the connection
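The four steps above might look like the following sketch (table and column names are hypothetical); with try-with-resources, step 4 happens even if execution throws, so nothing is left for the pool to clean up:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import javax.sql.DataSource;

public class QueryExample {
    static void printNames(DataSource ds) throws Exception {
        try (Connection conn = ds.getConnection();                 // step 1
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) {     // step 2
            ps.setLong(1, 42L);                                    // step 3
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }                                                          // step 4 (implicit close)
    }
}
```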

Step 1, understood...

Step 2, when you call Connection.prepareStatement() you are actually calling the HikariCP ProxyConnection.prepareStatement() method. This is going to prepare the statement by calling MySQL, wrap it in a ProxyPreparedStatement, and will track the statement.

Step 3, understood...

Step 4. You either close the PreparedStatement (hopefully in a finally block), or you close the Connection (to return it to the pool).

If you close the PreparedStatement, HikariCP will "untrack" it and then call the real MySQL close() method. The MySQL close() method ends up here. If you trace that method, you will see that it closes the server-side statement and then de-caches the client-side statement. So that seems correct...

If you do not close the PreparedStatement, but instead just close the Connection then you will end up back in the ProxyConnection. The first thing the close() method does is call the closeStatements() method, which is going to iterate over the tracked statement list ... a try-with-resources ensures that each one is closed. The close() in this case again ends up in the MySQL driver method referenced above.
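The tracking behavior described above can be modeled with a toy sketch (these are not HikariCP's actual classes): every statement opened through the connection is recorded, and closing the connection closes any statements the caller forgot to close.

```java
import java.util.ArrayList;
import java.util.List;

class TrackingConnection implements AutoCloseable {
    static class TrackedStatement implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    private final List<TrackedStatement> tracked = new ArrayList<>();

    TrackedStatement prepareStatement() {
        TrackedStatement s = new TrackedStatement();
        tracked.add(s);  // track on creation, like ProxyConnection does
        return s;
    }

    @Override
    public void close() {
        // analogous to closeStatements(): close everything still tracked
        for (TrackedStatement s : tracked) {
            s.close();
        }
        tracked.clear();
    }
}
```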

In summary, unless you have a Connection that is taken from the pool and never returned, I can't see how you could possibly leak server-side statements, unless it is a MySQL bug.

If at all possible, I would try to run everything under a debugger, and break point the HikariCP methods above, as well as the MySQL methods to see if you can spot how the close message to the MySQL server might somehow be bypassed.

godenji commented 7 years ago

@brettwooldridge thanks.

Yes, using both cachePrepStmts and useServerPrepStmts. Running the sample query from my previous comment via the mysql CLI shows, for example, 1097 open prepared statements on a single thread (with fewer open PSs on other available threads).

Using a Slick-like query DSL (Slick is actually derived from it); here are some entry points for a basic audit trail.

1) connection is obtained here

2) statement is prepared here

4) at minimum, the connection is closed here on every query (the application can only run queries via withSession or withTransaction blocks). Statements, on the other hand, may or may not be closed depending on result set conditions.

Debugger's a good call. Imported a heap dump into Eclipse MAT -- memory usage looks to be pretty high for jdbc connection references:

"com.mysql.jdbc.NonRegisteringDriver$ConnectionPhantomReference" loaded by "sun.misc.Launcher$AppClassLoader @ 0xf0000208" occupies 16,370,464 (21.75%) bytes. The memory is accumulated in one instance of "com.mysql.jdbc.ConnectionImpl$3" loaded by "sun.misc.Launcher$AppClassLoader @ 0xf0000208".

One instance of "com.mysql.jdbc.JDBC4Connection" loaded by "sun.misc.Launcher$AppClassLoader @ 0xf0000208" occupies 10,052,944 (13.35%) bytes. The memory is accumulated in one instance of "com.mysql.jdbc.ConnectionImpl$3" loaded by "sun.misc.Launcher$AppClassLoader @ 0xf0000208".

So, IIUC, memory usage should be quite low provided connections and/or statements are closed and the underlying JDBC driver is bug-free. If that's the case, this could be a bug on my end -- a debugger session should shed more light on what's happening....

brettwooldridge commented 7 years ago

@godenji Do you think you can create a simple test in slaq that appears to leak a prepared statement? If so, I can take a shot at debugging it down through HikariCP, down into Connector/J, and back...

godenji commented 7 years ago

@brettwooldridge thanks, might not be a leak (just compared 2 heap dumps 24 hours apart and the latest dump indicates retained heap has shrunk by 6MB, 2MB of which is MySQL/Hikari related), or if there is one, it's growing slowly.

Not sure if this is due to the timing of the dumps though (e.g. dump 1 occurred some time before a major GC event and/or the 7.5 hour maxLifetime expiry, while dump 2 occurred, say, shortly after maxLifetime expired, in which case memory tied to open connections would have been freed). If caching prepared statements entails minor memory overhead I'm fine with that, provided it's not leaking memory that is ;-)

Unfortunately in the midst of apartment hunting hell so likely won't have time in the short-term to put together a sample Play app to debug against o_O

brettwooldridge commented 7 years ago

@godenji Please reopen if you confirm the leak exists and is caused by HikariCP. From what I can tell, unless you are leaking connections, HikariCP is going to make it very hard to leak Statements. Unless of course there is a leak in the DB driver.

godenji commented 7 years ago

@brettwooldridge sure, I guess we can close this, as the leak is not directly caused by Hikari, though it and Connector/J do play a role.

Here's what I've found out so far wrt idleTimeout, maxLifetime, statement caching, and a potential leak:

According to the JDBC spec when closing a statement the corresponding result set is automatically closed. This promise seems to be broken, however, when enabling server-side statement caching. In Slaq and upstream in Slick, automatic closing of the result set is expected (as only statement.close() is called).

Stepping through a debug session of my app shows that both Hikari and Connector/J handle statement cache tracking and the related add/remove operations. Now, given that the statement is cached, Connector/J's realClose() method is never called, which means the statement effectively has the same lifetime as the underlying connection (nearly 8 hours with the current config) -- result sets that are not closed appear to live until the connection is closed.

Heap dumps of the production app showed large instances of JDBC42ResultSet. I have patched Slaq to close both the statement and result set. Will monitor and see how this change affects overall memory usage.
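A minimal sketch of that kind of patch (method name and structure are illustrative, not the actual slaq code): close the ResultSet explicitly before the statement rather than relying on statement.close() to cascade, since the cascade appeared not to happen with server-side statement caching enabled.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcCleanup {
    // Hypothetical helper: close the ResultSet first, then release the
    // (possibly cached) statement; swallow close-time exceptions.
    static void closeQuietly(ResultSet rs, PreparedStatement ps) {
        try {
            if (rs != null) rs.close();
        } catch (Exception ignored) { }
        try {
            if (ps != null) ps.close();
        } catch (Exception ignored) { }
    }
}
```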

Also added a shutdown hook for AbandonedConnectionCleanupThread as this is known to cause a memory leak on redeploy in some containers.
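A hedged sketch of that hook, assuming Connector/J 5.1 (in 8.x the class moved to com.mysql.cj.jdbc.AbandonedConnectionCleanupThread); in Play this would be invoked from the application's stop hook so the driver's cleanup thread does not pin the old classloader on redeploy:

```java
public class DriverShutdown {
    // Call from the application's stop hook on undeploy/redeploy.
    public static void shutdownMysqlCleanupThread() {
        try {
            com.mysql.jdbc.AbandonedConnectionCleanupThread.shutdown();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```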

We'll see. It's been an interesting time diving into JDBC internals, Hikari, and stepping through my application's code paths -- scary ;-)

godenji commented 7 years ago

@brettwooldridge manually closing the result set did the trick -- memory usage is now stable. The overhead of the prepared statement cache and the in-memory cache stays within a range instead of climbing steadily toward an OOME.