brettwooldridge / HikariCP

光 HikariCP・A solid, high-performance, JDBC connection pool at last.
Apache License 2.0
19.91k stars 2.92k forks source link

Why is the minimum keep alive 30 seconds? #2112

Open Randgalt opened 1 year ago

Randgalt commented 1 year ago

Why is the minimum keep alive 30 seconds?

This seems like an arbitrary number and quite high. Why not allow smaller times?

lfbayer commented 1 year ago

I would ask the opposite question, do you have a use-case where you would need a shorter time than that?

The only reason you would need a keep alive under 30 seconds would be if your database server closes connections automatically that have been quiet for 30 seconds, and this seems like an unlikely situation.

Randgalt commented 1 year ago

Yes, we would like to have a 15 second keep alive. Yes, our DB is currently configured to close connections every 15 seconds in some circumstances.

lfbayer commented 1 year ago

But why would you have such a configuration. What problem does that solve?

This context would be helpful to be able to understand what actually needs exist. It seems like if your database closes connections that quickly you wouldn’t really want to use a keepalive in the first place. Would it make more sense to increase the idle timeout in the database for this known application rather than require the application to send otherwise unnecessary traffic just to keep the connection open.

If the short timeout is to avoid leaked connections, surely you don’t have that problem with the application running HikariCP.

On Tue, Sep 19, 2023 at 8:20 PM Jordan Zimmerman @.***> wrote:

Yes, we would like to have a 15 second keep alive. Yes, our DB is currently configured to close connections every 15 seconds in some circumstances.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725308968, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJFJPPNFDADOWCHQCB3X3F5YXANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

Randgalt commented 1 year ago

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

lfbayer commented 1 year ago

I apologize if my tone sounded argumentative. That was not intentional. I just feel a responsibility to understand the underlying use cases before making changes that could lead to problems for other users.

I genuinely want to understand what the reasons are for having such a short idle timeout while still not wanting to close the connections. This knowledge can help me understand if there are other potential changes that would be reasonable as well as to be able to understand the risks more.

It would also be helpful to have a pull request with the change and relevant unit tests so that we can feel more comfortable that we aren’t breaking anything.

On Tue, Sep 19, 2023 at 9:55 PM Jordan Zimmerman @.***> wrote:

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725460322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJHDKPHLDDSDCFUCYJLX3GI3VANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

lfbayer commented 1 year ago

And to answer your first question, I don’t know why that minimum is there. It might be intentional or it might have been arbitrary. I would have to investigate further to know.

On Tue, Sep 19, 2023 at 10:10 PM Leo Bayer @.***> wrote:

I apologize if my tone sounded argumentative. That was not intentional. I just feel a responsibility to understand the underlying use cases before making changes that could lead to problems for other users.

I genuinely want to understand what the reasons are for having such a short idle timeout while still not wanting to close the connections. This knowledge can help me understand if there are other potential changes that would be reasonable as well as to be able to understand the risks more.

It would also be helpful to have a pull request with the change and relevant unit tests so that we can feel more comfortable that we aren’t breaking anything.

On Tue, Sep 19, 2023 at 9:55 PM Jordan Zimmerman @.***> wrote:

I promise you we have this situation. I'm not sure what benefit it would be to explain our entire application. It's a simple request: please lower the minimum allowed keep alive. It's one line of code for you and really helps us out. If you can't do that, that's too bad.

This context would be helpful to be able to understand what actually needs exist

We've spent weeks with our DB supplier working this out. Honestly, I'm not up for re-litigating this. I hope you can trust that we have good reason for the request.

— Reply to this email directly, view it on GitHub https://github.com/brettwooldridge/HikariCP/issues/2112#issuecomment-1725460322, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHTCJHDKPHLDDSDCFUCYJLX3GI3VANCNFSM6AAAAAA46DAHKE . You are receiving this because you commented.Message ID: @.***>

lfbayer commented 1 year ago

I'm looking at the code now and I don't actually see any enforcement of a minimum keepalive time. In fact the unit test for keepalive time actually uses 500ms as the test value. Where are you seeing a minimum of 30 seconds?

I do see that the housekeeping thread only runs once every 30 seconds, but that code doesn't do anything with keepalive.

lfbayer commented 1 year ago

After investigating even more and actually trying to set the value to something less than 30 seconds I found where the check is. I also looked over the original pull request, and have come to the determination that I still do not know why there is a minimum and nothing in the original pull request seems to shed any light on it.

I'm not completely against lowering that minimum check, but I would like to know the specifics of the situation that actually requires such a short idle timeout.

Randgalt commented 1 year ago

I apologize if my tone sounded argumentative.

No worries - thank you for your time

Randgalt commented 1 year ago

I'm not completely against lowering that minimum check, but I would like to know the specifics of the situation that actually requires such a short idle timeout.

We are going to change our timeout to 30s. But, for the future I'll try to put together a PR with a proposed change and an explanation of what we're trying to accomplish.

Thank you.

mathieufortin01 commented 1 year ago

Except if there is a reason for it, I think the lib should just let the user sets the value it wants. Its pretty sneaky to disable keepalive when below 30s. A use case: fast detection of writer becoming a reader, without relying on max lifetime. (ok it relies on the fact that the keep alive task uses the validation query AND evicts the connection on failure, so maybe thats also a little sneaky on my part).

gary258796 commented 10 months ago

How about giving a warning log when keepaliveTime is set below 30 or a number ??

lfbayer commented 10 months ago

I haven't really investigated this yet, but I'm concerned about how the scheduling of the housekeeper thread works. It seems plausible that even with a setting below 30, if the housekeeper doesn't run it won't matter. So if the housekeeper is running in increments of 30 seconds, then a setting less than 30 seconds wouldn't be honored anyway.

But again, I haven't really investigated this. It's just a feeling.

gary258796 commented 10 months ago

I haven't really investigated this yet, but I'm concerned about how the scheduling of the housekeeper thread works. It seems plausible that even with a setting below 30, if the housekeeper doesn't run it won't matter. So if the housekeeper is running in increments of 30 seconds, then a setting less than 30 seconds wouldn't be honored anyway.

But again, I haven't really investigated this. It's just a feeling.

I check what HouseKeeper thread handle., but don't see anything in HouseKeeper related with keepAlive.

lfbayer commented 10 months ago

As I said, it was just a suspicion. The real question is probably if we allow the setting, does it actually cause the keep alive to run at the expected time?

gary258796 commented 10 months ago

I create a simple SpringBoot project on my local machine, which have configuration in applicaiton.properties like below

spring.datasource.hikari.keepalive-time=20000
spring.datasource.hikari.minimum-idle=1
spring.datasource.hikari.maximum-pool-size=5

and -Dcom.zaxxer.hikari.housekeeping.periodMs=40000 set in system properties.

Screenshot 2023-11-20 at 8 14 42 PM

From the console log above , we can see the keepAlive do run every 18 seconds and houseKeeper do run every 38 seconds, which is correspond to configuration. So i guess, no matter of what keepAliveTime and HouseKeeper are set to, keepAlive task will still in the way we configure.

ruzovvo commented 6 months ago

@lfbayer , I see you put "volunteer-needed" label. Does it mean you came to conclusion that 30s minimum for keep-alive doesn't have enough arguments to exist and you just need someone to open PR?

I see a number of other properties with minimum values which also are a subject to removal (imho): 30s for max-lifetime, 2s for leak detection, 250ms for connection timeout, 250ms for validation timeout.

250ms for connection timeout and validation also seems to have lack of argumentation as my p99 response timing is 10ms. Probably i'd prefer having 250ms timeout for queries being run in keep-alive thread, but not in application logic thread. As in case of 10ms timeout I have enough time to use another connection (which may be not broken)