Open gedl opened 6 years ago
Any reason you are using the mysql connector rather than mariadb? AWS suggests usage of the mariadb connector in their documentation, and the driver also contains extra functionality to handle Aurora clusters more effectively compared with the MySQL connector.
If you make the switch, ensure you activate the functionality in the jdbc url and use the cluster endpoint address:
jdbc:mysql:aurora://cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com/db
Important: you must use the *.amazonaws.com address; if you wrap it in a custom CNAME, it won't work effectively with the mariadb driver. See here
@gedl have you tried the above suggestion? Any update on this issue?
Hey, sorry for the delay.
Inspired by the MariaDB driver, we ended up implementing a generic JDBC driver that works on top of any Aurora cluster, fully supporting MySQL and PostgreSQL, and presumably future Aurora flavours of other JDBC-accessible RDBMSs.
We've also made it open source: https://github.com/DiceTechnology/dice-fairlink
It works via AWS Aurora SDK and therefore does not rely on amazonaws.com sub-domains.
Should I close this case, or do you want to pursue my point nr 3?
First of all, awesome. Just awesome. 👏
I love to see open source contributions like this.
Let’s leave this open for the time being. If retirements from the pool are not well distributed enough, I think we need a better algorithm. A deterministic one would also be better than our relying on a pseudo random distribution to avoid extinction events.
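To make the "deterministic" idea concrete, here is a minimal sketch of one possible scheme. It is not HikariCP code: the class name, the method, and the 60-second stagger window are all assumptions made up for illustration. Each pool slot gets a fixed lifetime, evenly spaced inside a window just below maxLifetime, so the retirements of one pool can never coincide.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of HikariCP: deterministic, evenly spaced lifetimes.
final class StaggeredLifetimes {

    /**
     * Returns one lifetime (in ms) per pool slot. Slot 0 lives for the full
     * maxLifetimeMs; each later slot retires windowMs / poolSize earlier
     * than the previous one.
     */
    static List<Long> lifetimes(int poolSize, long maxLifetimeMs, long windowMs) {
        if (poolSize <= 0 || windowMs < 0 || windowMs > maxLifetimeMs) {
            throw new IllegalArgumentException("invalid pool size or window");
        }
        List<Long> result = new ArrayList<>(poolSize);
        for (int slot = 0; slot < poolSize; slot++) {
            long offset = windowMs * slot / poolSize; // integer division, rounds down
            result.add(maxLifetimeMs - offset);
        }
        return result;
    }

    public static void main(String[] args) {
        // Pool of 10, maxLifetime of 30 minutes, assumed 60 s stagger window.
        System.out.println(lifetimes(10, 1_800_000L, 60_000L));
    }
}
```

Because every slot keeps the same lifetime across generations, the spacing between retirements stays constant instead of depending on a random draw.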
Again, really impressed and inspired by your team’s initiative in taking the bull by the horns re: Aurora.
Only noticed your response now.
We were surprised by the lack of solutions for what seems to be a common problem with the usage of such a popular combination (HikariCP + RDS/Aurora) and thought this could be useful.
We've taken so much from open source and would hate to see people wasting time on these "details" instead of making their products great, so open sourcing it was the only acceptable thing to do.
It has been working well in production since, even though I'd like to see it spread the connections even better. There are a couple of edge cases related to the arithmetic (connection pool size not divisible by number of replicas, etc.), but it's much better than before.
Because this thread is still open, I think it's relevant to note that dice-fairlink versions 1.x.x had a scalability problem: they would be rate limited by the RDS API when many client applications were deployed in the same AWS account (they would all hit the RDS API at roughly the same time).
Versions 2.x.x have worked around these undocumented limits imposed by AWS.
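The divisibility edge case is easy to see with a small sketch. This is not dice-fairlink's code, only an illustration with made-up names: a pool of 10 connections cannot be split evenly over 3 replicas, so the best possible outcome is 4/3/3.

```java
import java.util.Arrays;

// Hypothetical illustration, not dice-fairlink code.
final class ReplicaShare {

    /** Splits poolSize connections over the replicas as evenly as integers allow. */
    static int[] split(int poolSize, int replicas) {
        if (poolSize < 0 || replicas <= 0) {
            throw new IllegalArgumentException("invalid pool size or replica count");
        }
        int base = poolSize / replicas;   // connections every replica gets
        int extra = poolSize % replicas;  // leftovers, one each for the first replicas
        int[] share = new int[replicas];
        for (int i = 0; i < replicas; i++) {
            share[i] = base + (i < extra ? 1 : 0);
        }
        return share;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split(10, 3)));
    }
}
```

With the pool size and replica count from this thread, one replica always carries one connection more than the others, even with a perfect balancer.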
Environment
Extra info: connection pool size: 10, max idle unset.
Having set maxLifetime to 30m (the default, 1800000 millis), I would expect the behaviour described in #480 to cause connections to be recycled "out of phase" to avoid mass extinction of the pool.

What I am observing (check here) is that the AWS Aurora reader endpoint is dispatching lots of connections to the same read replica (note that the upward and downward jumps on the Y-axis are "exactly" 30 minutes apart on the X-axis). This graph represents the >40th generation of connections. The result is that Aurora, arguably because it somehow caches the number of connections on each replica, assigns a whole pool to one replica, eventually leaving one of our 3 replicas with nearly all the connections of all application servers and the other 2 replicas almost idle for a generation's lifetime.

I would expect the changes in #480 to gradually scatter the recycling of the pool, up to a maximum of 18s variance after some generations. Admittedly, the X-axis of the graph is not granular enough to tell exactly how far apart the connections are reaching the Aurora cluster, but they don't seem to be spread in any material way.
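The expectation above can be explored with a simplified model. This is a sketch under stated assumptions, not HikariCP's implementation: every connection's lifetime is shortened by an independent uniform random amount below a hypothetical maxJitterMs (18 s here, taken from the expectation in this post), a replacement is opened the instant its predecessor retires, and the seed is fixed only to make runs repeatable.

```java
import java.util.Random;

// Hypothetical model, not HikariCP code: how far apart do retirements drift?
final class JitterModel {

    /**
     * Simulates `generations` rounds of recycling for a pool whose connections
     * all start at time 0, and returns the gap (in ms) between the earliest and
     * the latest retirement time of the final generation.
     */
    static long spreadAfter(int poolSize, int generations, long maxLifetimeMs,
                            int maxJitterMs, long seed) {
        if (poolSize <= 0 || generations <= 0 || maxJitterMs <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        Random rnd = new Random(seed);
        long[] retireAt = new long[poolSize];
        for (int g = 0; g < generations; g++) {
            for (int c = 0; c < poolSize; c++) {
                // Each generation lives maxLifetime minus a random jitter in [0, maxJitterMs).
                retireAt[c] += maxLifetimeMs - rnd.nextInt(maxJitterMs);
            }
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long t : retireAt) {
            min = Math.min(min, t);
            max = Math.max(max, t);
        }
        return max - min;
    }

    public static void main(String[] args) {
        System.out.println("spread after 1 generation (ms):   "
                + spreadAfter(10, 1, 1_800_000L, 18_000, 42L));
        System.out.println("spread after 40 generations (ms): "
                + spreadAfter(10, 40, 1_800_000L, 18_000, 42L));
    }
}
```

In this model the spread within one generation is always below maxJitterMs, while over many generations the offsets accumulate, so the pool should drift out of phase. A graph with coarse time resolution could still show the jumps as simultaneous.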
I have 3 questions:
1 - Has anyone observed this phenomenon with this or a similar setup?
2 - What is my best logging option on the Hikari side to observe the application-side lifecycle of each generation of connections?
3 - Is there any way to directly set the amount of variance desired to avoid mass extinction?
Thank you very much.