Open danielbaniel opened 3 weeks ago
@danielbaniel I don't use MySQL, but since you pointed out the exact line of problematic code, I believe that statement should be changed to:
if (topology.size() == 1 && getWriter(topology) == null) {
If the topology has a size of 1 and there is no writer, log that message; otherwise, connect to that writer.
Hey @ucjonathan, this issue isn't MySQL-specific and applies to PostgreSQL too. I filled in the issue incorrectly: I only specified the aurora-mysql plugin in the description, but it affects both.
In either case, however, your suggested fix seems appropriate. As soon as the driver is connected to a writer, it should go ahead and serve requests; there is no reason to wait for other instances.
I expect it will apply to Multi-AZ (MAZ) clusters too, not just Aurora. Whatever the context, as soon as you have a writer, there's no need to wait for another instance to be up if you're looking for a writer endpoint.
Describe the bug
The bug lies in the code here: https://github.com/aws/aws-advanced-jdbc-wrapper/blob/d9a563b9613d2c7075c6a1ff4d5b16af0f615324/wrapper/src/main/java/software/amazon/jdbc/plugin/failover/ClusterAwareWriterFailoverHandler.java#L408
I don't know the history of this check, but it's problematic in a few situations.
Take a two-instance cluster with instance Foo and instance Bar. Let's say Foo is the writer. Foo crashes and Bar gets promoted to writer. When Bar becomes available, the driver gets stuck in this loop until Foo comes back up as a reader and brings the topology size to two, which may never happen in bounded time, depending on other problems. However, as soon as the driver is connected to Bar, it has a writer connection and can complete the failover, so all the additional downtime is unnecessary.
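To make the Foo/Bar scenario concrete, here is a minimal sketch comparing two wait conditions on the topology the driver would see right after Bar is promoted. Everything in it is hypothetical: `Host`, `Role`, and both `keepWaiting...` methods are stand-ins and not the wrapper's real classes. The "size only" condition is an assumption based on the behaviour described in this issue, not a copy of the linked code; the "proposed" condition is the one suggested in the thread.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not the wrapper's real types.
final class TopologyCheckSketch {
  enum Role { WRITER, READER }

  static final class Host {
    final String name;
    final Role role;

    Host(String name, Role role) {
      this.name = name;
      this.role = role;
    }
  }

  // Returns the first writer in the topology, or null if there is none.
  static Host getWriter(List<Host> topology) {
    for (Host host : topology) {
      if (host.role == Role.WRITER) {
        return host;
      }
    }
    return null;
  }

  // Assumed current behaviour (per the issue text): keep waiting
  // whenever only one instance is visible in the topology.
  static boolean keepWaitingSizeOnly(List<Host> topology) {
    return topology.size() == 1;
  }

  // Condition proposed in the thread: keep waiting only when the single
  // visible instance is not a writer.
  static boolean keepWaitingProposed(List<Host> topology) {
    return topology.size() == 1 && getWriter(topology) == null;
  }

  public static void main(String[] args) {
    // Foo has crashed; Bar is the only visible instance and is the writer.
    List<Host> afterFailover = List.of(new Host("Bar", Role.WRITER));

    System.out.println(keepWaitingSizeOnly(afterFailover)); // true
    System.out.println(keepWaitingProposed(afterFailover)); // false
  }
}
```

Under these assumptions, the size-only check keeps the driver waiting even though Bar is already a usable writer, while the proposed check lets the failover complete.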
Expected Behavior
I expect the driver to restore availability to clients looking for a writer as soon as it is connected to a new writer, regardless of how many readers the rest of the topology contains or how healthy they are.
What plugins are used? What other connection properties were set?
aurora-mysql
Current Behavior
When connected to a two-instance Aurora MySQL cluster and the failover-db-cluster API is called, the driver's failover won't complete until both instances have restarted (the reader gets promoted and restarts as a writer, and the old writer restarts as a reader). It should complete as soon as the new writer is up.
Reproduction Steps
Create a two-instance MySQL cluster. Connect and send queries with the driver. Trigger a failover with the API. Wait for the FailoverSuccessSQLException. Note that it arrives later than the time at which the new writer comes up; you can get that time from the DB CloudWatch logs, for example.
Possible Solution
No response
Additional Information/Context
No response
The AWS Advanced JDBC Driver version used
latest
JDK version used
11
Operating System and version
osx