awslabs / aws-mysql-jdbc

The Amazon Web Services (AWS) JDBC Driver for MySQL is a driver that enables applications to take full advantage of the features of clustered MySQL databases.
https://awslabs.github.io/aws-mysql-jdbc/

NodeMonitoringConnectionPlugin causes Exceptions #574

Closed: flauschie closed this issue 4 months ago

flauschie commented 4 months ago

Describe the bug

The issue started out of the blue a couple of days ago (May 5th).

The NodeMonitoringConnectionPlugin produces periodic exception logs in 2 of our 6 production instances on AWS. According to the stack trace, the exceptions pass through the HA plugins of the AWS MySQL driver. They recur periodically because they are triggered by WildFly's JTA recovery procedure.

The application execution is NOT affected by this as far as we can tell.

The logging started almost simultaneously on all connected microservices (20 of them). Restarting the application servers made no difference. We have not restarted RDS MySQL, because so far there has been no reason to do so.

Each service pod is consistently producing about 4000 of these exceptions per day.

No DB-related application errors were logged in the several hours before these exceptions started.

We tried to investigate the problem but to no avail.

Our XA transactions can span multiple RDS MySQL 8.0.33 schemas as well as a schema in Aurora Serverless MySQL v2.

The RDS MySQL and Aurora DB logs show no errors that match with the start of these exceptions.

Expected Behavior

Unclear ...

Current Behavior


ARJUNA016027: Local XARecoveryModule.xaRecovery got XA exception XAException.XAER_RMERR: software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.MysqlXAException: XAER_RMERR: Fatal error occurred in the transaction branch - check your data for consistency
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.MysqlXAConnection.mapXAExceptionFromSQLException(MysqlXAConnection.java:349)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.MysqlXAConnection.recover(MysqlXAConnection.java:189)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.SuspendableXAConnection.recover(SuspendableXAConnection.java:143)
        at org.jboss.ironjacamar.jdbcadapters@1.5.3.Final//org.jboss.jca.adapters.jdbc.xa.XAManagedConnection.recover(XAManagedConnection.java:373)
        at org.jboss.ironjacamar.impl@1.5.3.Final//org.jboss.jca.core.tx.jbossts.XAResourceWrapperImpl.recover(XAResourceWrapperImpl.java:185)
        at org.jboss.ironjacamar.impl@1.5.3.Final//org.jboss.jca.core.tx.jbossts.XAResourceWrapperStatImpl.recover(XAResourceWrapperStatImpl.java:144)
        at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoveryFirstPass(XARecoveryModule.java:735)
        at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:249)
        at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkFirstPass(XARecoveryModule.java:191)
        at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:770)
        at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
Caused by: java.sql.SQLException: XAER_RMERR: Fatal error occurred in the transaction branch - check your data for consistency
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:130)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.StatementImpl.executeQuery(StatementImpl.java:1204)
        at jdk.internal.reflect.GeneratedMethodAccessor742.invoke(Unknown Source)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.ConnectionProxy$JdbcInterfaceProxy.lambda$invoke$0(ConnectionProxy.java:355)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.plugins.DefaultConnectionPlugin.execute(DefaultConnectionPlugin.java:80)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.plugins.NodeMonitoringConnectionPlugin.execute(NodeMonitoringConnectionPlugin.java:249)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.plugins.failover.FailoverConnectionPlugin.execute(FailoverConnectionPlugin.java:276)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.plugins.ConnectionPluginManager.execute(ConnectionPluginManager.java:139)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ha.ConnectionProxy$JdbcInterfaceProxy.invoke(ConnectionProxy.java:352)
        at com.mysql//com.sun.proxy.$Proxy259.executeQuery(Unknown Source)
        at com.mysql//software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.MysqlXAConnection.recover(MysqlXAConnection.java:168)
        ... 9 more
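
A note for anyone triaging the same trace: the XAException at the top carries only the generic XAER_RMERR code, while the more specific detail sits in the SQLException that MysqlXAConnection attaches as its cause. The following is a minimal sketch, using only the standard javax.transaction.xa and java.sql APIs, that prints the SQLState and vendor error code of every SQLException in the cause chain. The class XaErrorDiagnostics and its methods are hypothetical helpers, not part of the AWS driver or WildFly.

```java
import java.sql.SQLException;
import javax.transaction.xa.XAException;

// Hypothetical diagnostic helper; not part of the AWS JDBC driver or WildFly.
public final class XaErrorDiagnostics {

    private XaErrorDiagnostics() {}

    // Maps the standard XAException error codes most often seen during recovery to their names.
    public static String xaCodeName(int code) {
        switch (code) {
            case XAException.XAER_RMERR: return "XAER_RMERR";
            case XAException.XAER_RMFAIL: return "XAER_RMFAIL";
            case XAException.XAER_NOTA: return "XAER_NOTA";
            case XAException.XAER_INVAL: return "XAER_INVAL";
            case XAException.XAER_PROTO: return "XAER_PROTO";
            default: return "XA code " + code;
        }
    }

    // Walks the cause chain and appends the SQLState and vendor code of each SQLException,
    // which usually says more than the generic XA error code on the outer exception.
    public static String describe(XAException e) {
        StringBuilder sb = new StringBuilder(xaCodeName(e.errorCode));
        for (Throwable t = e.getCause(); t != null; t = t.getCause()) {
            if (t instanceof SQLException) {
                SQLException sql = (SQLException) t;
                sb.append(" <- SQLState=").append(sql.getSQLState())
                  .append(", vendorCode=").append(sql.getErrorCode());
            }
        }
        return sb.toString();
    }
}
```

Logging describe(e) wherever the XAException is caught makes it easier to tell a permission problem apart from a genuine transaction-branch failure.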

Reproduction Steps

The root cause of this issue is unclear, so ...

Possible Solution

No response

Additional Information/Context

No response

The AWS JDBC Driver for MySQL version used

1.1.14

JDK version used

11

Operating System and version

Amazon Linux 2

flauschie commented 4 months ago

OK, we found the root cause.

When migrating to RDS MySQL 8, we granted the XA_RECOVER_ADMIN privilege to the database user.

However, at some point this privilege was removed. We still wonder how.

Anyway, after we granted the privilege again, the error disappeared.
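
To catch a silently revoked privilege before recovery starts failing, a startup or health check could inspect the account's grants. The following is a minimal sketch, assuming the check runs over an ordinary JDBC connection as the application user. It reads SHOW GRANTS FOR CURRENT_USER() and looks for XA_RECOVER_ADMIN in a global (ON *.*) grant, since MySQL dynamic privileges apply only at the global level. The class and method names are hypothetical, and the parser does not resolve privileges inherited through roles. The privilege itself is restored with a statement of the form GRANT XA_RECOVER_ADMIN ON *.* TO 'app_user'@'%', where the account name is a placeholder.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical health-check helper; names are illustrative only.
public final class XaRecoverGrantCheck {

    private XaRecoverGrantCheck() {}

    // Returns true if any global (ON *.*) GRANT line lists XA_RECOVER_ADMIN.
    // Does not follow role grants; those would need SHOW GRANTS ... USING <role>.
    public static boolean hasXaRecoverAdmin(List<String> grantLines) {
        final String prefix = "GRANT ";
        for (String line : grantLines) {
            String upper = line.trim().toUpperCase(Locale.ROOT);
            int on = upper.indexOf(" ON *.* TO ");
            if (!upper.startsWith(prefix) || on < prefix.length()) {
                continue; // not a global privilege grant
            }
            // The privilege list sits between "GRANT " and " ON *.* TO ".
            String privileges = upper.substring(prefix.length(), on);
            for (String privilege : privileges.split(",")) {
                if (privilege.trim().equals("XA_RECOVER_ADMIN")) {
                    return true;
                }
            }
        }
        return false;
    }

    // Reads the current user's grants over an existing JDBC connection.
    public static boolean currentUserHasXaRecoverAdmin(Connection conn) throws SQLException {
        List<String> lines = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW GRANTS FOR CURRENT_USER()")) {
            while (rs.next()) {
                lines.add(rs.getString(1));
            }
        }
        return hasXaRecoverAdmin(lines);
    }
}
```

Keeping the parsing separate from the JDBC call lets the string logic be tested without a database.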

flauschie commented 4 months ago

As additional information for anyone who stumbles across the same issue:

Apparently, AWS removed the user privilege (XA_RECOVER_ADMIN) during an RDS maintenance window, without any notice.

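XA_RECOVER_ADMIN is required for JTA transaction recovery: during recovery the driver issues an XA RECOVER statement, which MySQL 8.0 permits only for users holding that privilege.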
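
The point above can be made concrete. A JTA recovery manager such as WildFly's periodically calls XAResource.recover() on each XA resource, and the stack trace in the original report shows the driver's shaded MysqlXAConnection.recover answering that call by executing a query. The following is a minimal sketch that performs one such recovery scan by hand using only the standard javax.sql and javax.transaction.xa interfaces, which can help verify a database account outside the application server. The class XaRecoveryProbe is hypothetical, and obtaining the XADataSource (for example, the driver's XA data source class configured with your endpoint) is left to the caller.

```java
import java.sql.SQLException;
import java.util.Arrays;
import java.util.List;
import javax.sql.XAConnection;
import javax.sql.XADataSource;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

// Hypothetical probe that mimics a single recovery pass of a transaction manager.
public final class XaRecoveryProbe {

    private XaRecoveryProbe() {}

    // One full scan: TMSTARTRSCAN | TMENDRSCAN opens and closes the scan in a single call.
    // For a MySQL account without the privilege required by XA RECOVER, this is where
    // an XAException (such as the XAER_RMERR in the report) would surface.
    public static List<Xid> scanInDoubt(XAResource resource) throws XAException {
        Xid[] xids = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (xids == null) {
            return List.of();
        }
        return Arrays.asList(xids);
    }

    // Opens a physical XA connection, runs the scan, and always closes the connection.
    // XAConnection is not AutoCloseable, hence the explicit try/finally.
    public static List<Xid> scanInDoubt(XADataSource dataSource) throws SQLException, XAException {
        XAConnection xaConnection = dataSource.getXAConnection();
        try {
            return scanInDoubt(xaConnection.getXAResource());
        } finally {
            xaConnection.close();
        }
    }
}
```

An empty list means the scan succeeded and found no in-doubt branches. An exception means recovery would fail for that account in the same way.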

The errors in our various instances coincided with the AWS maintenance windows in the different regions.