Open PaoloVoronini opened 4 days ago
What would cause the adder thread to be interrupted besides shutdown?
thanks Brett for the quick response!
unfortunately, i was not able to figure out the root cause reliably. similar to other issues i mentioned, we observe this upon database outages this is the last piece of the stacktrace (please find below) so far i can only speculate that possibly the ojdbc sets the interrupted status of the thread for some reason
PS: just in case, i should mention that i wrote a little method to support the statement "thread does not sleep"
public static void main(String[] args) {
Thread t = new Thread(() -> {
while (true) {
try {
System.out.println("sleeping");
Thread.sleep(3000);
} catch (InterruptedException e) {
System.out.println("interrupted");
Thread.currentThread().interrupt();
}
}
});
t.start();
t.interrupt();
}
End of the stacktrace:
Caused by: java.io.InterruptedIOException: Socket read interrupted
at oracle.net.nt.TimeoutSocketChannel.initializeSocketChannel(TimeoutSocketChannel.java:278) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TimeoutSocketChannel.connect(TimeoutSocketChannel.java:235) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TimeoutSocketChannel.<init>(TimeoutSocketChannel.java:202) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.establishSocket(TcpNTAdapter.java:335) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.establishSocket(TcpsNTAdapter.java:248) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:224) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:227) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnOption.connect(ConnOption.java:332) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.executeConnOption(ConnStrategy.java:1199) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:742) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:708) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:959) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:328) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.connectNetworkSessionProtocol(T4CConnection.java:3411) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:1016) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
... 14 more
Thanks. Still trying to figure this out. java.io.InterruptedIOException
extends java.io.IOException
which extends java.lang.Exception
. None of them extends java.lang.InterruptedException
. The above exception should be caught and handled here. Still not a thread interrupt...
Is this running in Google GCD or something? Are you supplying an externally managed ScheduledExecutorService
to the pool?
1) we have observed this both with a component running in Google Cloud and on-premise
2) no, we do not supply ScheduledExecutorService
3) why have i not shared the full stacktrace? sorry! please find it below now
PS: yes, it does not make sense to me, neither.
first, the exception is not InterruptedException, but i could explain it (just a possibility) with original exception being something like ClosedByInterruptException
, being caught somewhere (in ojbc or java.nio) and suppressed without clearing the interrupted state of the thread - this would explain the interrupt status without InterruptedException.
second, still someone should call "interrupt" on our thread in the first place. for this i see no explanation at all.
2024-10-24 09:34:52,386 INFO ? [UAT connection adder] throwing
java.sql.SQLRecoverableException: ORA-17002: I/O error: , Authentication lapse 0 ms.
https://docs.oracle.com/error-help/db/ora-17002/
at oracle.jdbc.driver.T4CConnection.handleLogonIOException(T4CConnection.java:1658) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.handleLogonInterruptedIOException(T4CConnection.java:1519) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:1132) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:1178) [ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:105) [ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:886) [ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:693) [ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138) [HikariCP-5.0.1.jar:?]
at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:359) [HikariCP-5.0.1.jar:?]
at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:201) [HikariCP-5.0.1.jar:?]
at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:470) [HikariCP-5.0.1.jar:?]
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:733) [HikariCP-5.0.1.jar:?]
at com.zaxxer.hikari.pool.HikariPool$PoolEntryCreator.call(HikariPool.java:712) [HikariCP-5.0.1.jar:?]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: java.io.IOException: , Authentication lapse 0 ms.
at oracle.jdbc.driver.T4CConnection.handleLogonIOException(T4CConnection.java:1653) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
... 16 more
Caused by: java.io.InterruptedIOException:
at oracle.net.nt.TcpNTAdapter.handleEstablishSocketException(TcpNTAdapter.java:401) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.establishSocket(TcpNTAdapter.java:346) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.establishSocket(TcpsNTAdapter.java:248) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:224) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:227) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnOption.connect(ConnOption.java:332) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.executeConnOption(ConnStrategy.java:1199) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:742) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:708) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:959) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:328) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.connectNetworkSessionProtocol(T4CConnection.java:3411) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:1016) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
... 14 more
Caused by: java.io.InterruptedIOException: Socket read interrupted
at oracle.net.nt.TimeoutSocketChannel.initializeSocketChannel(TimeoutSocketChannel.java:278) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TimeoutSocketChannel.connect(TimeoutSocketChannel.java:235) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TimeoutSocketChannel.<init>(TimeoutSocketChannel.java:202) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.establishSocket(TcpNTAdapter.java:335) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.establishSocket(TcpsNTAdapter.java:248) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpNTAdapter.connect(TcpNTAdapter.java:224) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.TcpsNTAdapter.connect(TcpsNTAdapter.java:227) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnOption.connect(ConnOption.java:332) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.executeConnOption(ConnStrategy.java:1199) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.nt.ConnStrategy.execute(ConnStrategy.java:742) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.resolver.AddrResolution.resolveAndExecute(AddrResolution.java:708) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.establishConnection(NSProtocol.java:959) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.net.ns.NSProtocol.connect(NSProtocol.java:328) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.connectNetworkSessionProtocol(T4CConnection.java:3411) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:1016) ~[ojdbc11-23.4.0.24.05.jar:23.4.0.24.05]
... 14 more
java.sql.SQLRecoverableException: ORA-17002: I/O error: , Authentication lapse 0 ms.
This is a very specific and unusual error. Many threads point to an Oracle listener process falling over. But that is not conclusive.
I suggest looking at these Google results and this ChatGPT diagnosis.
It seems to me handling InterruptedException
is a separate issue. My general reaction is that it is bad behavior to simply "clear and ignore" an InterruptedException
. If anything, the entire pool should shutdown rather than continuing. I suspect addressing the underlying connection issue will also resolve the InterruptedException
issue.
Also, you should ensure that you are running in HikariCP v6.2.1. Some threads point to improperly closed connections, and v6.0.0 contained a fix related to connection closing (double-close).
https://github.com/brettwooldridge/HikariCP/blob/ffdb8106bccb285d1775af1e8b040bfce4588dc1/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L758C19-L758C31
Method UtilityElf.quietlySleep() retains the interrupted flag As a result PoolEntryCreator does not sleep at all - i.e. no backoff. Possibly the interrupted flag also prevents creating new pool entry.
@brettwooldridge my suggestion would be to clear the interrupted flag in the snippet above. i.e.
PS: this might be related to #2208 and other "database outage" issues mentioned in it