apache / beam

Apache Beam is a unified programming model for Batch and Streaming data processing.
https://beam.apache.org/
Apache License 2.0
7.76k stars 4.21k forks source link

Timeout handling in JdbcIO with Oracle java driver #19069

Open kennknowles opened 2 years ago

kennknowles commented 2 years ago

Hello, we have a batch pipeline that is reading data from Oracle, writing it to BigQuery. Using JdbcIO, BigQueryIO and Oracle driver [3].

We run this job 2 times in a day, the pipeline usually works fine, it completed within 10 minutes.

But sometimes we have a [1] warning from this job, it never finished until we cancelling it. (sometimes means 4 times in last month)

Even we set timeout configuration for our code [2], warning log says [1] Processing stuck in step Read JDBC.   Would you explain and correct us this kind of problem?

We already contacted Google Support Team, they said our configuration code is looked fine them. Then they instructed us to ask here.

Thanks, Bya

[1] Warning Log


Processing stuck in step Read JDBC/JdbcIO.Read/JdbcIO.ReadAll/ParDo(Read) for at least 13h40m00s without
outputting or completing in state process at sun.nio.ch.FileDispatcherImpl.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) at oracle.net.nt.TimeoutSocketChannel.read(TimeoutSocketChannel.java:144)
at oracle.net.ns.NIOHeader.readHeaderBuffer(NIOHeader.java:82) at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:139)
at oracle.net.ns.NIOPacket.readFromSocketChannel(NIOPacket.java:101) at oracle.net.ns.NIONSDataChannel.readDataFromSocketChannel(NIONSDataChannel.java:80)
at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:98) at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485) at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612) at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:226)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:59) at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:747)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:904) at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1082)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3780) at
oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1343) at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3822)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1165)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.commons.dbcp2.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:83)
at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn.processElement(JdbcIO.java:601) at org.apache.beam.sdk.io.jdbc.JdbcIO$ReadFn$DoFnInvoker.invokeProcessElement(Unknown
Source)

 

[2] Jdbc configure


import org.apache.beam.sdk.io.jdbc.JdbcIO;
import oracle.jdbc.driver.OracleConnection;

JdbcIO.DataSourceConfiguration
 dataSource = JdbcIO
                .DataSourceConfiguration.create("oracle.jdbc.driver.OracleDriver",
oracleUrl)
                .withUsername(oracleUser)
                .withPassword(oraclePassword)

               .withConnectionProperties(String.format("%s=%s",
                        OracleConnection.CONNECTION_PROPERTY_THIN_NET_CONNECT_TIMEOUT,
10000))
                .withConnectionProperties(String.format("%s=%s",
                        OracleConnection.CONNECTION_PROPERTY_THIN_READ_TIMEOUT,
600000))
                .withConnectionProperties(String.format("%s=%s",

OracleConnection.CONNECTION_PROPERTY_TIMEZONE_AS_REGION, false))
                .withConnectionProperties(String.format("%s=%s",

                       OracleConnection.CONNECTION_PROPERTY_NET_KEEPALIVE, false))
        ;

pipeline.apply(JdbcIO.<TableRow>read()

   .withDataSourceConfiguration(dataSource)
    .withQuery(readQuery)
    .withCoder(TableRowJsonCoder.of())

   .withRowMapper(new TableRowRowMapper())
);

 

[3] Oracle DB driver: ojdbc8 12.2.0.1

 

 

Imported from Jira BEAM-5305. Original Jira may contain additional context. Reported by: bya.

thevijayshekhawat commented 12 months ago

Hello, We were facing a similar issue in production while using the JDBC IO for writes, here is how we solved it.

Background: We run over close to a hundred streaming dataflow pipelines to write to Postgres, Every few of those pipelines would get stuck in JDBC IO write operation and would never recover, We had to close these pipelines and since these were streaming pipelines this would happen all around the clock causing a lot of on-call burden.

The issue here is apparently happening on the Network Layer, Many JDBC drivers, have a default socketTimeout of "infinite" which means if the application in this case (Dataflow) sends a read/write request to the database and the database successfully executes that request. It sends back the result but it never reaches the application (i.e. Packet Loss or Connection Drop). This would result in this Limbo state where the application thinks the database is still working on the request and the database on the other hand thinks it has successfully executed the request.

How we fixed it:

Add socketTimeout to JDBC Driver configuration: This will ensure that any network operation (read/write) will time out after a specified period, preventing indefinite hangs.

Optional(If not enabled already): Add statementTimeout: If a SQL query exceeds the statementTimeout, the database engine will cancel the query execution. This prevents long-running or potentially stuck queries from occupying system resources indefinitely.

kennknowles commented 11 months ago

This is great. Would you contribute a fix to Beam?

thevijayshekhawat commented 10 months ago

Sure, Happy to Contribute

kennknowles commented 10 months ago

Thank you!