apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

ORA-01292 Oracle to Postgres #6662

Open a11dev opened 3 months ago

a11dev commented 3 months ago

Search before asking

What happened

- SeaTunnel 2.3.4
- Windows Server, local environment deployment
- openjdk version "21.0.2" 2024-01-16
- Oracle database (DBaaS)
- local Postgres
- 3 Oracle tables (Oracle-CDC source)
- 3 sinks (JDBC)

It crashes with an ORA-01292 error: seatunnellog.zip

All logs (and a config example) are attached.

I can increase the log level if necessary.

Might it be related to an automatic backup of the redo logs? config.zip

SeaTunnel Version

2.3.4

SeaTunnel Config

# Also included in the first attachment:

env {
    parallelism = 2
    job.mode=STREAMING
    job.name=SeaTunnel_Job
    read_limit.bytes_per_second=7000000
    read_limit.rows_per_second=400
}

source {

  Oracle-CDC {

    result_table_name = "tab1"
    base-url = "jdbc:oracle:thin:user/password@ip:1521/service_name"
    source.reader.close.timeout = 120000
    username = "user"
    password = "password"
    database-names = ["DBNAME"]
    # real DB name is DBNAME.domain.local (it works with DBNAME)
    schema-names = ["SCHEMA"]
    startup.mode = "INITIAL"
    table-names = ["DBNAME.SCHEMA.TABLE1"]
  }

}

sink {
    jdbc {
        source_table_name="tab1"
        url = "jdbc:postgresql://localhost:5432/pgdatabase"
        driver = "org.postgresql.Driver"
        user = pguser
        password = pgpassword
        generate_sink_sql = true
        database = pgdatabase
        table = public.tab1
        primary_keys = ["PKNAME"]
        field_ide = LOWERCASE
        schema_save_mode = "CREATE_SCHEMA_WHEN_NOT_EXIST"
        data_save_mode="APPEND_DATA"
    }

}

Running Command

java -Dlog4j2.configurationFile=E:\programmi\apache-seatunnel-2.3.4\config\log4j2_client.properties -Dhazelcast.client.config=E:\programmi\apache-seatunnel-2.3.4\config\hazelcast-client.yaml -Dseatunnel.config=E:\programmi\apache-seatunnel-2.3.4\config\seatunnel.yaml -Dhazelcast.config=E:\programmi\apache-seatunnel-2.3.4\config\hazelcast.yaml -Dseatunnel.logs.file_name=seatunnel-starter-clienttest -Xms256m -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=E:\programmi\apache-seatunnel-2.3.4\dump\zeta-client  -cp E:\programmi\apache-seatunnel-2.3.4\lib\*;E:\programmi\apache-seatunnel-2.3.4\starter\seatunnel-starter.jar org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient  --config .\config\v2.batch.config.template -m local

Error Exception

2024-04-06 14:35:03,888 ERROR [i.d.c.o.l.LogMinerHelper      ] [debezium-reader-0] - Mining session stopped due to the java.sql.SQLException: ORA-01291: missing logfile
ORA-16241: Waiting for gap log file (thread# 2, sequence# 106896)

// full log is in the first attachment

Zeta or Flink or Spark Version

Zeta

Java or Scala Version

openjdk version "21.0.2" 2024-01-16

Screenshots

Everything is in the attachments.

Are you willing to submit PR?

Code of Conduct

Carl-Zhou-CN commented 3 months ago

Is it possible that the log expired and was removed?

a11dev commented 3 months ago

Thanks for your email.

Yes, it might be. For the test environment it is not a problem: we can run it again and SeaTunnel is able to recover the lost sync. In production, though, we know the log file backup is performed every 30 minutes. Is there a workaround to avoid blocking?

Thanks Alessandro


Carl-Zhou-CN commented 3 months ago

Is it possible that the log expired and was removed?

This is just one possibility. You could also check whether the Oracle archive logs and related settings are configured correctly, since it seems your incremental phase has not been able to read the data correctly.

a11dev commented 2 months ago

I would like to share some further information gathered on this topic, comparing other Oracle CDC connectors (Debezium/Confluent, ref: https://docs.confluent.io/kafka-connectors/oracle-cdc/current/overview.html). As suspected, it happens when the redo log is purged from the system. The recommendation is to keep archived log files for longer than the connector might be down, but in my case they are purged every 30 minutes. Debezium's strategy is to implement a heartbeat in the connector by setting heartbeat.interval.ms to a positive value. This helps ensure that the source offsets advance regularly, reducing the risk of the redo logs being purged before the connector consumes them; still, if the connector is down, no heartbeat will be generated.

I hope this is useful!

a11dev commented 2 months ago

Oracle CDC configuration improved with heartbeat.interval.ms = 10000:

  Oracle-CDC {

    result_table_name = "tab1"
    base-url = "jdbc:oracle:thin:user/password@ip:1521/service_name"
    source.reader.close.timeout = 120000
    username = "user"
    password = "password"
    database-names = ["DBNAME"]
    # real DB name is DBNAME.domain.local (it works with DBNAME)
    schema-names = ["SCHEMA"]
    startup.mode = "INITIAL"
    table-names = ["DBNAME.SCHEMA.TABLE1"]
    heartbeat.interval.ms = 10000
  }

How can I check from the logs that it is working?

Carl-Zhou-CN commented 2 months ago

It should be configured this way: debezium { heartbeat.interval.ms = 10000 }
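
For reference, a minimal sketch of how the source block from the config above could look with the option nested under debezium (same placeholder connection values as earlier in this thread; not a verified configuration):

  Oracle-CDC {
    result_table_name = "tab1"
    base-url = "jdbc:oracle:thin:user/password@ip:1521/service_name"
    username = "user"
    password = "password"
    database-names = ["DBNAME"]
    schema-names = ["SCHEMA"]
    table-names = ["DBNAME.SCHEMA.TABLE1"]
    startup.mode = "INITIAL"
    # Debezium pass-through options go in a nested block instead of a top-level key
    debezium {
      heartbeat.interval.ms = 10000
    }
  }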