apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Bug] [zeta, spark] source is oracle, the data in the parquet file is strange #6499

Open yjkim0083 opened 8 months ago

yjkim0083 commented 8 months ago

What happened

When the source is Oracle, numeric data is written to the parquet file in an unexpected form: even integer values appear with many zeros after the decimal point.

SeaTunnel Version

2.3.3 or 2.3.4

SeaTunnel Config

env {
    job.mode = "BATCH"
}

source {
    Jdbc {
        url = "jdbc:oracle:thin:@//IP:PORT/DB"
        driver = "oracle.jdbc.OracleDriver"
        user = "USER"
        password = "PASSWORD"
        query = "select * from oracle_table"
    }
}

transform {
}

sink {
    LocalFile {
        path = "./"
        file_format_type = "parquet"
    }
}

Running Command

sh bin/seatunnel.sh --config ./config/oracle_to_parquet.template -e local

Error Exception

No exception is thrown. If an Oracle column is numeric, the parquet schema reports it as FIXED_LEN_BYTE_ARRAY Decimal(precision=38, scale=18), and the stored value appears as 123.00000000.
This issue does not occur when the source is MySQL.
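
To illustrate the symptom only, here is a minimal Python sketch of how an integer looks once it is stored with a fixed scale of 18, as the reported parquet schema implies. It is not SeaTunnel or Oracle driver code; the helper name to_fixed_scale is hypothetical, and it assumes the connector simply rescales each value to the declared scale.

```python
from decimal import Decimal


def to_fixed_scale(value, scale):
    """Hypothetical helper: rescale a value to exactly `scale` fractional
    digits, similar to how a DECIMAL(precision, scale) column stores it."""
    # Decimal(1).scaleb(-scale) is 1E-scale, which fixes the target exponent.
    return Decimal(value).quantize(Decimal(1).scaleb(-scale))


# Scale 18, as in the reported schema Decimal(precision=38, scale=18).
as_scale_18 = to_fixed_scale(123, 18)
# Scale 0, which is what an integer column would normally be expected to use.
as_scale_0 = to_fixed_scale(123, 0)

print(as_scale_18)                # 123 followed by 18 fractional zeros
print(as_scale_0)                 # 123
print(as_scale_18 == as_scale_0)  # True: same number, different scale
```

The two values are numerically equal, so the data is not corrupted. The trailing zeros come from the scale declared in the schema, not from the values themselves.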

Zeta or Flink or Spark Version

zeta

Java or Scala Version

java : 1.8.0_181

yjkim0083 commented 7 months ago

I found a solution: add "-Doracle.jdbc.J2EE13Compliant=true" to the Java options (java_opt).