apache / arrow-adbc

Database connectivity API standard and libraries for Apache Arrow
https://arrow.apache.org/adbc/
Apache License 2.0

JDBC adapter query Postgres numeric field error: Cannot get simple type for type DECIMAL #2297

Open RealDeanZhao opened 3 weeks ago

RealDeanZhao commented 3 weeks ago

What happened?

JDBC adapter query Postgres numeric field error: Cannot get simple type for type DECIMAL

Stack Trace

Invalid Input Error: arrow_scan: get_next failed(): java.lang.RuntimeException: Error occurred while getting next schema root.
    at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:190)
    at org.apache.arrow.adbc.driver.jdbc.JdbcArrowReader.loadNextBatch(JdbcArrowReader.java:87)
    at org.apache.arrow.c.ArrayStreamExporter$ExportedArrayStreamPrivateData.getNext(ArrayStreamExporter.java:66)
Caused by: java.lang.RuntimeException: Error occurred while consuming data.
    at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:112)
    at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.load(ArrowVectorIterator.java:163)
    at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:183)
    ... 2 more
Caused by: java.lang.UnsupportedOperationException: Cannot get simple type for type DECIMAL
    at org.apache.arrow.vector.types.Types$MinorType.getType(Types.java:815)
    at org.apache.arrow.adapter.jdbc.consumer.CompositeJdbcConsumer.consume(CompositeJdbcConsumer.java:49)
    at org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:98)
    ... 4 more

How can we reproduce the bug?

A numeric field declared without explicit scale and precision causes the error:

create table xxx (
    numeric_a numeric
)

I also tried debugging the code and found that the Postgres JDBC driver's getBigDecimal returns a BigDecimal with precision 1. This causes the actual error: "BigDecimal precision cannot be greater than that in the Arrow vector"

// org.apache.arrow.adapter.jdbc.consumer.DecimalConsumer.NullableDecimalConsumer.consume
public void consume(ResultSet resultSet) throws SQLException {
    // value's scale is 0 and precision is 1, which triggers the error below
    BigDecimal value = resultSet.getBigDecimal(this.columnIndexInResultSet);
    if (!resultSet.wasNull()) {
        this.set(value);
    }
    ++this.currentIndex;
}
// org.apache.arrow.vector.util.DecimalUtility.checkPrecisionAndScale
public static boolean checkPrecisionAndScale(BigDecimal value, int vectorPrecision, int vectorScale) {
    if (value.scale() != vectorScale) {
        throw new UnsupportedOperationException(
            "BigDecimal scale must equal that in the Arrow vector: "
                + value.scale() + " != " + vectorScale);
    } else if (value.precision() > vectorPrecision) {
        // value precision is 1 and vector precision is 0
        throw new UnsupportedOperationException(
            "BigDecimal precision cannot be greater than that in the Arrow vector: "
                + value.precision() + " > " + vectorPrecision);
    } else {
        return true;
    }
}
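The mismatch can be illustrated with nothing but java.math.BigDecimal: an unconstrained NUMERIC column reports precision 0 in its JDBC metadata, while any real value has precision >= 1, so the vector-side check can never pass. A minimal stand-alone sketch (the check is re-stated here for illustration; it is not Arrow's actual class):

```java
import java.math.BigDecimal;

public class DecimalCheckDemo {
  // Mirrors the logic of DecimalUtility.checkPrecisionAndScale shown above.
  static boolean fitsVector(BigDecimal value, int vectorPrecision, int vectorScale) {
    return value.scale() == vectorScale && value.precision() <= vectorPrecision;
  }

  public static void main(String[] args) {
    // e.g. a value read from an unconstrained NUMERIC column
    BigDecimal value = new BigDecimal("5");
    System.out.println(value.precision()); // 1
    System.out.println(value.scale());     // 0
    // Column metadata for "numeric" without (p, s) reports precision 0,
    // so even a one-digit value exceeds the vector precision:
    System.out.println(fitsVector(value, 0, 0));  // false
    // With any explicit precision the same value would pass:
    System.out.println(fitsVector(value, 38, 0)); // true
  }
}
```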

Environment/Setup

No response

lidavidm commented 2 weeks ago

Hmm, Postgres NUMERIC fields without a fixed precision/scale can't actually be supported by Arrow because those are variable/unlimited precision and Arrow assumes a fixed precision per field.

For BigQuery, we need to read the type correctly.

Note that we have been considering a JNI bridge to use the native ADBC drivers for both these databases. That should be faster than the JDBC driver and should handle these cases better as the drivers have had more individual attention for each database's quirks (vs for JDBC which just tries to generically adapt the results from JDBC).
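One workaround that follows from this constraint is to give the column a fixed precision/scale in the query itself, so the driver metadata no longer reports precision 0. A sketch (the helper name and the (38, 9) bound are illustrative choices, reusing the table/column from the reproduction above):

```java
public class CastWorkaround {
  /**
   * Wraps a column in an explicit cast so Postgres reports a fixed
   * precision/scale instead of 0 for an unconstrained NUMERIC.
   * (Hypothetical helper; the bound must fit your data.)
   */
  static String boundedNumeric(String column, int precision, int scale) {
    return column + "::numeric(" + precision + ", " + scale + ") AS " + column;
  }

  public static void main(String[] args) {
    // For the reproduction table above:
    String sql = "SELECT " + boundedNumeric("numeric_a", 38, 9) + " FROM xxx";
    System.out.println(sql);
    // SELECT numeric_a::numeric(38, 9) AS numeric_a FROM xxx
  }
}
```

With the cast in place, ResultSetMetaData reports precision 38 and scale 9 for the column, so the Arrow adapter can build a fixed-precision Decimal vector.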

RealDeanZhao commented 1 week ago

> Hmm, Postgres NUMERIC fields without a fixed precision/scale can't actually be supported by Arrow because those are variable/unlimited precision and Arrow assumes a fixed precision per field.
>
> For BigQuery, we need to read the type correctly.
>
> Note that we have been considering a JNI bridge to use the native ADBC drivers for both these databases. That should be faster than the JDBC driver and should handle these cases better as the drivers have had more individual attention for each database's quirks (vs for JDBC which just tries to generically adapt the results from JDBC).

https://arrow.apache.org/cookbook/java/jdbc.html#id5

Is it possible to use a custom JdbcToArrowConfig to avoid this issue? It seems that JdbcArrowReader uses a default config:

   JdbcArrowReader(BufferAllocator allocator, ResultSet resultSet, @Nullable Schema overrideSchema) throws AdbcException {
        super(allocator);
        JdbcToArrowConfig config = makeJdbcConfig(allocator);
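If the reader accepted a caller-supplied config, the upstream JDBC adapter's JdbcToArrowConfigBuilder does allow overriding the type mapping. A sketch of such a config, assuming one is willing to pin unconstrained DECIMAL/NUMERIC columns to a fixed Decimal(38, 9) (those bounds are an arbitrary choice here, not a recommendation):

```java
import java.sql.Types;
import java.util.Calendar;
import java.util.Locale;
import java.util.TimeZone;
import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.types.pojo.ArrowType;

Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
JdbcToArrowConfig config =
    new JdbcToArrowConfigBuilder(new RootAllocator(), utc)
        .setJdbcToArrowTypeConverter(
            (JdbcFieldInfo field) -> {
              // Unconstrained Postgres NUMERIC reports precision 0;
              // substitute a fixed-width decimal so the vector check can pass.
              if ((field.getJdbcType() == Types.DECIMAL
                      || field.getJdbcType() == Types.NUMERIC)
                  && field.getPrecision() == 0) {
                return new ArrowType.Decimal(38, 9, 128);
              }
              // Fall back to the adapter's default mapping otherwise.
              return JdbcToArrowUtils.getArrowTypeFromJdbcType(field, utc);
            })
        .build();
```

Note this would still silently lose digits for values that do not fit the chosen bounds, which is presumably why the reader does not do it by default.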