apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[Java][FlightSQL] Column Duplication When a SELECT Returns No Records in Arrow Flight SQL JDBC Driver #44467

Open mingnuj opened 1 month ago

mingnuj commented 1 month ago

Describe the bug, including details regarding any error messages, version, and platform.

I encountered an issue when working with Arrow Flight SQL, and I would appreciate your help.

I created a table, test_table, with the following definition:

CREATE TABLE default.public.test_table (col1 int);

When I run a SELECT * query on this table without inserting any data, the columns appear duplicated in the result. I get the following output: [screenshot]

I am developing the database myself in Rust and connecting to it through Flight SQL. When a SELECT query runs against an empty table, my get_flight_info method returns a FlightInfo with an empty endpoint list.

This did not occur in Arrow Flight SQL JDBC Driver versions prior to 15.0.0, but it started happening with version 15.0.0.

Component(s)

Java

mingnuj commented 4 weeks ago

I've been building each Flight SQL-related branch following the Arrow 15.0.0 release and found that the issue first appears with commit [GH-33475: Add parameter binding for Prepared Statements in JDBC driver (#38404)].

After debugging last week, I observed the following:

When querying from DBeaver, the prepareAndExecute method in ArrowFlightMetaImpl.java (flight-sql-jdbc-core) is called. Before this commit, ArrowFlightMetaImpl used a fixed signature declared as final. With the parameter binding change, the code now uses the signature stored in the statement handle, which is mutable.

This change is needed for parameter binding, but it introduces a problem: columns are appended repeatedly. During execution, the executeFlightInfoQuery method in ArrowFlightStatement.java (flight-sql-jdbc-core) is invoked, and it appends the result-set columns to the signature's existing column list without clearing it first:

```java
@Override
public FlightInfo executeFlightInfoQuery() throws SQLException {
    final PreparedStatement preparedStatement = getConnection().getMeta().getPreparedStatement(handle);
    final Meta.Signature signature = getSignature();
    if (signature == null) {
        return null;
    }

    final Schema resultSetSchema = preparedStatement.getDataSetSchema();
    // Appends to the signature's existing column list without clearing it,
    // so every execution on the same handle adds the columns again.
    signature.columns.addAll(ConvertUtils.convertArrowFieldsToColumnMetaDataList(resultSetSchema.getFields()));
    setSignature(signature);

    return preparedStatement.executeQuery();
}
```

On every query, the result-set columns are appended again to the same signature, so the duplicated column list is carried forward. When the result has endpoints (e.g., a typical doPut operation in FlightSqlClient), the column list looks correct because the signature is replaced by the result signature. In my case, however, there are no endpoints, so the accumulated, duplicated columns are what the client displays.
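The mechanism described above can be reproduced outside the driver with a small model. This is a hypothetical sketch: `Signature`, `executeFlightInfoQuery`, and `visibleColumns` are illustrative stand-ins for Avatica's `Meta.Signature` and the driver's logic, not the real classes. `visibleColumns` encodes the assumption stated above: the client uses the result's own schema when there are endpoints and falls back to the signature's columns when there are none.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, self-contained model of the column duplication.
 * Signature, executeFlightInfoQuery and visibleColumns are illustrative
 * stand-ins, not classes or methods of the Arrow Flight SQL JDBC driver.
 */
public class ColumnDuplicationDemo {

    /** Stand-in for Meta.Signature: one mutable column list shared by the handle. */
    static final class Signature {
        final List<String> columns = new ArrayList<>();
    }

    /** Mirrors the 15.0.0 behaviour: append the result-set columns without clearing. */
    static void executeFlightInfoQuery(Signature signature, List<String> resultSetSchema) {
        signature.columns.addAll(resultSetSchema);
    }

    /**
     * Assumed client behaviour: with endpoints, columns come from the returned
     * result; with no endpoints, the accumulated signature columns are shown.
     */
    static List<String> visibleColumns(Signature signature, List<String> resultSetSchema,
                                       boolean hasEndpoints) {
        return hasEndpoints ? new ArrayList<>(resultSetSchema)
                            : new ArrayList<>(signature.columns);
    }

    public static void main(String[] args) {
        List<String> schema = List.of("col1"); // test_table has a single column
        Signature shared = new Signature();    // the same handle is reused

        executeFlightInfoQuery(shared, schema); // first execution
        executeFlightInfoQuery(shared, schema); // second execution on the same handle

        System.out.println(visibleColumns(shared, schema, true));  // [col1]
        System.out.println(visibleColumns(shared, schema, false)); // [col1, col1]
    }
}
```

The endpoint-less path is the only one where the stale, growing list reaches the user, which matches the report that the problem shows up only for empty results.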

As a temporary workaround, I call clear() on the column list before adding the new columns:

```java
@Override
public FlightInfo executeFlightInfoQuery() throws SQLException {
    final PreparedStatement preparedStatement = getConnection().getMeta().getPreparedStatement(handle);
    final Meta.Signature signature = getSignature();
    if (signature == null) {
        return null;
    }

    final Schema resultSetSchema = preparedStatement.getDataSetSchema();
    // Workaround: drop columns left over from previous executions before re-adding them.
    signature.columns.clear();
    signature.columns.addAll(ConvertUtils.convertArrowFieldsToColumnMetaDataList(resultSetSchema.getFields()));
    setSignature(signature);

    return preparedStatement.executeQuery();
}
```
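A minimal sketch of why the clear() workaround stops the growth, again using a hypothetical stand-in for `Meta.Signature` (not the driver's real class): resetting the list before `addAll` makes each execution idempotent, so repeated queries on one handle always leave exactly the current result-set columns, and a changed schema replaces the old columns instead of being appended to them.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the clear()-before-addAll() workaround.
 * Signature and executeWithClear are illustrative stand-ins,
 * not the Arrow Flight SQL JDBC driver's actual code.
 */
public class ClearWorkaroundDemo {

    /** Stand-in for Meta.Signature: one mutable column list shared by the handle. */
    static final class Signature {
        final List<String> columns = new ArrayList<>();
    }

    /** Workaround: reset the shared list, then add the current result-set columns. */
    static void executeWithClear(Signature signature, List<String> resultSetSchema) {
        signature.columns.clear();
        signature.columns.addAll(resultSetSchema);
    }

    public static void main(String[] args) {
        List<String> schema = List.of("col1");
        Signature shared = new Signature();

        // Executing the same statement several times no longer grows the list.
        for (int i = 0; i < 3; i++) {
            executeWithClear(shared, schema);
        }
        System.out.println(shared.columns); // [col1]
    }
}
```

clear() only resets the shared state; an alternative that avoids mutating the handle's signature at all would be to build a fresh column list (or signature) for each execution. Whether that fits cleanly with Avatica's `Meta.Signature` API has not been checked here.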

However, I believe a more fundamental fix would be better here. Could anyone provide insight or assistance? Thank you.