apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0
14.72k stars 3.57k forks source link

[Python]: Arrow Flight SQL server communication issue with JDBC Arrow FlightSQL driver #37925

Open nitesh-sinha opened 1 year ago

nitesh-sinha commented 1 year ago

Describe the usage question you have. Please include as many useful details as possible.

Hello,

I'm trying to build an Arrow Flight SQL server(which wraps DuckDB querying parquet files) in Python. I've implemented the handler methods defined in pyarrow FlightServerBase class and testing it with a Dbeaver client(loaded with JDBC driver for Arrow Flight SQL). However even though the client connects successfully with the server, it is unable to read any of the data sent back from the server. I'm suspecting it might be due to the RecordBatch structure? After a lot of reading up the docs, I've tried various ways of creating the RecordBatch with no luck.

For debugging simplicity I hand-wrote the following RecordBatch to be sent for a DoGet RPC call(with CommandGetSqlInfo command) in the Ticket. Can someone help point out any errors in this?

def do_get_sql_info(self, context: flight.ServerCallContext, cmd: sqlPb.CommandGetSqlInfo) -> flight.FlightDataStream:
        sql_info_metadata = [
            {"info_name": "0", "value": "db_name"},
            {"info_name": "1", "value": "duckdb"},
        ]

        schema = pa.schema([
            pa.field("info_name", pa.uint32()),
            pa.field("value", pa.dense_union([
                pa.field("string_value", pa.string()),
                pa.field("bool_value", pa.bool_()),
                pa.field("bigint_value", pa.int64()),
                pa.field("int32_bitmask", pa.int32()),
                pa.field("string_list", pa.list_(pa.string())),
                pa.field("int32_to_int32_list_map", pa.map_(pa.int32(), pa.list_(pa.int32())))
            ]))
        ])
        batch = pa.RecordBatch.from_pandas(pd.DataFrame(sql_info_metadata), schema=schema)
        return flight.FlightDataStream(batch)

The client is unable to read the DB name as duckdb, instead it just prints ??

Note:

Appreciate some pointers on this. Thanks!

Component(s)

FlightRPC, Python

jduo commented 1 year ago

Hi @nitesh-sinha , The info_name field is being misused. It should be an integer, rather than a string. The integer should have the same value as an entry in the SqlInfo enum (https://github.com/apache/arrow/blob/e8360615adf6c5a9bb76b81267d08388c7cfc3a9/format/FlightSql.proto#L76).

A tool such as DBeaver may require many more SqlInfo properties to be implemented since it can call getDatabaseMetaData()

nitesh-sinha commented 1 year ago

Sorry for the late response James and thanks for your inputs. Arrow Flight work is currently paused at the moment due to other priorities. I will validate your observations when I pick it up.

Thanks Nitesh

On Thu, Oct 26, 2023 at 1:31 AM James Duong @.***> wrote:

Hi @nitesh-sinha https://github.com/nitesh-sinha , The info_name field is being misused. It should be an integer, rather than a string. The integer should have the same value as an entry in the SqlInfo enum ( https://github.com/apache/arrow/blob/e8360615adf6c5a9bb76b81267d08388c7cfc3a9/format/FlightSql.proto#L76 ).

A tool such as DBeaver may require many more SqlInfo properties to be implemented since it can call getDatabaseMetaData()

— Reply to this email directly, view it on GitHub https://github.com/apache/arrow/issues/37925#issuecomment-1779966754, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABXVVF5ITIYGJD57QODRGADYBFV2BAVCNFSM6AAAAAA5KP2CZCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONZZHE3DMNZVGQ . You are receiving this because you were mentioned.Message ID: @.***>

shivarajugowda commented 10 months ago

May be a reference Flight SQL Server implementation in Python just like the ones available in Java and C++ would help. Is there any such thing in roadmap?

shivarajugowda commented 10 months ago

Similar request: https://github.com/apache/arrow/issues/37700