apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
13.88k stars 3.38k forks source link

[FlightSQL] Support `DoExchange` (in addition to `DoPut`) to bind parameters and execute prepared statements #37741

Open alamb opened 9 months ago

alamb commented 9 months ago

Describe the enhancement requested

As suggested by @kou on https://github.com/apache/arrow/issues/37720#issuecomment-1720662204

Usecase

  1. Reduce the number of messages round trips required to run a prepared statement via FlightSQL
  2. Avoid the need to (potentially) serialize bind parameters in a stateless architecture (see https://github.com/apache/arrow/issues/37720)

Background

Currently FlightSQL requires three messages to run a prepared statement. DoPut, then a GetFlightInfo and then a DoGet:

...
DoPut `CommandPreparedStatementQuery` 
GetFlightInfo `CommandPreparedStatementQuery`
DoGet 
...

CommandPreparedStatementQuery mmd

Proposal

By supporting DoExchange instead of DoPut + GetFlightInfo + DoGet only a single round trip is needed:

...
DoExchange `CommandPreparedStatementQuery` 
...

Benefits:

Drawbacks:

Component(s)

FlightRPC

lidavidm commented 9 months ago

SGTM

We should add flags in (the awkwardly named) SqlInfo to indicate support for these as we did with other new features

zeroshade commented 9 months ago

+1 and I agree with @lidavidm that we should add flags to SqlInfo

suremarc commented 5 months ago

I have started looking into implementing this on the Go client side, and I'm running into some difficulties. Namely, the existing interface for PreparedStatement.Execute returns a *FlightInfo, but DoExchange skips GetFlightInfo altogether. The Rust client also presents a similar interface, and I would guess that most clients in other languages do the same.

Of course, one can always drop down to raw Flight/FlightSQL calls without using the abstraction of prepared statement "handles", but this just moves complexity to consumers of the library and makes me worry if the DoExchange protocol for prepared statements will actually be adopted in the ecosystem.

So we have a few options:

  1. Change PreparedStatement.Execute to fetch the flight streams instead of just returning the FlightInfo
    • Pros: simpler for users, and hides details like using DoPut/DoExchange
    • Cons: disruptive breaking change, less control over consumption of the flight streams.
  2. Add a new PreparedStatement.DoExchange method, separate from the existing Execute implementation that uses DoPut
    • Pros: less disruptive
    • Cons: dispatching to the correct implementation (aka whatever the server supports) is still forced on users of this library

@alamb do you have any thoughts on this?

alamb commented 5 months ago

@suremarc can you remind me why we would need to pass a FlightInfo? I think we could send the required information as part of the payload of the stream, right?

For example

DoExchange https://github.com/apache/arrow/blob/aded7bf37686a16fc4b0649ab97231427a219d7b/format/Flight.proto#L127

Sends a stream of FlightData https://github.com/apache/arrow/blob/aded7bf37686a16fc4b0649ab97231427a219d7b/format/Flight.proto#L495-L520

And the FlightData:: flight_descriptor field has the embedded cmd message (in which FlightSQL messages are embedded) https://github.com/apache/arrow/blob/aded7bf37686a16fc4b0649ab97231427a219d7b/format/Flight.proto#L310

FYI @kallisti-dev who is also working on stateless prepared statement execution

suremarc commented 5 months ago

@alamb I think we are on the same page that DoExchange does not require passing a FlightInfo — the problem is that the existing prepared statement interfaces do require returning a FlightInfo back to the client. Introducing the DoExchange protocol breaks this assumption.

For example, see the Go prepared statement implementation: PreparedStatement.Execute, which returns a FlightInfo. The Rust FlightSQL client does this too.

kou commented 4 months ago

How about just returning null or an empty FlightInfo (no endpoint)?

kou commented 4 months ago

Ah, we may want to return *flight.Reader like Client.DoGet() for DoExchange version.

alamb commented 4 months ago

@alamb I think we are on the same page that DoExchange does not require passing a FlightInfo — the problem is that the existing prepared statement interfaces do require returning a FlightInfo back to the client. Introducing the DoExchange protocol breaks this assumption.

I always think of FlightInfo as a source of potential indirection (as it can contain endpoint information so the subsequent call may be to a different endpoint / hostname)

Thus if there is no subsequent calls (just DoExchange instead of DoPut/GetFlightInfo/DoGet) the lack of FlightInfo makes sense to me (as DoExchange starts feeding back data from whatever endpoint you sent data to)