apache / arrow

Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
https://arrow.apache.org/
Apache License 2.0
14.37k stars 3.49k forks source link

[DISCUSS] [FlightSQL] FlightSQL versioning / compatibility levels #40424

Open alamb opened 7 months ago

alamb commented 7 months ago

Describe the enhancement requested

I would like to discuss adding "versions" or "compatibility levels" to FlightSQL

Usecase

The usecase is now that FlightSQL is being adopted across products, there is an increasing number of installed clients. As we make changes / add optional features to the spec such as:

The question arises "how will clients/servers know what to expect" --

At the moment, the spec defines certain features that are "optional" such as XXX (and a new optional behavior such as https://github.com/apache/arrow/pull/40243). This means:

  1. Updates to the spec must carefully define what should happen when the client and/or server do not use the optional feature
  2. Clients and Servers must similarly handle a variety of potentially present and missing features

An example of this subtlety is shown in the context of https://github.com/apache/arrow/issues/37720 where we are discussing "how should the server tell if the client is using the updated semantics" on https://github.com/apache/arrow/pull/40243 with @erratic-pattern @lidavidm and @pitrou https://github.com/apache/arrow/pull/40243#pullrequestreview-1912699524

Possible solutions

One possible

Component(s)

FlightRPC

lidavidm commented 7 months ago

The SqlInfo enum was supposed to/does contain this info

lidavidm commented 7 months ago

That said it doesn't help here when the server needs to know what the client expects

lidavidm commented 7 months ago

That said for this specific case, I'm somewhat inclined to punt instead of trying to solve feature/version negotiation. Would InfluxDB want to support a client that doesn't recognize the updated prepared statement handle? If not, then as long as it detects and errors when the handle is used, that should be enough. The client would just get the error a bit later.

pitrou commented 7 months ago

I'm skeptical about the usefulness of this, even more when adding it somewhat late in the process like this.

Past experience on related situations:

  1. the Arrow IPC formay has a features field that nobody uses AFAIK
  2. the Parquet format has a version field that is not used for any particular purpose AFAIK

A version field could be useful to convey potential incompatible changes, like the metadata_version in Arrow IPC does, but that's not what the discussion in https://github.com/apache/arrow/pull/40243 is about.

zeroshade commented 7 months ago

I pretty much agree with everything that's been said. I don't think that such a field would end up providing anything particularly useful over what exists or that would actually get used.

alamb commented 7 months ago

Would InfluxDB want to support a client that doesn't recognize the updated prepared statement handle? If not, then as long as it detects and errors when the handle is used, that should be enough. The client would just get the error a bit later.

Yes, this is likely what we would do (return an error). One potential issue raised by @erratic-pattern is that it would not be possible to distinguish between the cases:

  1. The client is old and doesn't know to send the updated handle
  2. The client simply never sent the parameter values

I personally think the error could be made clear enough, but I wanted to mention this potential issue

erratic-pattern commented 7 months ago

I do think the concerns about "spec bloat" are valid. If there aren't enough of these sort of "opt-in" behaviors then it may not be worth complicating the handshake.

Echoing what Andrew said, even though we would only return an error in the case of a client that doesn't recognize the new handle, we would still want to provide clear and unambiguous error messages about what went wrong.

Currently we can get close but it's something like "No parameters provided for this prepared statement. If parameters were provided, then this FlightSQL client is possibly out of date." This is informing the user about 2 distinct possible problems instead of 1.

Another problem with this is that you could still get this error message even on the latest version of your client, simply because your client has not been updated to support it. So "out-of-date" is not necessarily true either. What actually happened is that the client does not support a capability that the server needs.

"out of date" is also a vague explanation that doesn't tell the user what went wrong. You could instead explain in more detail in the error message, but what should it say? "Your client does not support stateless prepared statements" Most users will not know what that means. Do you give a vague response or do you explain the full context of the problem? That's a lot of context to fit into a single error message in a way that's easy to understand for someone who doesn't have the related background information about the spec change. Having a concept of capabilities in the specification and documented somewhere would make it easier to explain what is happening and link to relevant information. But there are other possible solutions to this problem, for example having links to online error documentation somewhere.

There is already a sort of concept of a "feature table" according to this table in the docs. I assume this is currently resolved via the GetFlightDescriptor call? Maybe versioning at the command level as an optional metadata field on the FlightDescriptor is an option instead?

Versioning, either at the spec level or the command level, provides a straightforward error message. "Your client cannot add parameters to this prepared statement because it uses version X of this command/action/spec instead of version Y" This provides the necessary information to either research the specific root cause or come up with a solution (i.e. update your client)

alamb commented 7 months ago

Maybe the message could be

"Error: Your client either didn't send parameter values or doesn't support stateless prepared statements"

And then we could make sure the flight sql spec has a clearly labeled header / text "stateless prepared statements" that was easy for someone to discover when searching

🤔

lidavidm commented 7 months ago

I suppose you could also introduce a separate endpoint for binding stateless prepared statements, which itself would be a kind of versioning.

alamb commented 6 months ago

I suppose you could also introduce a separate endpoint for binding stateless prepared statements, which itself would be a kind of versioning.

Indeed, but it would likely also require more substantial changes to clients (which would now have to know how to try both endpoints 🤔 )