deephaven / deephaven-core

Deephaven Community Core
Other
256 stars 80 forks source link

Client APIs should support fetching tables (and other objects) given arbitrary ticket bytes #4454

Open niloc132 opened 1 year ago

niloc132 commented 1 year ago

As a developer of a client application using provided API clients, I sometimes want to specify a ticket directly using bytes to indicate a custom ticket resolver or use case that the client API didn't account for.

Our client APIs should be somewhat consistent in how we present this - we already have the notion in gRPC of a TypedTicket, containing both an arrow Ticket (message wrapping opaque bytes) and a string type (indicating the known type of the object on the server). As exposed to user code, this API should also offer support for (re)exporting the ticket's contents, and ideally offer existing named "scope" or "application" tickets as part of the same sort of API.

In each language, we usually take care not to expose gRPC or ProtoBuf types to user code, but provide APIs that will do the work of building messages and making calls.

There are several possible paths here in the implementation, but they mostly boil down to either calling TableService/GetExportedTableCreationResponse to get the ETCR of a given ticket without exporting it, or calling TableService/FetchTable to export the ticket into a client-provided export, then respond with an ETCR so the client has the schema, size. Note that there have historically been use cases to re-export existing export tickets, and without exporting a table, the TableService/ExportedTableUpdates stream will not notify of table failure or size changes and the table instance could change between operations, since technically other ticket resolvers need not be return the same instance with each request to the same ticket.

For non-table objects, SessionService/ExportFromTicket should be used to re/export the object to a client-defined ticket, and ObjectService/MessageStream should be used to fetch/connect to the object. As above, exported tickets ensure that the instance won't change from call to call, but unlike tables, there is no ExportedTableUpdates stream - instead the MessageStream provides the information that the server has on changes that are relevant to each client.

Existing APIs

Python

In python today we have open_table which takes only the name of the ticket. The client re-exports the ticket, then returns a table using that exported ticket. Today, the name= named parameter is used to indicate the name in the server's global scope to fetch from the server - perhaps it could also have a ticket= named parameter that accepts bytes. Another optional parameter might exist to indicate if the table should be re-exported - if false, the client should not call FetchTable, but just obtain the ETCR and produce a table with it.

JavaScript

In JavaScript today we have getTable and getObject, where getTable takes only a table name, but getObject takes either an application ID and object ID or object name, in addition to a type. Alternatively, instead of either appId and id or name, it could take a byte array to represent the ticket itself, and another boolean field for re-exporting.

Java

The Java client today provides a HasTicketId interface with several built-in implementations, thus allowing a user to provide their own implementation. At a glance, it appears that the client never will re-export tables, but the signature of BarrageSession.subscribe requires that the user has likewise already ensured that an export table exists.

C++, R

C++ and R clients have the capability of specifying arbitrary tickets, but have not been discussed under this broader umbrella at this time.

Go

I can't read Go well enough to be sure, but it appears to have documented that getting a table by name (from the application and possibly scope?) via getTable is a separate operation than fetching it fetchTable. There is also OpenTable, which is described by the docs of getTable as able to get an exported TableHandle, but that doesn't appear to be accurate.

jcferretti commented 1 year ago

Colin noted in the #tmp-r-client channel in slack that tickets can contain non-ascii bytes including zeroes, with the implication that a generalized API that takes a ticket as a parameter may need the best mapping of proto bytes for the language in question.

C++ is fine as is because std::string in C++ is the mapping for bytes. But in python for example the str type is the wrong type. Colin's original comment:

also worth nothing that many tickets exist which are not strings, the last 4 bytes of an export ticket is a signed int32's bytes
so for example it is perfectly legal (and expected) to have null bytes in there

https://illumon.slack.com/archives/C053QMX2LBW/p1694096962549719

kosak commented 1 year ago

My opinion is that the C++ API should be

TableHandle TableHandleManager::FetchTable(std::variant<std::string, Ticket> table) const;

In the above Ticket is not the protobuf type but rather an opaque type that we introduce, with factory methods:

class Ticket {
  static Ticket CreateScoped(std::string table_name);
  static Ticket CreateWhatever(whatever);
  static Ticket CreateCustom(std::string your_bytes_here);
}

Python can follow naturally with

fetch_table(table : string | Ticket) -> Table

Languages like Java will probably use overloading instead (provide two functions named fetch_table with different arguments)

Languages like Go might have to take the empty interface {} and type-test for string vs Ticket (this is ultimately what Python does anyway, though at least Python has the type annotation stuff for decorating the signature)

I think this matches the expected usage patterns. Users who don't know any better can provide a table name. Power users who want tickets have to do extra typing to create Tickets.

As for the suggestion that we just take a string and peer inside it and decide ourselves whether it is a ticket or a table name. I think such "Do What I Mean" style methods can have a useful place, either at a very high level in an API, or in some kind of library of "helpful helper functions". I don't think they belong in our API here though. I think they inevitably lead to this kind of creeping defensiveness where every single level of abstraction doesn't know if it has a table name or a ticket and so every single level has to do the same disambiguation test. Much better to know up front.

Unhelpful quote:

A person with one watch knows what time it is. A person with two watches is never sure.

jmao-denver commented 1 month ago

The ticket for Python client #5944 has been closed by https://github.com/deephaven/deephaven-core/pull/5990.

alexpeters1208 commented 1 month ago

This was implemented in R over a year ago by https://github.com/deephaven/deephaven-core/pull/4453, and can be found at https://github.com/deephaven/deephaven-core/blob/cd237b1598423189d20345a745dc7411ff96d52a/R/rdeephaven/R/client_wrapper.R#L291.