delta-io / delta-sharing

An open protocol for secure data sharing
https://delta.io/sharing
Apache License 2.0

Contribute Clojure client to community OSS connector list #493

Closed lukeneil closed 3 weeks ago

lukeneil commented 1 month ago

This PR contributes a Clojure client developed and maintained by Amperity to the README OSS connector list. This client fully implements all features of the latest version of the Delta Sharing protocol.

Open question: Given the above, is the proposed list of "Supported Features" on this connector accurate as described?

linzhou-db commented 4 weeks ago

Thanks @lukeneil for the PR. It looks good!

The list is a bit outdated for some connectors.

For the Clojure connector, I wonder what you mean by "fully implements all features of the latest version of the Delta Sharing protocol." Do you have a list of the features you've implemented?

lukeneil commented 4 weeks ago

@linzhou-db

Of course, happy to expand on this point. For context, our Clojure connector supports querying all REST APIs specified in the latest version of the protocol. It also supports both Parquet and Delta response formats, as well as responses for tables with advanced reader features.

Effectively, we did our best to implement an HTTP client wrapper that supports the protocol as comprehensively as possible.

One difference I will call out between our Clojure connector and the Python or Scala connectors (for example) is that ours is currently focused only on providing a 1:1 interface to the Delta Sharing API and, as such, does not provide higher-order data-reading abstractions (such as loading a table as a DataFrame or sampling rows). Internally, we use the responses from this client together with other open source Parquet readers (e.g. Hadoop) to actually read the data, but at present we are providing only a pure client with no reader.
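To make the "pure client, no reader" distinction concrete, here is a rough sketch of what a 1:1 wrapper over the REST API could look like. It is written in Python purely for illustration and is not the Clojure client's actual interface: `ThinSharingClient`, `Request`, and every method name are hypothetical. The endpoint paths, the `limitHint` body field, and the `delta-sharing-capabilities` header are written from my reading of the protocol document and should be checked against it. The sketch only builds request descriptions; it sends nothing and reads no Parquet or Delta data.

```python
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import quote


@dataclass(frozen=True)
class Request:
    """Plain description of one HTTP call; nothing is sent over the network."""
    method: str
    url: str
    headers: dict = field(default_factory=dict)
    body: Optional[dict] = None


class ThinSharingClient:
    """Hypothetical 1:1 wrapper: one method per REST endpoint, no data reader."""

    def __init__(self, endpoint, token, response_format="parquet", reader_features=()):
        self.endpoint = endpoint.rstrip("/")
        self.token = token
        self.response_format = response_format
        self.reader_features = tuple(reader_features)

    def _headers(self):
        # Capabilities header as I understand it from the protocol document
        # (assumption: verify the exact syntax before relying on it).
        caps = f"responseformat={self.response_format}"
        if self.reader_features:
            caps += ";readerfeatures=" + ",".join(self.reader_features)
        return {
            "Authorization": f"Bearer {self.token}",
            "delta-sharing-capabilities": caps,
        }

    def _table_url(self, share, schema, table, suffix):
        # Percent-encode each path segment so unusual names cannot break the URL.
        s, d, t = (quote(p, safe="") for p in (share, schema, table))
        return f"{self.endpoint}/shares/{s}/schemas/{d}/tables/{t}/{suffix}"

    def list_shares(self):
        return Request("GET", f"{self.endpoint}/shares", self._headers())

    def list_schemas(self, share):
        url = f"{self.endpoint}/shares/{quote(share, safe='')}/schemas"
        return Request("GET", url, self._headers())

    def table_metadata(self, share, schema, table):
        url = self._table_url(share, schema, table, "metadata")
        return Request("GET", url, self._headers())

    def query_table(self, share, schema, table, limit_hint=None):
        # The caller receives the raw response (file actions) and decides how
        # to read the data; this client deliberately stops at the HTTP layer.
        body = {}
        if limit_hint is not None:
            body["limitHint"] = limit_hint
        headers = {**self._headers(), "Content-Type": "application/json"}
        url = self._table_url(share, schema, table, "query")
        return Request("POST", url, headers, body)
```

A caller would hand each `Request` to whatever HTTP library it prefers and pass the returned file URLs to a separate Parquet or Delta reader, which is the part the Python and Scala connectors bundle and ours does not.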

All that being said, I'm not entirely sure how our current functionality translates into the "feature list" as defined in the current OSS listing, so for now I just defaulted to the list of features specified by many of the other clients.

lukeneil commented 3 weeks ago

@linzhou-db Given the above, do you think the list of features as proposed in the PR is OK as-is, or would you suggest any modifications before merging?

linzhou-db commented 3 weeks ago

It's amazing to hear that you support all the RPCs in the protocol, especially the new Delta format sharing. Let's make that explicit so you can add them to the list in the PR. The Clojure client also supports:

lukeneil commented 3 weeks ago

@linzhou-db Thank you for the review and the comprehensive feature listing. I've updated the list of supported features on the Clojure client to reflect your suggestions.