apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Add Client Extensibility via ServiceLoader #293

Open lmccay opened 11 months ago

lmccay commented 11 months ago

Discussion within issue #13 has moved farther away from the original intent of that issue.

This issue covers making the Source and Target Clients extensible: ServiceLoader finds all of the relevant clients on the classpath, and the one whose TargetClient.getTableFormatName() matches the requested name is selected. This allows a table format to be specified by name rather than by an enum value.
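A rough sketch of the lookup described above, not the actual implementation in the draft. The nested TargetClient interface is a simplified stand-in for io.onetable.spi.sync.TargetClient, and the class name TargetClientLookup, the methods findByName and loadByName, and the exact-match comparison are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.ServiceLoader;

public class TargetClientLookup {

  /** Simplified stand-in for io.onetable.spi.sync.TargetClient (assumption). */
  public interface TargetClient {
    String getTableFormatName();
  }

  /** Returns the first client whose table format name equals the requested name. */
  public static Optional<TargetClient> findByName(Iterable<TargetClient> clients, String name) {
    for (TargetClient client : clients) {
      if (client.getTableFormatName().equals(name)) {
        return Optional.of(client);
      }
    }
    return Optional.empty();
  }

  /** Discovers clients registered through META-INF/services files on the classpath. */
  public static Optional<TargetClient> loadByName(String name) {
    // ServiceLoader is Iterable, so it can be passed straight to the filter.
    return findByName(ServiceLoader.load(TargetClient.class), name);
  }

  public static void main(String[] args) {
    // Hypothetical in-memory clients, used here instead of classpath discovery.
    TargetClient delta = () -> "DELTA";
    TargetClient iceberg = () -> "ICEBERG";

    Optional<TargetClient> match = findByName(Arrays.asList(delta, iceberg), "ICEBERG");
    System.out.println(match.map(TargetClient::getTableFormatName).orElse("none")); // prints ICEBERG

    // With no provider file on the classpath, nothing is discovered.
    System.out.println(loadByName("ICEBERG").isPresent());
  }
}
```

Keeping the filter separate from the ServiceLoader call means the name matching can be unit tested with plain objects, while loadByName only finds whatever the service config files on the classpath register.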

In the end, one would be able to extend onetable with another client by implementing TargetClient and adding a ServiceLoader config file to the jar, at resources/META-INF/services/io.onetable.spi.sync.TargetClient, containing the client's fully qualified class name.

Here is a well-written article on the use of ServiceLoader and the target use case, which I think aligns well with what we have been discussing: https://pedrorijo.com/blog/java-service-loader/

The article also points to a Google project that automates generating the service config file, which looks really nice. I wasn't aware of that before.

example:

##########################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##########################################################################
io.onetable.delta.DeltaClient
io.onetable.hudi.HudiTargetClient
io.onetable.iceberg.IcebergClient

The above illustrates the built-in clients.

A custom extension would look something like:

##########################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##########################################################################
my.super.custom.TableClient

This change requires some refactoring and newly created interfaces for some of the implementation classes.

lmccay commented 11 months ago

@the-other-tim-brown - I created a DRAFT pull request. This has gotten kind of large and still requires some additional testing and cleanup but is getting close. My testing is limited to the existing and new unit tests so far. Will want to try and figure out how to add a custom TargetClient to test the intended extensibility - not sure how to do that since it will be committed to the project. :)

SourceClientProviders or SourceClients will need to be a follow-up, since I think the programming model questions need more discussion.

the-other-tim-brown commented 11 months ago

Ok I'll take a look.

You can try to test by setting up another repo that depends on a local snapshot of the OneTable jars. Then you can create a lightweight jar to test with. If you want some help with this part, I can help, since I have some local repos set up for experimental purposes like this.

lmccay commented 11 months ago

@the-other-tim-brown - that sounds interesting. Do you happen to have a write up about this? Would be a great resource for dev environments!

the-other-tim-brown commented 11 months ago

I do not have anything. I've just had to make some jars to test out things like externally generated protos for a Hudi feature I created.

Once we have this change, I can see if we can make a separate repo to demo how to set up your own source or target that is not part of the main repo. That would probably be the easiest path forward for future testing as well, since we can point other developers to the sample to make sure it still works with any changes.