gyli closed 2 months ago
Apache Airflow is an orchestration tool, not a compute engine or ETL tool.
If you need to run custom code like this jar, trigger the run with a BashOperator or a KubernetesPodOperator.
And if you want a "friendly" XTable trigger operator, then extend the BashOperator or KubernetesPodOperator.
Example:
class EasyXTableOperator(BashOperator):
    ...
Hi @raphaelauv, I agree that such an operator is a "friendly" XTable trigger and can be built on top of BashOperator. The example code you showed is exactly how I would build it, but my point is that it should be an Airflow community-managed provider.
Just as Airflow offers an Iceberg hook and DatabricksSQLOperator in the corresponding providers, I believe XTable should also be added, as the data engineering industry is embracing unified data formats. For example, Microsoft Fabric and OneLake are adopting XTable. I understand that this Apache project is still in the incubating stage, and Airflow might want to hold off until it is closer to an industry standard. From a data engineer's perspective, though, a unified data lake format is where we are heading, and I don't see a reason to ask users to create a custom operator for this work.
So you want to add an XTable hook?
At this stage I'm only thinking about building an XTableOperator, similar to the example operator that AWS provides for MWAA.
Read https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers for the process by which new providers are accepted here, and feel free to follow it. Since this is not an issue or a feature request, I am converting it into a discussion.
Description
Apache XTable translates metadata among data lake table formats, allowing users to read from a data lake with tools that lack native support for its format. XTable can be executed from the command line.
An Airflow operator can be created to wrap this command and accept the XTable configuration, which normally lives in YAML files, either as a file path or as a dict.
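As a rough illustration of the "file or dict" input idea, here is a minimal sketch of a helper that builds the shell command such an operator could run. Everything in it is an assumption rather than an existing Airflow or XTable API: the function name `build_xtable_command`, the jar path, the `--datasetConfig` flag, and the config keys are placeholders for whatever the XTable utilities jar actually expects. A dict config is written out as JSON, which is valid YAML 1.2 in principle; this assumes XTable's YAML parser accepts it.

```python
import json
import shlex
import tempfile
from pathlib import Path
from typing import Union


def build_xtable_command(
    jar_path: str,
    dataset_config: Union[str, Path, dict],
    java_bin: str = "java",
) -> str:
    """Return a shell command that runs the XTable jar on one dataset config.

    `dataset_config` is either a path to an existing YAML file or a dict
    that is serialized to a temporary file first.
    """
    if isinstance(dataset_config, dict):
        # Serialize the dict as JSON into a .yaml file (assumption: the
        # XTable parser accepts JSON-style YAML). The file is kept on disk
        # so the command can read it later.
        with tempfile.NamedTemporaryFile(
            "w", suffix=".yaml", delete=False
        ) as handle:
            json.dump(dataset_config, handle, indent=2)
        config_path = handle.name
    else:
        config_path = str(dataset_config)

    # `--datasetConfig` is assumed here; check the XTable docs for the flag.
    args = [java_bin, "-jar", jar_path, "--datasetConfig", config_path]
    # shlex.join quotes each argument so paths with spaces stay intact.
    return shlex.join(args)


# Hypothetical jar location and config values, for illustration only.
command = build_xtable_command(
    "/opt/xtable/xtable-utilities-bundled.jar",
    {
        "sourceFormat": "HUDI",
        "targetFormats": ["ICEBERG"],
        "datasets": [{"tableBasePath": "s3://my-bucket/my-table"}],
    },
)
```

An operator extending BashOperator, as suggested earlier in the thread, could pass a string like `command` as its `bash_command`. Keeping the command construction in a plain function makes it testable without an Airflow installation.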
Use case/motivation
AWS provides an example XTableOperator for XTable. This blog has a good explanation of the open table formats XTable supports. However, this example operator is essentially an MVP and serves as an MWAA plugin. We can create an Apache XTable provider, making it available to more Airflow users and accepting more flexible user input.
Related issues
No response