apache / inlong

Apache InLong - a one-stop, full-scenario integration framework for massive data
https://inlong.apache.org/
Apache License 2.0

[Umbrella] Support Apache Hudi #5099

Closed dockerzhang closed 1 year ago

dockerzhang commented 2 years ago

Describe the proposal

Sort module supports Apache Hudi.

Task list

InLong Component

Other for not specified component


Jellal-HT commented 2 years ago

@dockerzhang Please assign to me, thank you!

Jellal-HT commented 2 years ago

Motivation

Apache Hudi is a popular streaming data lake platform, so the Sort module should support it.

Design

The design will follow the Sort Plugin and Manager Plugin documents.

  1. Add a new Extract Node for Apache Hudi
  2. Add a new Load Node for Apache Hudi
  3. Implement the corresponding Flink connectors for Apache Hudi
  4. Extend the Extract Node and Load Node in the Manager module for Apache Hudi

Modification

Load Node

  1. Add the new class HudiLoadNode, which inherits from the LoadNode class
  2. Add the Hudi load node to JsonSubTypes in LoadNode and Node
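
To make the Load Node step concrete, here is a minimal sketch of what a HudiLoadNode could look like. It is not the actual InLong code: the LoadNode base class below is a simplified stand-in (the real one has more fields), and the constructor parameters and the tableOptions() method are assumptions made for illustration. The option keys `connector`, `path` and `table.type` are the ones used by Hudi's Flink SQL connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Small demo entry point; the node id, name and path are made-up values.
class HudiLoadNodeDemo {
    public static void main(String[] args) {
        LoadNode node = new HudiLoadNode("1", "hudi_sink", "file:///tmp/hudi/orders", "MERGE_ON_READ");
        System.out.println(node.tableOptions());
    }
}

// Simplified stand-in for InLong's LoadNode (assumption: the real class has more fields).
abstract class LoadNode {
    private final String id;
    private final String name;

    protected LoadNode(String id, String name) {
        this.id = id;
        this.name = name;
    }

    String getId() {
        return id;
    }

    String getName() {
        return name;
    }

    // Options that would end up in the WITH clause of the Flink table.
    abstract Map<String, String> tableOptions();
}

// Hypothetical shape of the proposed HudiLoadNode.
class HudiLoadNode extends LoadNode {
    private final String path;
    private final String tableType;

    HudiLoadNode(String id, String name, String path, String tableType) {
        super(id, name);
        this.path = path;
        this.tableType = tableType;
    }

    @Override
    Map<String, String> tableOptions() {
        // LinkedHashMap keeps the options in insertion order.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", path);
        options.put("table.type", tableType);
        return options;
    }
}
```

Keeping the connector-specific options inside the node means the rest of the Sort pipeline only has to deal with the LoadNode type.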

Extract Node

  1. Add the new class HudiExtractNode, which inherits from the ExtractNode class
  2. Add the Hudi extract node to JsonSubTypes in ExtractNode and Node
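
Registering a class in JsonSubTypes tells Jackson which concrete class to build for a given type name in the JSON. The sketch below mimics that name-to-class lookup with a plain map so it runs without Jackson on the classpath; it is a stand-in for the idea, not the annotation itself. The type name `hudiExtract`, the ExtractNodes helper and the empty class bodies are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Small demo entry point: resolve a node class from its type name.
class SubtypeRegistryDemo {
    public static void main(String[] args) {
        ExtractNode node = ExtractNodes.create("hudiExtract");
        System.out.println(node.getClass().getSimpleName());
    }
}

// Simplified stand-in for InLong's ExtractNode.
abstract class ExtractNode {
}

// Hypothetical HudiExtractNode; fields omitted for brevity.
class HudiExtractNode extends ExtractNode {
}

// Stand-in for the subtype table that JsonSubTypes would declare.
final class ExtractNodes {
    private static final Map<String, Supplier<ExtractNode>> SUBTYPES = new HashMap<>();

    static {
        // Adding Hudi support means adding one more entry here.
        SUBTYPES.put("hudiExtract", HudiExtractNode::new);
    }

    private ExtractNodes() {
    }

    static ExtractNode create(String typeName) {
        Supplier<ExtractNode> factory = SUBTYPES.get(typeName);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown extract node type: " + typeName);
        }
        return factory.get();
    }
}
```

If the new subtype is not registered, deserialization has no way to resolve the type name, which is why the proposal lists the registration as a separate step.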

Flink Connector

Create a new directory called Hudi in inlong-sort/sort-connectors and add the new connector classes to it.

(As Apache Hudi already integrates with Flink, this part will refer to the implementation of the Flink connector in Apache Hudi.)
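
Since the connector builds on Hudi's existing Flink integration, a load node ultimately has to produce a Flink table definition that uses the `hudi` connector. The following sketch renders such a DDL statement from column and option maps. The HudiDdlSketch class, the table name, the columns and the path are hypothetical; only the option keys `connector`, `path` and `table.type` come from Hudi's Flink SQL connector.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class HudiDdlSketch {

    // Renders a Flink SQL CREATE TABLE statement from column and option maps.
    static String createTable(String table, Map<String, String> columns, Map<String, String> options) {
        String cols = columns.entrySet().stream()
                .map(e -> "  `" + e.getKey() + "` " + e.getValue())
                .collect(Collectors.joining(",\n"));
        String with = options.entrySet().stream()
                .map(e -> "  '" + e.getKey() + "' = '" + e.getValue() + "'")
                .collect(Collectors.joining(",\n"));
        return "CREATE TABLE `" + table + "` (\n" + cols + "\n) WITH (\n" + with + "\n)";
    }

    public static void main(String[] args) {
        // Made-up schema and location, for illustration only.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", "BIGINT");
        columns.put("name", "STRING");

        Map<String, String> options = new LinkedHashMap<>();
        options.put("connector", "hudi");
        options.put("path", "file:///tmp/hudi/orders");
        options.put("table.type", "MERGE_ON_READ");

        System.out.println(createTable("orders_hudi", columns, options));
    }
}
```

A real table definition would usually also declare a primary key so that Hudi has a record key; that is left out here to keep the sketch short.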

Manager Plugin

Follow the Manager Plugin document to extend the extract node and load node.

yunqingmoswu commented 2 years ago

This is a good idea and the plan looks right. Looking forward to your PR.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity.

dockerzhang commented 1 year ago

Duplicate of https://github.com/apache/inlong/issues/6781, closing it.