apache / seatunnel

SeaTunnel is a next-generation super high-performance, distributed, massive data integration tool.
https://seatunnel.apache.org/
Apache License 2.0

[Roadmap][Volunteer-Wanted] SeaTunnel 2.0 Roadmap Task #720

Open CalvinKirs opened 2 years ago

CalvinKirs commented 2 years ago

Hello community, we have recently drafted a 2.0 roadmap. Everyone is welcome to discuss it and add to it.

Update: Hi guys, this is the roadmap draft according to the mindmap for the following 3 months. Please feel free to share your ideas, and you are welcome to contribute to SeaTunnel and join the community.

Roadmap for 12/2021 ~ 03/2022:

xleoken commented 2 years ago

The roadmap 2.0 is excellent 👍.

I want to add a new feature to the roadmap; it is very important for data integration. We used Flume long ago, but it was not friendly for us to write the agent file. Then we replaced Flume with Flink SQL, which has provided great convenience for inserting and ETL-ing data, etc.

| project | weakness |
| --- | --- |
| flume | 1) need to learn how to write the agent file 2) cannot work with resource managers like YARN, K8s |
| sqoop | 1) doesn't have rich connectors 2) based on MapReduce |

flinksql

  • has rich connectors, CDC
  • friendly for us; we can use SQL grammar easily, like +, -, etc.
  • easy to work with resource managers: YARN, K8s

The point of this feature is to introduce a new workflow template; below is a workflow demo.

The original workflow

env {
  execution.parallelism = 1
}

source {
    FakeSourceStream {
      result_table_name = "fake"
      field_name = "name,age"
    }
}

transform {
    sql {
      sql = "select name,age from fake"
    }
}

sink {
  ConsoleSink {}
}

The proposed workflow

CREATE TABLE fake_source (
  name string,
  age int
) with (
  'connector.type' = 'fakestream',
  'format.type' = 'json'
);

CREATE TABLE print_sink (
  name string,
  age int
) with (
  'connector.type' = 'print'
);

INSERT INTO print_sink
SELECT * FROM fake_source;
wntp commented 2 years ago

The roadmap 2.0 is excellent 👍. I hope pre-process and post-process hooks for source/sink/transform are also included. Let's do it together, come on!
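
For illustration, a minimal self-contained sketch of what such pre-/post-process hooks could look like; ProcessHook and runWithHooks are hypothetical names, not an existing SeaTunnel API:

// Sketch only: pre-/post-process hooks around a source/transform/sink step.
public interface ProcessHook {
    default void preProcess() {}   // runs before the step starts processing
    default void postProcess() {}  // runs after the step finishes, even on failure

    static void runWithHooks(ProcessHook hook, Runnable step) {
        hook.preProcess();
        try {
            step.run();
        } finally {
            hook.postProcess();
        }
    }
}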

charlesy6 commented 2 years ago

The roadmap 2.0 is excellent 👍.

I have a suggestion: it seems we could focus on one type of underlying engine. It will be harder to maintain both Flink and Spark going forward.

kalencaya commented 2 years ago

The roadmap 2.0 is excellent 👍. Configuration is a form of programming, so why not provide a useful config DSL to assemble source, transform, and sink?
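
As a rough illustration of that idea, a self-contained Java sketch of a fluent job-building DSL; the ConfigDslSketch class and every method on it are hypothetical, not an existing SeaTunnel API:

import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Sketch only: a job is a source, a chain of transforms, and a sink.
public class ConfigDslSketch {
    private List<String[]> rows = new ArrayList<>();
    private final List<UnaryOperator<List<String[]>>> transforms = new ArrayList<>();

    public static ConfigDslSketch newJob() { return new ConfigDslSketch(); }

    public ConfigDslSketch source(List<String[]> input) { rows = input; return this; }

    public ConfigDslSketch transform(UnaryOperator<List<String[]>> t) {
        transforms.add(t);
        return this;
    }

    public void sink(Consumer<String[]> out) {
        List<String[]> data = rows;
        for (UnaryOperator<List<String[]>> t : transforms) data = t.apply(data);
        data.forEach(out);
    }

    public static void main(String[] args) {
        newJob()
            .source(List.of(new String[]{"alice", "20"}, new String[]{"bob", "30"}))
            .transform(d -> d) // identity, standing in for a SQL transform step
            .sink(row -> System.out.println(String.join(",", row))); // console sink
    }
}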

chenhu commented 2 years ago

The roadmap 2.0 is excellent 👍 The new configuration is easier to accept.

chenhu commented 2 years ago

We could add the scheduler info like this:

CREATE TABLE fake_source (
  name string,
  age int
) with (
  'connector.type' = 'fakestream',
  'format.type' = 'json',
  'scheduler.cron' = ' *'
);

CREATE TABLE print_sink (
  name string,
  age int
) with (
  'connector.type' = 'print'
);

INSERT INTO print_sink
SELECT * FROM fake_source;
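
To make the intent of the scheduler option concrete, here is a hedged, self-contained Java sketch of how an engine might act on such a per-table option. Full cron parsing is omitted; a fixed interval with a made-up 'scheduler.interval.seconds' key stands in for the cron expression:

import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: reads a hypothetical per-table scheduler option and re-runs
// the job on that interval. A real implementation would parse the cron
// expression from 'scheduler.cron' instead.
public class SchedulerOptionSketch {
    public static void main(String[] args) {
        Map<String, String> tableOptions = Map.of(
            "connector.type", "fakestream",
            "scheduler.interval.seconds", "60"); // stand-in for 'scheduler.cron'

        long seconds = Long.parseLong(tableOptions.get("scheduler.interval.seconds"));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Re-run the INSERT job on the configured interval (runs until killed).
        scheduler.scheduleAtFixedRate(
            () -> System.out.println("running: INSERT INTO print_sink SELECT * FROM fake_source"),
            0, seconds, TimeUnit.SECONDS);
    }
}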

wolfboys commented 2 years ago

I want to contribute to the Core Framework and Plugin Framework parts. I am interested in Flink Backend Integration and the Plugin Framework, and I have relevant implementation experience.

davidzollo commented 2 years ago

I want to contribute to the Core Framework and Plugin Framework parts. I am interested in Flink Backend Integration and the Plugin Framework, and I have relevant implementation experience.

Good job. Do you want to implement the Flink Backend Integration and Plugin Framework parts in their entirety?

davidzollo commented 2 years ago

@garyelephant good news. great work

davidzollo commented 2 years ago

Everyone is welcome to share opinions and suggestions about the roadmap. Welcome to join the open source community. Thx

wuchunfu commented 2 years ago

I want to contribute to the plugins part; I am interested in plugins.

simon824 commented 2 years ago

I can contribute to Project Structure, Plugins, and the ability to define variables.

davidzollo commented 2 years ago

@wolfboys @wuchunfu @simon824 good, I have updated the related info in the content. Thx, please take a look at the Roadmap Tasks.

wntp commented 2 years ago

@garyelephant @dailidong hi, I want to contribute to the “Multi-Version Plugin Dependency with Java ClassLoader” part

davidzollo commented 2 years ago

@garyelephant @dailidong hi, I want to contribute to the “Multi-Version Plugin Dependency with Java ClassLoader” part

done, thx
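
Since several people have picked up the “Multi-Version Plugin Dependency with Java ClassLoader” item, here is a minimal self-contained sketch of the underlying idea: each plugin jar gets its own URLClassLoader, so two versions of the same plugin can carry conflicting dependencies. The paths and class names below are hypothetical, not SeaTunnel's actual layout:

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

// Sketch only: one classloader per plugin jar, so plugin dependencies do not
// override each other across versions.
public class PluginLoaderSketch {
    static ClassLoader loaderFor(Path pluginJar) throws Exception {
        URL[] urls = { pluginJar.toUri().toURL() };
        // Delegates parent-first to the application classloader by default; a
        // real implementation might use child-first delegation for stronger isolation.
        return new URLClassLoader(urls, PluginLoaderSketch.class.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical paths to two versions of the same connector plugin.
        ClassLoader v1 = loaderFor(Path.of("plugins/kafka-connector-1.0.jar"));
        ClassLoader v2 = loaderFor(Path.of("plugins/kafka-connector-2.0.jar"));
        // The same fully-qualified class name can now be loaded once per version:
        // Class<?> c1 = v1.loadClass("org.example.KafkaSource");
        // Class<?> c2 = v2.loadClass("org.example.KafkaSource");
        System.out.println("isolated loaders: " + (v1 != v2));
    }
}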

wolfboys commented 2 years ago

@wolfboys @wuchunfu @simon824 good, I have updated the related info in the content. Thx, please take a look at the Roadmap Tasks.

👌🏻

davidzollo commented 2 years ago

hi all, we plan to hold a community meeting on December 18 (UTC+8). We have established a list of topics to be discussed on Google Docs. Everyone is welcome to list the issues you want to discuss in advance. The topics are open-ended, but you need to create an issue in advance; we will then synchronize the relevant discussion process and results to the issue, and finally categorize everything on the wiki. Thx

https://docs.google.com/document/d/1bXyZGt0g8f8_-d9Oj7oMx9BdWSjf8oJYZgNmoEooj8I/edit?usp=sharing

davidzollo commented 2 years ago

Hi guys, this is the roadmap draft according to the mindmap for the following 3 months. Please feel free to share your ideas, and you are welcome to contribute to SeaTunnel and join the community.

Roadmap for 12/2021 ~ 03/2022:

  • [ ] Official website construction
  • [ ] Project Structure @simon824

    • [ ] refine the module and code structure

    • [ ] move all plugin modules to the plugins/ submodule

  • [ ] Core Framework

    • [x] Flink Backend Integration

      • [ ] support the latest version of Flink (1.14)

      • [ ] refine the code so Flink connectors can be integrated with zero coding

    • [ ] Spark Backend Integration

      • [ ] support the latest version of Spark (3.2.0)

      • [ ] refine the code so Spark connectors can be integrated with zero coding

    • [x] Plugin Framework

      • [ ] refine the Plugin Framework related code

      • [x] Multi-Version Plugin Dependency with Java ClassLoader @wntp

      • [ ] do not package everything into one assembly jar (for multi-version plugins), to avoid plugin dependency overrides @CalvinKirs

    • [ ] support CDC in the framework

  • [ ] Configuration Management Framework

    • [ ] use typesafe config as a pom dependency rather than copying its code into the SeaTunnel codebase @CalvinKirs
    • [x] ability to define variables @simon824 (see the sketch after this comment)
    • [ ] consider supporting SQL as another configuration DSL
  • [ ] Code Quality & Developer Cooperation

    • [x] add CI to the SeaTunnel codebase on GitHub
    • [x] add code style checks
  • [ ] Plugins @wuchunfu @wolfboys @simon824

    • [ ] add more sources: Hive, ClickHouse, Elasticsearch, MongoDB (CDC), Kafka, HDFS
    • [ ] add more sinks: Hive, ClickHouse, Elasticsearch, Kafka, HDFS
    • [ ] add more transforms: ?
  • [ ] Installation & Deployment @kezhenxu94

    • [ ] refine the start-seatunnel.sh related code to make SeaTunnel job management (install, deploy, undeploy, ...) more convenient
    • [ ] support installing and trying SeaTunnel via a Docker container
  • [ ] change the main programming language from Scala to Java

hi gary, I will update this content into the first comment so that more users can easily see the roadmap draft.
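
On the “ability to define variables” item above, here is a self-contained sketch of one way ${name}-style substitution in job configs could behave; the class name and exact syntax are illustrative, not SeaTunnel's actual mechanism:

import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: replaces ${name} placeholders in a config string with values
// supplied at submit time; unknown variables are left untouched.
public class VariableSubstitutionSketch {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)}");

    public static String substitute(String config, Map<String, String> vars) {
        Matcher m = VAR.matcher(config);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(
                vars.getOrDefault(m.group(1), m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String conf = "sql = \"select * from events where dt = '${date}'\"";
        System.out.println(substitute(conf, Map.of("date", "2021-12-18")));
    }
}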

calvinjiang commented 2 years ago

I'm mainly interested in the Flink Backend Integration. I'd like to contribute more features for this.

kezhenxu94 commented 2 years ago

This one is finished in #815

davidzollo commented 2 years ago

I'm mainly interested in the Flink Backend Integration. I'd like to contribute more features for this.

Good job, please create a related issue first.

xbkaishui commented 2 years ago

(quotes @xleoken's comment above in full)

That's awesome; SQL is a common language for users.

davidzollo commented 2 years ago

If anybody wants to make some contributions, please leave a message.

yuangjiang commented 2 years ago

Regarding roadmap 2.0: it is recommended to split Flink support into two different execution modes, since from current testing the Flink DataStream API and Table API cannot be unified. We then need to support SQL connectors that can be contributed back to the community, so that plugins people have developed themselves are not split between two different execution modes.

yuangjiang commented 2 years ago

It is recommended to support submitting Spark and Flink scripts directly. The plugin mechanism could then be used to submit Spark Hive extension jobs.

BruceWong96 commented 2 years ago

I want to contribute to the Plugin Framework part. I am interested in Flink Backend Integration and the Plugin Framework, and I have relevant implementation experience.

Yves-yuan commented 2 years ago

The roadmap 2.0 is excellent 👍 I'm interested in the [Flink Backend Integration], [Configuration Management Framework], and [plugins] modules, and I will try to make some contributions.