bytedance / bitsail

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.
https://bytedance.github.io/bitsail/
Apache License 2.0
1.61k stars 329 forks

[Discuss][RoadMap] BitSail 2023Q1 RoadMap #275

Open lichang-bd opened 1 year ago

lichang-bd commented 1 year ago

Hi everyone. The new year is coming, and we look forward to working with you this year to build a better BitSail community and bring convenience to more data developers. Here we can discuss the BitSail roadmap for 2023Q1. You are welcome to join the discussion, so feel free to share your ideas.

BitSail Connector

BitSail Basic Capacity building

BitSail Architecture Compatibility Improvement

BitSail Product Usability Optimization

BitSail Multi-Engine Architecture

zeliu commented 1 year ago

Hi, I have some ideas, just for reference:

1. We usually use a batch job to initialize the table first, and then use a streaming job for incremental synchronization. Could a single BitSail job cover both phases and switch from one to the other, using batch/streaming unification or some other mechanism?

2. At present, BitSail readers and writers are one-to-one. Some requirements call for one-to-many: for example, a changelog may contain change records for multiple tables, and the writer may need to target multiple Hudi tables. To save computing resources, there are many scenarios where one job synchronizes many tables, so I think it is necessary to support this feature.

3. columns is a mandatory parameter in the reader and writer configuration. Could it be generated by querying metadata instead? In most cases the field names of the data source and target are the same, and only some field types are converted. That way we could start tasks from a temporary configuration file and reduce the work of maintaining a large number of configuration files.
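
To make idea 3 a bit more concrete, here is a rough sketch of deriving the columns lists from source metadata. It is only an illustration under assumptions: an in-memory SQLite table stands in for the real data source, the names TYPE_OVERRIDES, infer_columns and build_job_columns are hypothetical, and the reader/writer dictionary is an assumed shape, not BitSail's actual configuration schema.

```python
import sqlite3

# Hypothetical type conversions between source and target.
# Only types that differ need an entry; anything else is passed through.
TYPE_OVERRIDES = {"INTEGER": "bigint", "TEXT": "string", "REAL": "double"}


def infer_columns(conn, table):
    """Read column names and declared types from the source metadata."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    # The table name is interpolated directly, so it must come from a trusted place.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [{"name": name, "type": declared.upper()} for _, name, declared, *_ in rows]


def build_job_columns(conn, table):
    """Build reader and writer column lists without a hand-written config."""
    reader_cols = infer_columns(conn, table)
    writer_cols = [
        # Same field name on both sides; only the type may be converted.
        {"name": c["name"], "type": TYPE_OVERRIDES.get(c["type"], c["type"].lower())}
        for c in reader_cols
    ]
    return {"reader": {"columns": reader_cols}, "writer": {"columns": writer_cols}}


# Stand-in source table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
config = build_job_columns(conn, "users")
```

In this sketch a user would only maintain the small override table, and the per-job columns section could be produced on the fly when the task starts.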

Kick156 commented 1 year ago

Hello, I have a suggestion, just for reference: for BitSail Product Usability Optimization, how about integrating with StreamPark? It is an easy-to-use stream processing application development framework and a one-stop stream processing operations platform.