bytedance / bitsail

BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. It is widely used to synchronize hundreds of trillions of records every day.
https://bytedance.github.io/bitsail/
Apache License 2.0
big-data data-integration data-lake data-pipeline data-synchronization flink high-performance real-time

Introduction

BitSail is ByteDance's open-source data integration engine, built on a distributed architecture for high performance. It synchronizes data between heterogeneous data sources and provides an end-to-end data integration solution for batch, streaming, and incremental scenarios. It currently serves almost all business lines at ByteDance, such as Douyin and Toutiao, and synchronizes hundreds of trillions of records every day.

Official website of BitSail: https://bytedance.github.io/bitsail/

Why Do We Use BitSail

BitSail is widely used in production and handles traffic of hundreds of trillions of records per day. It has also been verified in a range of environments, including the cloud-native environment of Volcano Engine and on-premises private clouds.

We have accumulated extensive experience and made a number of optimizations to improve data integration.

BitSail Use Scenarios

Features of BitSail

Architecture of BitSail

 Source[Input Sources] -> Framework[Data Transmission] -> Sink[Output Sinks]

The data processing pipeline works as follows: data is pulled from the source through Input Sources, processed by the intermediate framework layer, and finally written to the target through Output Sinks.

The framework layer provides a rich set of functions that apply to all synchronization scenarios, such as dirty data collection, automatic parallelism calculation, and task monitoring.

For data synchronization, BitSail covers batch, streaming, and incremental scenarios.

The runtime layer supports multiple execution modes, such as YARN and local; Kubernetes support is under development.
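The source, framework, and sink flow described above can be illustrated with a minimal sketch. This is not BitSail's API: BitSail itself is a Java project, and the names used here (`Pipeline`, `DirtyRecord`, `fake_source`, `cast_id`) are hypothetical, chosen only to show how a framework layer might sit between a reader and a writer and set aside dirty data instead of failing the whole job. Python is used for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class DirtyRecord:
    """A record the framework layer could not process, kept with the reason."""
    raw: dict
    reason: str


@dataclass
class Pipeline:
    """Hypothetical three-stage pipeline: source -> framework -> sink."""
    source: Callable[[], Iterable[dict]]   # pulls raw records from the input
    transform: Callable[[dict], dict]      # framework-layer processing step
    sink: Callable[[dict], None]           # writes a record to the target
    dirty: List[DirtyRecord] = field(default_factory=list)

    def run(self) -> int:
        """Move every record through the pipeline; return the number written."""
        written = 0
        for raw in self.source():
            try:
                record = self.transform(raw)
            except (KeyError, ValueError, TypeError) as exc:
                # Dirty data collection: keep the bad record and carry on.
                self.dirty.append(DirtyRecord(raw, f"{type(exc).__name__}: {exc}"))
                continue
            self.sink(record)
            written += 1
        return written


def fake_source() -> Iterable[dict]:
    """Illustrative input: two valid records and two malformed ones."""
    yield {"id": "1", "name": "a"}
    yield {"id": "oops", "name": "b"}   # id is not an integer
    yield {"name": "c"}                 # id is missing
    yield {"id": "4", "name": "d"}


def cast_id(raw: dict) -> dict:
    """Illustrative framework step: enforce an integer id column."""
    return {"id": int(raw["id"]), "name": raw["name"]}


output: List[dict] = []
pipeline = Pipeline(fake_source, cast_id, output.append)
written = pipeline.run()
print(f"written={written}, dirty={len(pipeline.dirty)}")
```

In this sketch the two malformed records end up in `pipeline.dirty` while the two valid ones reach `output`. Features such as parallelism calculation and task monitoring would live in the same middle layer, which is why they can apply to every source and sink combination.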

Supported Connectors

DataSource Sub Modules Reader Writer
Assert -
ClickHouse - -
Doris -
Druid -
Elasticsearch -
Fake -
FTP/SFTP -
Hadoop -
HBase -
Hive -
Hudi -
LocalFileSystem -
JDBC MySQL
Oracle
PostgreSQL
SqlServer
Kafka -
Kudu -
LarkSheet -
MongoDB -
Print -
Redis -
RocketMQ -
SelectDB -

Documentation for Connectors.

Community Support

Slack

Join the BitSail Slack channel via this link.

Mailing List

The BitSail community currently uses Google Groups as its mailing list provider. You need to subscribe to the mailing list before starting a conversation.

Subscribe: send an email to bitsail+subscribe@googlegroups.com

Start a conversation: send an email to bitsail@googlegroups.com

Unsubscribe: send an email to bitsail+unsubscribe@googlegroups.com

WeChat Group

Scan this QR code to join the WeChat group chat.

qr

Environment Setup

Link to Environment Setup.

Deployment Guide

Link to Deployment Guide.

BitSail Configuration

Link to Configuration Guide.

Contributing Guide

Link to Contributing Guide.

Contributors

Thanks to all contributors.

License

Apache 2.0 License.