apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Feature] [Real-time data warehouse] CDC and Materialized View #12253

Open ChenpiDog opened 2 years ago

ChenpiDog commented 2 years ago


Description

1) Real-time stream processing could be done with only Flink + Doris, without any other tools such as Kafka. Every layer of the data warehouse would be built on Doris, and data would flow from each lower layer to the layer above it through Doris: both the source and the sink are Doris. This requires Doris to provide something similar to the MySQL binlog, so that Flink CDC can capture data changes in Doris and process them in real time.

2) Real-time materialized views currently support only a single base table, which leaves them with few practical use cases. Materialized views should not be limited this way: for example, multi-table joins should be supported, and a change to any base table should trigger an update of the view. A materialized view should be a snapshot of its underlying query.
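To make the two requests above concrete, here is a minimal in-memory Python sketch. It is not Doris code and not the Flink CDC API. It assumes a hypothetical binlog-like `ChangeLog` that a consumer reads from a saved offset, and a hypothetical two-table join view (`JoinView`) that is refreshed incrementally: each insert or delete on either base table is joined only against the other table's current rows, so the view stays equal to a fresh run of its query. The table names `orders`/`customers` and the convention that `row[0]` is the join key are illustrative assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    table: str   # source table name (hypothetical)
    row: tuple   # full row; row[0] is the join key in this sketch
    sign: int    # +1 = insert, -1 = delete (an update is a delete followed by an insert)


class ChangeLog:
    """Hypothetical append-only change log standing in for a binlog-like feature."""

    def __init__(self):
        self.entries = []

    def append(self, change):
        self.entries.append(change)

    def read_from(self, offset):
        # A consumer remembers its offset and resumes from it, as a CDC reader would.
        return self.entries[offset:]


class JoinView:
    """Incrementally maintained view: SELECT * FROM left JOIN right ON left.k = right.k.
    Assumes every change belongs to one of the two (distinct) tables."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        # Per-table index: join key -> multiset of rows currently in that table.
        self.index = {left: defaultdict(Counter), right: defaultdict(Counter)}
        # The materialized result: joined row -> multiplicity.
        self.result = Counter()

    def apply(self, change):
        key = change.row[0]
        other = self.right if change.table == self.left else self.left
        # Delta rule: (L + dL) JOIN R = L JOIN R + dL JOIN R, so only the
        # changed row is joined against the other table's current rows.
        for other_row, count in self.index[other][key].items():
            if change.table == self.left:
                joined = change.row + other_row
            else:
                joined = other_row + change.row
            self.result[joined] += change.sign * count
            if self.result[joined] == 0:
                del self.result[joined]
        # Then record the change in this table's own index.
        rows = self.index[change.table][key]
        rows[change.row] += change.sign
        if rows[change.row] == 0:
            del rows[change.row]

    def snapshot(self):
        return sorted(self.result.elements())


log = ChangeLog()
for change in [
    Change("customers", (1, "alice"), +1),
    Change("orders", (1, "order-100"), +1),
    Change("orders", (1, "order-101"), +1),
]:
    log.append(change)

view = JoinView("orders", "customers")
offset = 0
for change in log.read_from(offset):
    view.apply(change)
offset = len(log.entries)
before = view.snapshot()

# A later update to the customer row arrives as delete + insert.
log.append(Change("customers", (1, "alice"), -1))
log.append(Change("customers", (1, "alice-v2"), +1))
for change in log.read_from(offset):
    view.apply(change)
offset = len(log.entries)
after = view.snapshot()
print(after)
```

Applying only the delta keeps the cost of each refresh proportional to the rows a change touches rather than to the size of the tables. A real engine would also need transactions, cross-table ordering and persistence, which this sketch ignores.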

Use case

No response

Related issues

No response

Are you willing to submit PR?


stalary commented 2 years ago
  1. Doris CDC needs some design work; you can submit your idea.
  2. Multi-table MV is under development.
ChenpiDog commented 2 years ago

@stalary Implementing either 1 or 2 would be enough. If 1 requires too large a change, Doris could tackle 2 first. Real-time materialized views should be as compatible with real tables as possible: the fewer restrictions on a view's underlying query, the better. For example, the underlying query of a real-time materialized view could itself read from another real-time materialized view.
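Stacking one real-time materialized view on another works naturally if every view publishes its own changes in the same insert/delete form it consumes. The following Python sketch assumes hypothetical classes `View`, `FilterView` and `CountByKeyView` (not a Doris API) and chains three views on a single base table's change stream: a filter over the table, a per-key count over the filter, and a second filter over the counts.

```python
from collections import Counter


class View:
    """Hypothetical view node: applies (row, sign) deltas and forwards its own deltas downstream."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, view):
        # Register a downstream view whose query reads from this one.
        self.subscribers.append(view)
        return view

    def emit(self, row, sign):
        for view in self.subscribers:
            view.apply(row, sign)


class FilterView(View):
    """SELECT * FROM source WHERE pred(row)."""

    def __init__(self, pred):
        super().__init__()
        self.pred = pred
        self.rows = Counter()

    def apply(self, row, sign):
        if not self.pred(row):
            return
        self.rows[row] += sign
        if self.rows[row] == 0:
            del self.rows[row]
        self.emit(row, sign)


class CountByKeyView(View):
    """SELECT row[0] AS key, COUNT(*) FROM source GROUP BY key."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def apply(self, row, sign):
        key = row[0]
        old = self.counts[key]
        new = old + sign
        if new:
            self.counts[key] = new
        else:
            del self.counts[key]
        # Publish the change as retract-old / insert-new so that another
        # view can be stacked on top of this one.
        if old:
            self.emit((key, old), -1)
        if new:
            self.emit((key, new), +1)


big_orders = FilterView(lambda row: row[1] >= 100)                        # MV over a base table
per_region = big_orders.subscribe(CountByKeyView())                       # MV over an MV
busy_regions = per_region.subscribe(FilterView(lambda row: row[1] >= 2))  # MV over an MV over an MV

for row, sign in [
    (("eu", 150), +1),
    (("us", 50), +1),
    (("eu", 300), +1),
    (("us", 120), +1),
    (("eu", 150), -1),
]:
    big_orders.apply(row, sign)

print(dict(per_region.counts))
```

The retract/insert pair emitted by `CountByKeyView` is what lets the top-level filter drop `("eu", 2)` again once the `("eu", 150)` order is deleted from the base table.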