apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
https://streampark.apache.org/
Apache License 2.0

[FEATURE] It is very important to support the centralized management of metadata #235

Closed Narcasserun closed 4 months ago

Narcasserun commented 3 years ago

As a stream computing platform, it is particularly important for streamx to support metadata management. I will provide a design for the metadata. Metadata management can improve development efficiency and let users focus only on business logic. If it can also support data governance, it will make streamx more powerful.

1. Architecture diagram

[Image: metadata architecture]

2. Key design points

Al-assad commented 3 years ago

@Narcasserun Good idea. Looking forward to your design for data governance.

Narcasserun commented 3 years ago

Metadata is already under development. I divide metadata management into two parts: data source management and metadata. I have run into a design decision: does data source management need to be persisted to a database?

Al-assad commented 3 years ago

Hi @Narcasserun, I don't think persistent storage of metadata is necessary. The cost of loading Kafka, MySQL, and Hive meta info on demand is very low. Persisting the metadata, on the other hand, introduces additional data consistency issues that would need to be addressed.

BruceWong96 commented 3 years ago

Hi @Narcasserun @Al-assad, I think persistent storage of metadata is necessary, because it is important for data lineage and metadata search.
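
To make the trade-off in this thread concrete, here is a minimal Python sketch of the two options. It is only an illustration under stated assumptions: `TableMeta`, `OnDemandMetadata`, `PersistedMetadata`, and `fake_mysql` are hypothetical names, not streamx APIs, and an in-memory dict stands in for the database.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class TableMeta:
    source: str          # e.g. "mysql", "kafka", "hive"
    name: str
    columns: List[str]


# Hypothetical fetcher: asks the live source system for a table's schema.
Fetcher = Callable[[str], TableMeta]


class OnDemandMetadata:
    """No persistence: every lookup hits the source, so results are always current."""

    def __init__(self, fetchers: Dict[str, Fetcher]):
        self._fetchers = fetchers

    def get(self, source: str, table: str) -> TableMeta:
        return self._fetchers[source](table)


class PersistedMetadata:
    """Persisted copy: searchable across sources, but stale until re-synced."""

    def __init__(self, fetchers: Dict[str, Fetcher]):
        self._fetchers = fetchers
        # In-memory stand-in for a database table (assumption for this sketch).
        self._store: Dict[Tuple[str, str], TableMeta] = {}

    def sync(self, source: str, table: str) -> TableMeta:
        meta = self._fetchers[source](table)
        self._store[(source, table)] = meta
        return meta

    def get(self, source: str, table: str) -> TableMeta:
        return self._store[(source, table)]

    def search_by_column(self, column: str) -> List[TableMeta]:
        # Search works without contacting any source system.
        return [m for m in self._store.values() if column in m.columns]


# Fake source system whose schema can change underneath us.
schemas: Dict[str, List[str]] = {"orders": ["id", "amount"]}


def fake_mysql(table: str) -> TableMeta:
    return TableMeta("mysql", table, list(schemas[table]))
```

If the `orders` schema changes after a `sync`, `OnDemandMetadata.get` reflects the change immediately, while `PersistedMetadata.get` keeps returning the old columns until `sync` runs again. That is the consistency cost. In exchange, `search_by_column` can answer metadata search queries from the stored copy alone, which is the kind of capability lineage and search features rely on.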

zfanswer commented 1 year ago

Any update as of Aug 2023?