4paradigm / OpenMLDB

OpenMLDB is an open-source machine learning database that provides a feature platform computing consistent features for training and inference.
https://openmldb.ai
Apache License 2.0
1.59k stars, 321 forks

[Question][OLAP]Data Warehouse #2608

Open ChenpiDog opened 2 years ago

ChenpiDog commented 2 years ago

Can OpenMLDB be used as a real-time data warehouse? Is there any plan?

aceforeverd commented 2 years ago

@lumianph any idea?

ChenpiDog commented 2 years ago

Data is very important for machine learning, but raw data is often hard to analyze and process, so it needs to be standardized. The data warehouse has taken on this responsibility, and most companies' data applications are built on top of it. As a machine learning database, OpenMLDB could consider expanding in the data warehouse direction, for example by integrating with upper-layer BI applications, real-time ETL, etc.

lumianph commented 2 years ago

@ChenpiDog thank you for your suggestions. Yes, this looks promising, especially real-time ETL, where OpenMLDB could be very competitive. Do let us know if you have any concrete suggestions.

ChenpiDog commented 2 years ago

@lumianph For an integrated data warehouse and machine learning database: 1) an MPP distributed architecture based on shared storage would be better; 2) compatibility with the PostgreSQL or MySQL connection protocol would matter a great deal for embracing the data ecosystem; 3) real-time data processing, including real-time ETL and real-time machine learning, would be very competitive.

ChenpiDog commented 2 years ago

SQL is everything. Users should only need to deal with SQL, and the rest should be handed over to OpenMLDB.

[Image: OpenMLDB communication, drawio]