Sunt-ing / database-system-readings

:yum: A curated reading list about database systems
MIT License

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics #74

Sunt-ing closed this issue 2 years ago

Sunt-ing commented 2 years ago

Link: http://cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf

Reason for recommendation: "Lakehouse" is a popular term these days, and this paper is where the concept comes from. I once thought the lakehouse concept had been proposed by Alibaba Cloud, but it was not.

Key idea: build the data warehouse directly on low-cost object storage such as S3.

Key features of Lakehouse:

Benefits for AI: Why use a lakehouse instead of a data lake for AI? A lakehouse gives you data versioning, governance, security and ACID properties that are needed even for unstructured data.
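To make the versioning and ACID points more concrete, here is a minimal sketch of a commit log layered over immutable files in an object store. It is only a toy illustration of the general idea, not the paper's or any product's implementation. `ObjectStore`, `Table`, `put_if_absent`, and the key layout are all hypothetical names, and an in-memory dict stands in for S3. The sketch assumes the store offers an atomic "create only if the key does not exist" write; real object stores differ in whether and how they provide that.

```python
import json
import uuid


class ObjectStore:
    """In-memory stand-in for an object store such as S3 (hypothetical helper)."""

    def __init__(self):
        self._objects = {}

    def put_if_absent(self, key, data):
        # Assumed primitive: the write succeeds only if the key does not exist yet.
        if key in self._objects:
            return False
        self._objects[key] = data
        return True

    def get(self, key):
        return self._objects[key]

    def exists(self, key):
        return key in self._objects


class Table:
    """Toy table: immutable data files plus a numbered commit log."""

    def __init__(self, store, name):
        self.store = store
        self.name = name

    def _log_key(self, version):
        return f"{self.name}/_log/{version:08d}.json"

    def latest_version(self):
        # Scan forward until the next log entry is missing; -1 means "empty table".
        version = -1
        while self.store.exists(self._log_key(version + 1)):
            version += 1
        return version

    def commit(self, rows):
        """Write rows to a new data file, then publish it with one log entry."""
        data_key = f"{self.name}/data/{uuid.uuid4().hex}.json"
        self.store.put_if_absent(data_key, json.dumps(rows))
        entry = json.dumps({"add": [data_key]})
        while True:
            version = self.latest_version() + 1
            # Only one writer can create a given log entry; a loser retries
            # with the next version number, so commits never overwrite each other.
            if self.store.put_if_absent(self._log_key(version), entry):
                return version

    def read(self, version=None):
        """Return all rows visible at a version (default: the latest one)."""
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            entry = json.loads(self.store.get(self._log_key(v)))
            for data_key in entry["add"]:
                rows.extend(json.loads(self.store.get(data_key)))
        return rows


store = ObjectStore()
events = Table(store, "events")
first = events.commit([{"id": 1}])
second = events.commit([{"id": 2}])
print(events.read())               # rows from both commits
print(events.read(version=first))  # "time travel" to the first commit
```

Because data files are never modified and a commit becomes visible only when its log entry is created, a reader sees either all of a commit or none of it, and any older version can be reconstructed by replaying a prefix of the log.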

Databricks answers some questions about Lakehouse in "Databricks Sets Official Data Warehousing Performance Record".

Reference: Blog post by Databricks: https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html