databendlabs / databend

Data, Analytics & AI. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com

Feature: Spilled ArtTree Support For Databend #14462

Open JackTan25 opened 8 months ago

JackTan25 commented 8 months ago

Summary: For now, Databend has no good way to enforce a uniqueness constraint on data. We need to support an index in the storage engine, along with features such as primary keys.
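
To make the goal concrete, here is a minimal sketch of how a key index lets a write reject duplicates with a lookup instead of scanning existing blocks. A Python `dict` stands in for the index, and `UniqueKeyIndex`, `check_and_insert`, and the `(block_id, row)` locations are hypothetical names for illustration, not Databend APIs.

```python
from dataclasses import dataclass, field


@dataclass
class UniqueKeyIndex:
    """Hypothetical key -> (block_id, row) index used to enforce uniqueness.

    A dict stands in for whatever persistent index (e.g. an ART) the storage
    engine would actually use; this is an illustration, not Databend code.
    """
    entries: dict = field(default_factory=dict)

    def check_and_insert(self, rows, block_id):
        """Insert a batch of (key, value) rows as one block, all-or-nothing.

        Raises ValueError if any key already exists in the index or is
        repeated inside the batch; the index is left unchanged in that case.
        """
        seen = set()
        for key, _ in rows:
            if key in self.entries or key in seen:
                raise ValueError(f"duplicate key: {key!r}")
            seen.add(key)
        # Only mutate the index after the whole batch has been validated.
        for row_idx, (key, _) in enumerate(rows):
            self.entries[key] = (block_id, row_idx)


idx = UniqueKeyIndex()
idx.check_and_insert([(1, "a"), (2, "b")], block_id=0)
try:
    idx.check_and_insert([(3, "c"), (2, "dup")], block_id=1)
except ValueError as e:
    print(e)  # duplicate key: 2
print(idx.entries)  # {1: (0, 0), 2: (0, 1)}
```

Validating the whole batch before touching the index keeps a rejected write from leaving partial entries behind.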

Feature Target

References:

  1. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases (Leis et al.), https://db.in.tum.de/~leis/papers/ART.pdf
  2. "The ART of Practical Synchronization"
  3. DuckDB

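
For readers new to ART, the core idea of reference 1 is that each inner node picks one of four layouts (Node4, Node16, Node48, Node256) based on how many children it has, so sparse nodes stay small and dense nodes get direct indexing by key byte. The sketch below is a simplified, in-memory illustration of only that adaptive node growth: it omits path compression, lazy expansion, deletion, and concurrency, and the `Node`/`Art` classes are hypothetical, not DuckDB's or Databend's code.

```python
import bisect


class Node:
    """Simplified ART inner node (no path compression, no deletion).

    `kind` mirrors ART's Node4/Node16/Node48/Node256 layouts; a node grows
    to the next layout only when it runs out of child slots.
    """

    def __init__(self):
        self.kind = 4
        self.keys = []          # Node4/16: sorted key bytes
        self.children = []      # Node4/16: aligned with keys; Node48: slots; Node256: 256 entries
        self.index = None       # Node48: key byte -> slot number (-1 means empty)
        self.value = None
        self.has_value = False  # True if some key ends exactly at this node

    def items(self):
        """(byte, child) pairs in byte order, whatever the current layout."""
        if self.kind in (4, 16):
            return list(zip(self.keys, self.children))
        if self.kind == 48:
            return [(b, self.children[s]) for b, s in enumerate(self.index) if s >= 0]
        return [(b, c) for b, c in enumerate(self.children) if c is not None]

    def find_child(self, b):
        if self.kind in (4, 16):
            i = bisect.bisect_left(self.keys, b)  # real ART uses SIMD compares for Node16
            return self.children[i] if i < len(self.keys) and self.keys[i] == b else None
        if self.kind == 48:
            slot = self.index[b]
            return self.children[slot] if slot >= 0 else None
        return self.children[b]

    def grow(self):
        pairs = self.items()
        if self.kind == 4:
            self.kind = 16  # same sorted-array layout, more capacity
        elif self.kind == 16:
            self.kind, self.keys, self.index = 48, [], [-1] * 256
            self.children = [child for _, child in pairs]
            for slot, (b, _) in enumerate(pairs):
                self.index[b] = slot
        else:  # 48 -> 256: direct array indexed by the key byte
            self.kind, self.index, self.children = 256, None, [None] * 256
            for b, child in pairs:
                self.children[b] = child

    def add_child(self, b, child):
        if self.kind != 256 and len(self.items()) >= self.kind:
            self.grow()
        if self.kind in (4, 16):
            i = bisect.bisect_left(self.keys, b)
            self.keys.insert(i, b)
            self.children.insert(i, child)
        elif self.kind == 48:
            self.index[b] = len(self.children)
            self.children.append(child)
        else:
            self.children[b] = child


class Art:
    def __init__(self):
        self.root = Node()

    def insert(self, key: bytes, value) -> bool:
        """Insert key -> value; return False and keep the old value on a duplicate."""
        node = self.root
        for b in key:
            child = node.find_child(b)
            if child is None:
                child = Node()
                node.add_child(b, child)
            node = child
        if node.has_value:
            return False
        node.value, node.has_value = value, True
        return True

    def get(self, key: bytes):
        node = self.root
        for b in key:
            node = node.find_child(b)
            if node is None:
                return None
        return node.value if node.has_value else None


art = Art()
for i in range(300):
    art.insert(i.to_bytes(2, "big"), i)
print(art.root.kind, art.root.find_child(0).kind, art.root.find_child(1).kind)  # 4 256 48
```

With 300 two-byte keys, the root only has two children and stays a Node4, the subtree under high byte 0 fills all 256 slots, and the subtree under high byte 1 (44 children) stops at Node48.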
JackTan25 commented 8 months ago

cc @dantengsky I think we need to support it soon.

sundy-li commented 8 months ago

https://duckdb.org/2022/07/27/art-storage.html

Dousir9 commented 8 months ago

We can investigate how Snowflake implements this (the data uniqueness constraint). I also have a question: is an ART suitable for object storage?

JackTan25 commented 8 months ago

Well, in fact, Snowflake has Unistore (https://www.snowflake.com/en/data-cloud/workloads/unistore/). They are still developing it, and we can't access the source code or design details, but we can still find some materials, such as https://www.areto.de/wp-content/uploads/snowflake-unistore-Solution-Brief.pdf. By the way, I chose an ART index for Databend because there is a good reference in an open-source product and good materials for us to learn from. However, nothing is decided yet; this issue is just a tentative proposal. We need to do more surveys, and I'm preparing the ArtTree design details for Databend.

JackTan25 commented 8 months ago

We can investigate how Snowflake implements this (the data uniqueness constraint). I also have a question: is an ART suitable for object storage?

For now, they don't support it.

Dousir9 commented 8 months ago

Is an ART suitable for object storage? I think this question is very important, because Databend is not a memory- or disk-oriented database.

Dousir9 commented 8 months ago

We can investigate how Snowflake implements this (the data uniqueness constraint). I also have a question: is an ART suitable for object storage?

For now, they don't support it.

Snowflake already supports it; see https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint

JackTan25 commented 8 months ago

Snowflake already supports it; see https://docs.snowflake.com/en/sql-reference/sql/create-table-constraint

Well, take a look at https://docs.snowflake.com/en/sql-reference/constraints: they only support constraints as declarations; they don't enforce them.

JackTan25 commented 8 months ago

Is an ART suitable for object storage? I think this question is very important, because Databend is not a memory- or disk-oriented database.

Good question. My initial assessment:

  1. S3 storage doesn't support in-place updates; the good news is that we can build an append-only spilled ArtTree.
  2. As for ACID transactions, we can treat updating the index as a mutation operation.
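
The two points above can be sketched together, assuming the spilled index lives in an S3-like store where objects are written once and never updated. Each insert copies only the nodes on the key's path (path copying), so older roots remain valid snapshots, and publishing the new root is treated as a mutation that commits via a compare-and-swap on a version number. To keep it short, the tree here is a plain byte trie rather than an ART, and `AppendOnlyStore`, `IndexMeta`, and `commit` are hypothetical stand-ins, not Databend's storage or transaction APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrieNode:
    """Immutable node: once put into the store it is never modified."""
    children: tuple = ()  # sorted (byte, object_id) pairs
    value: object = None
    has_value: bool = False


class AppendOnlyStore:
    """Hypothetical stand-in for an S3-like store: put() always creates a new object."""

    def __init__(self):
        self.objects = []

    def put(self, node: TrieNode) -> int:
        self.objects.append(node)
        return len(self.objects) - 1

    def get(self, oid: int) -> TrieNode:
        return self.objects[oid]


def insert(store: AppendOnlyStore, root_id: Optional[int], key: bytes, value) -> int:
    """Path copying: rewrite only the nodes on key's path and return the new root id.

    Untouched subtrees are shared, by object id, with the previous version.
    """
    node = store.get(root_id) if root_id is not None else TrieNode()
    if not key:
        if node.has_value:
            raise ValueError("duplicate key")  # uniqueness check
        return store.put(TrieNode(node.children, value, True))
    kids = dict(node.children)
    kids[key[0]] = insert(store, kids.get(key[0]), key[1:], value)
    return store.put(TrieNode(tuple(sorted(kids.items())), node.value, node.has_value))


def lookup(store: AppendOnlyStore, root_id: Optional[int], key: bytes):
    oid = root_id
    for b in key:
        if oid is None:
            return None
        oid = dict(store.get(oid).children).get(b)
    if oid is None:
        return None
    node = store.get(oid)
    return node.value if node.has_value else None


class IndexMeta:
    """Hypothetical metadata pointer to the current index root."""

    def __init__(self):
        self.root_id: Optional[int] = None
        self.version = 0

    def commit(self, expected_version: int, new_root: int) -> bool:
        """Compare-and-swap: fail if another mutation committed first."""
        if self.version != expected_version:
            return False
        self.root_id, self.version = new_root, self.version + 1
        return True


store, meta = AppendOnlyStore(), IndexMeta()
v0, r0 = meta.version, meta.root_id
r1 = insert(store, r0, b"k1", 1)
print(meta.commit(v0, r1))  # True
# A concurrent writer that started from the same snapshot loses the race and must retry.
r_stale = insert(store, r0, b"k2", 2)
print(meta.commit(v0, r_stale))  # False
print(lookup(store, meta.root_id, b"k1"), lookup(store, r0, b"k1"))  # 1 None
```

Objects written by a failed insert or a lost commit become unreachable and would need garbage collection, which this sketch does not attempt.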

This issue is only an initial approach to solving our unique-key problem. By introducing a new index, we can also give the optimizer more information to speed up queries and mutation operations. I need to do more research and write detailed design docs, which may take a long time.