Closed yuqi1129 closed 6 months ago
Overall Project | 65.78% -0.23% | :green_circle: |
---|---|---|
Files changed | 93.09% | :green_circle: |

Module | Coverage | |
---|---|---|
core | 75.52% -0.89% | :green_circle: |
We will use a separate PR to handle the GC collection problem.
I prefer Apache Iceberg's solution. Could we use COW (copy-on-write) mode for this issue? I am just proposing a simple solution. We have only two kinds of keys: entry keys and meta keys. An entry key is organized as follows: its key is the name of the metalake, and its value is a list of keys. Each of those keys acts like an address for finding a meta key.
metalakeA: {add_a, add_b, add_c}
add_a: {"name": "catalog_a", "list": "add_d, add_e, add_f"}
add_b: {"name": "catalog_b", "list": "add_k"}
add_c: {"name": "catalog_c", "list": "add_i"}
add_d: {"name": "schema_a", "list": "add_o"}
.....
When we update a metalake, we modify only the metadata. Then, while holding the lock, we use a CAS operation to update the entry key. Finally, we also garbage-collect the useless keys synchronously.
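To make the proposal above concrete, here is a minimal in-memory sketch of the entry-key / meta-key idea. Everything in it (`CowMetaStore`, `writeMeta`, `publish`, the use of `AtomicReference` as the entry key) is hypothetical and only illustrates the copy-on-write plus CAS flow; it is not Gravitino code, and garbage collection of unreferenced addresses is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical in-memory model of the entry-key / meta-key layout. */
class CowMetaStore {
  /** A meta value: an entity name plus the addresses of its children. */
  static final class Meta {
    final String name;
    final List<String> children;

    Meta(String name, List<String> children) {
      this.name = name;
      this.children = List.copyOf(children);
    }
  }

  // Meta keys: address -> immutable meta value. Written once, never changed in place.
  private final Map<String, Meta> metaKeys = new ConcurrentHashMap<>();

  // Entry key of one metalake: the current list of top-level addresses.
  private final AtomicReference<List<String>> entryKey = new AtomicReference<>(List.of());

  /** Writes a meta value under a fresh address; it stays unreachable until published. */
  String writeMeta(String name, List<String> children) {
    String address = "add_" + UUID.randomUUID();
    metaKeys.put(address, new Meta(name, children));
    return address;
  }

  /**
   * Swaps the entry key only if it is still the exact list object the caller read
   * (compareAndSet compares references), so a concurrent writer makes this fail.
   */
  boolean publish(List<String> expected, List<String> updated) {
    return entryKey.compareAndSet(expected, List.copyOf(updated));
  }

  List<String> snapshot() {
    return entryKey.get();
  }

  Meta read(String address) {
    return metaKeys.get(address);
  }
}
```

A writer would call `snapshot()`, write new meta values with `writeMeta`, and then call `publish(snapshot, newList)`; if `publish` returns `false`, another writer won the race and the update has to be retried against a fresh snapshot.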
> I just propose a simple solution.
This is not a simple solution; we would need to completely reorganize the layout. Currently, we do not organize entities as a tree structure (we considered this structure, but it is costly and requires several I/O operations to retrieve a table entity, so we chose to abandon it).
The solution you mentioned is similar to mine, as it uses a flag (in your solution, a pointer) to indicate the visibility of data written before. I don't really see a fundamental difference between the two. If it's convenient, could you provide more details about the method you proposed?
The biggest difference may be the layout design.
Can it be reviewed?
Yes
You need to update the RFC to describe your new layout.
Please polish your code for several rounds to make it more robust.
@mchades @FANNG1 @diqiu50 Please help review this PR, thanks.
@jerryshao Please take time to review it again, thanks.
Is it necessary to add a special prefix to all KV entity keys? The underlying KV backend may be shared with user services, so a prefix would avoid potential conflicts with their keys, especially in scan operations.
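As an illustration of the prefix question above, here is a small sketch of a wrapper that namespaces every entity key and restricts scans to that namespace. The class name `PrefixedKvStore`, the prefix value `gravitino/`, and the use of a `TreeMap` as a stand-in for a shared sorted KV backend are all assumptions made for this example, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

/** Hypothetical wrapper that namespaces entity keys inside a shared, sorted KV backend. */
class PrefixedKvStore {
  // Assumed prefix; any value that user services agree not to use would work.
  static final String PREFIX = "gravitino/";

  // Stands in for the shared backend, which may also hold keys of other services.
  private final NavigableMap<String, String> backend;

  PrefixedKvStore(NavigableMap<String, String> backend) {
    this.backend = backend;
  }

  void put(String key, String value) {
    backend.put(PREFIX + key, value);
  }

  String get(String key) {
    return backend.get(PREFIX + key);
  }

  /** Returns the entity keys (without the prefix) that start with keyPrefix. */
  List<String> scanKeys(String keyPrefix) {
    String start = PREFIX + keyPrefix;
    List<String> result = new ArrayList<>();
    // Keys sharing a prefix are contiguous in sorted order, so stop at the first miss.
    for (String k : backend.tailMap(start, true).keySet()) {
      if (!k.startsWith(start)) {
        break;
      }
      result.add(k.substring(PREFIX.length()));
    }
    return result;
  }
}
```

With this wrapper, a user service that writes its own `catalog_a` key to the same backend neither overwrites the entity stored under `gravitino/catalog_a` nor shows up in `scanKeys("catalog")`.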
Using thread-local variables cannot resolve thread safety issues and might lead to program errors.
I will use another PR to enhance it. I tried, but it required many changes and took time to polish.
@diqiu50 @jerryshao All have been resolved, except for this one.
> Using thread-local variables cannot resolve thread safety issues and might lead to program errors.
> I will use another PR to enhance it. I tried, but it required many changes and took time to polish.
What changes were proposed in this pull request?
Introduce a general 2PC (two-phase commit) transaction implementation to replace the current transaction mechanism, which relies on the underlying storage.
Why are the changes needed?
Some KV databases may not support transactions on their own, so we should not rely on them to provide transactions.
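For readers unfamiliar with the idea, the sketch below shows one simplified way to layer transactions on a KV store that only offers single-key writes and range scans: a two-phase write in which prepared values stay invisible until a commit marker is written. It is not the implementation in this PR. The class `SketchKvTransactionManager`, the `d/` and `c/` key prefixes, and the `\u0000` separator are all assumptions, and conflict detection, rollback, and cleanup of stale versions are omitted.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch; names and key layout are assumptions, not the PR's code. */
class SketchKvTransactionManager {
  private static final String DATA = "d/";
  private static final String COMMIT = "c/";
  private static final char SEP = '\u0000'; // assumed never to appear in user keys

  // Stands in for a KV backend without native transactions.
  private final NavigableMap<String, String> kv = new TreeMap<>();
  private long nextTxId = 1;

  final class Transaction {
    private final String id;
    private final Map<String, String> writes = new HashMap<>();

    private Transaction(long id) {
      // Zero-padded so that lexicographic key order matches transaction order.
      this.id = String.format("%020d", id);
    }

    void put(String key, String value) {
      writes.put(key, value);
    }

    /** Phase 1: persist every write under a versioned key; nothing is visible yet. */
    void prepare() {
      synchronized (kv) {
        writes.forEach((k, v) -> kv.put(DATA + k + SEP + id, v));
      }
    }

    /** Phase 2: one single-key write makes all prepared values visible together. */
    void commit() {
      synchronized (kv) {
        kv.put(COMMIT + id, "");
      }
    }
  }

  Transaction begin() {
    synchronized (kv) {
      return new Transaction(nextTxId++);
    }
  }

  /** Returns the newest value whose transaction has a commit marker, or null. */
  String get(String key) {
    synchronized (kv) {
      String from = DATA + key + SEP;
      String to = DATA + key + (char) (SEP + 1);
      // Walk the versions of this key from newest to oldest.
      for (Map.Entry<String, String> e :
          kv.subMap(from, true, to, false).descendingMap().entrySet()) {
        String txId = e.getKey().substring(from.length());
        if (kv.containsKey(COMMIT + txId)) {
          return e.getValue();
        }
      }
      return null;
    }
  }
}
```

If the process stops between `prepare()` and `commit()`, the versioned values remain in the store but no reader ever returns them, which is the visibility-flag behavior discussed earlier in this thread.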
Fix: #617
Does this PR introduce any user-facing change?
No.
How was this patch tested?
A new test class, `TestKvTransactionManager`, was added.