datastrato / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0
609 stars 193 forks source link

[#617] improvement(core): Use general 2pc to replace transaction mechanism provided by underlying storage #710

Closed yuqi1129 closed 6 months ago

yuqi1129 commented 7 months ago

What changes were proposed in this pull request?

Introducing a general 2PC transaction implementation to replace the current transaction mechanism backed by underlying storage.

Why are the changes needed?

Some KV databases may not support transactions on their own, so we should not rely on them to provide transactions.

Fix: #617

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New test class TestKvTransactionManager was added.

github-actions[bot] commented 7 months ago

Code Coverage Report

Overall Project 65.78% -0.23% :green_circle:
Files changed 93.09% :green_circle:


Module Coverage
core 75.52% -0.89% :green_circle:
Files |Module|File|Coverage|| |:-|:-|:-|:-:| |core|[Configs.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2FConfigs.java)|100%|:green_circle:| ||[FunctionUtils.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2FFunctionUtils.java)|100%|:green_circle:| ||[KvNameMappingService.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FKvNameMappingService.java)|98.73% **`-1.27%`**|:green_circle:| ||[TransactionalKvBackendImpl.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FTransactionalKvBackendImpl.java)|95% **`-5%`**|:green_circle:| ||[KvEntityStore.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FKvEntityStore.java)|94.15%|:green_circle:| ||[ValueStatusEnum.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FValueStatusEnum.java)|83.1% **`-16.9%`**|:green_circle:| ||[TransactionIdGeneratorImpl.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FTransactionIdGeneratorImpl.java)|71.15% **`-28.85%`**|:green_circle:| ||[CatalogManager.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fcatalog%2FCatalogManager.java)|61.42%|:green_circle:| ||[RocksDBKvBackend.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FRocksDBKvBackend.java)|45.74% **`-1.26%`**|:green_circle:| ||[KvRangeScan.java](https://github.com/datastrato/gravitino/blob/78a9ded8185624d4dcf8aa640a7d54ace595b40b/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgravitino%2Fstorage%2Fkv%2FKvRangeScan.java)|32.54% **`-0.53%`**|:green_circle:|
yuqi1129 commented 7 months ago

We will use a separate PR to handle the GC collection problem.

qqqttt123 commented 7 months ago

I prefer the solution of Apache Iceberg. Could we use COW mode for this issue? I just propose a simple solution. We have only two kinds of keys: entry key and meta key. Entry key is organized as below: Its key is the name of metalake. The value is the a list of keys. The keys is like a address to find the meta key. metalakeA: {add_a, add_b, add_c} add_a : {"name": catalog_a", "list": "add_d, add_e, add_f"} add_b : {"name": "catalog_b", "list": "add_k"} add_c : {"name": "catalog_c", "list": "add_i"} add_d : {"name": "schema_a": list: "add_o"} ..... When we update metalake, we modify the meta data only. Then we use the lock to use CAS method to update the entry key. Finally, we also to garbage the unless keys synchronously.

yuqi1129 commented 7 months ago

I just propose a simple solution.

This is not a simple solution, we need to completely reorganize the layout. Currently, we do not construct all entities as tree structures (this structure has been considered, but it is costly and will require several IO operations to retrieve a table entity, so we choose to abandon it).

The solution you mentioned is similar to mine, as it uses a flag (in your solution, a pointer) to indicate the visibility of data written before. I don't really see a fundamental difference between the two. If it's convenient, could you provide more details about the method you proposed?

qqqttt123 commented 7 months ago

I just propose a simple solution.

This is not a simple solution, we need to completely reorganize the layout. Currently, we do not construct all entities as tree structures (this structure has been considered, but it is costly and will require several IO operations to retrieve a table entity, so we choose to abandon it).

The solution you mentioned is similar to mine, as it uses a flag (in your solution, a pointer) to indicate the visibility of data written before. I don't really see a fundamental difference between the two. If it's convenient, could you provide more details about the method you proposed?

The biggest difference may be the design of layout.

jerryshao commented 6 months ago

Can it be reviewed ?

yuqi1129 commented 6 months ago

Can it be reviewed ?

Yes

jerryshao commented 6 months ago

You need to update the RFC to describe your new layout.

jerryshao commented 6 months ago

Please polish your code for several rounds to make it more robust.

yuqi1129 commented 6 months ago

@mchades @FANNG1 @diqiu50 Please help review this PR, thanks.

yuqi1129 commented 6 months ago

@jerryshao Please take time to review it again, thanks.

FANNG1 commented 6 months ago

Is it necessary to provide a special prefix to all kv entity keys? because the underlying kv backend may be shared with user service, to avoid potential conflict with user service keys especially in scan operation

yuqi1129 commented 6 months ago

Using thread-local variables cannot resolve thread safety issues and might lead to program errors.

I will use another PR to enhance it. I tried, but it required many changes and took time to polish.

yuqi1129 commented 6 months ago

@diqiu50 @jerryshao All have been resolved, except for this one.

Using thread-local variables cannot resolve thread safety issues and might lead to program errors.

I will use another PR to enhance it. I tried, but it required many changes and took time to polish.