datastrato / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

[#228] refactor:(core) Add id generator and key encoding interface #229

Closed yuqi1129 closed 10 months ago

yuqi1129 commented 10 months ago

What changes were proposed in this pull request?

Add a new auto id generator interface and an id-to-name mapping interface. For more details, please refer to #228.

Why are the changes needed?

Fix: #228

Does this PR introduce any user-facing change?

No

How was this patch tested?

Not tested; this PR makes no real change to the existing code path.

github-actions[bot] commented 10 months ago

Code Coverage Report

Overall Project 59.1% -0.29% :green_circle:
Files changed 12.77% :red_circle:


Module Coverage
core 66.18% -0.93% :red_circle:
Files

|Module|File|Coverage||
|:-|:-|:-|:-:|
|core|[BinaryEntityKeyEncoder.java](https://github.com/datastrato/graviton/blob/92481b80ae3f6ca90543c3d3aaa877f5ce316884/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgraviton%2Fstorage%2Fkv%2FBinaryEntityKeyEncoder.java)|99.58%|:green_circle:|
||[KvEntityStore.java](https://github.com/datastrato/graviton/blob/92481b80ae3f6ca90543c3d3aaa877f5ce316884/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgraviton%2Fstorage%2Fkv%2FKvEntityStore.java)|82.61%|:green_circle:|
||[EntityKeyEncoder.java](https://github.com/datastrato/graviton/blob/92481b80ae3f6ca90543c3d3aaa877f5ce316884/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgraviton%2Fstorage%2FEntityKeyEncoder.java)|0%|:red_circle:|
||[RandomIdGenerator.java](https://github.com/datastrato/graviton/blob/92481b80ae3f6ca90543c3d3aaa877f5ce316884/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgraviton%2Fstorage%2FRandomIdGenerator.java)|0%|:red_circle:|
||[InMemoryNameMappingService.java](https://github.com/datastrato/graviton/blob/92481b80ae3f6ca90543c3d3aaa877f5ce316884/core%2Fsrc%2Fmain%2Fjava%2Fcom%2Fdatastrato%2Fgraviton%2Fstorage%2FInMemoryNameMappingService.java)|0%|:red_circle:|

yuqi1129 commented 10 months ago

@jerryshao Please share your advice on this PR. We can only continue on this issue once everything is confirmed. I have listed the related problems that puzzle me; your suggestions would be helpful and valuable.

yuqi1129 commented 10 months ago

@jerryshao Please take some time to check whether this PR has any problems. I will add a new PR that depends on this one.

jerryshao commented 10 months ago

As I mentioned before, I would suggest you give an overall interface design before I can tell whether it is good or not. Currently, what I can see is only the id name mapping service and id generator. I would like to see how you design EntityStore and BinaryIdentifer interfaces.

You don't have to implement the logic; all I want to see is the code organization and interface design.

yuqi1129 commented 10 months ago

I see.

yuqi1129 commented 10 months ago

I have checked the code several times. Refactoring the id name mapping and id generator is indeed necessary and important. As for EntityStore, the problem you care about is that some information is passed to EntityStore that makes the interface messy and inelegant. I have some ideas, as follows:

1_{metalake_id}
2_{metalake_id}_{catalog_id}
3_{metalake_id}_{catalog_id}_{schema_id}
4_{metalake_id}_{catalog_id}_{schema_id}_{table_id}

The leading values 1, 2, and so on are the length of the name identifier, that is, the number of levels it contains. This kind of key encoding supports both point and range queries. This way, key encoding would not rely on EntityType.
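To make the proposal concrete, here is a minimal sketch of the level-prefixed layout above. It assumes ids are `long` values and keys are plain strings held in a sorted map; `LevelPrefixedKeys`, `encode`, and `children` are hypothetical names for illustration and are not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;

// Hypothetical helper, not a class from this PR.
final class LevelPrefixedKeys {
  private LevelPrefixedKeys() {}

  // Builds "{level}_{id1}_{id2}...", where level is the number of ids.
  static String encode(long... ids) {
    if (ids.length == 0) {
      throw new IllegalArgumentException("at least one id is required");
    }
    StringBuilder sb = new StringBuilder().append(ids.length);
    for (long id : ids) {
      sb.append('_').append(id);
    }
    return sb.toString();
  }

  // Range query: keys of all direct children of the given parent.
  // Children live one level deeper and share the parent's id path.
  static List<String> children(SortedMap<String, ?> store, long... parentIds) {
    StringBuilder sb = new StringBuilder().append(parentIds.length + 1);
    for (long id : parentIds) {
      sb.append('_').append(id);
    }
    String stem = sb.toString();
    String from = stem + '_';              // inclusive lower bound
    String to = stem + (char) ('_' + 1);   // exclusive upper bound
    return new ArrayList<>(store.subMap(from, to).keySet());
  }
}
```

A point query is a single lookup with `encode(metalakeId, catalogId)`, while `children(store, metalakeId)` scans only the catalogs of that metalake. Note that string keys sort lexicographically, so children are not returned in numeric id order.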

Moreover, I think the class type information passed to the EntityStore interface is also redundant if we consider EntityType redundant.

Please help evaluate the feasibility of this change. If it is OK, I will start working on it soon.

@jerryshao

jerryshao commented 10 months ago

Let me clarify the current requirements. Basically, the current implementation mixes several things together, so let's break them down:

  1. Id generator. This generates a unique ID for every entity.
  2. Name-id relation maintainer. This component maintains the relationship between id and name, and should be transactional.
  3. NameIdentifier/Namespace to encoded key mapping component. This component maps the name identifier to binaries and vice versa (if required).
  4. kv store. The kv store should support using the encoded key to put/get entities from the kv storage. The kv store can be embedded in the entity store as one implementation.

Please think about breaking things into these components and see how to organize the classes.
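One possible way to organize the four components is sketched below. The interface names `IdGenerator`, `NameMappingService`, and `EntityKeyEncoder` appear in this PR, but every method signature here, the `KvBackend` interface, and all implementation classes are assumptions made for illustration. The in-memory mapping is also not transactional, which a real implementation would need to be.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// 1. Id generator (method name is an assumption).
interface IdGenerator {
  long nextId();
}

// 2. Name-id relation maintainer (signatures are assumptions).
interface NameMappingService {
  Long getIdByName(String name); // null when the name is unknown
  long getOrCreateId(String name);
}

// 3. Name identifier to encoded key mapping.
interface EntityKeyEncoder {
  byte[] encode(String... levels);
}

// 4. kv store (hypothetical interface).
interface KvBackend {
  void put(byte[] key, byte[] value);
  byte[] get(byte[] key); // null when the key is absent
}

// Illustrative stand-in; the PR itself adds a RandomIdGenerator.
final class SequentialIdGenerator implements IdGenerator {
  private final AtomicLong next = new AtomicLong(1);
  public long nextId() { return next.getAndIncrement(); }
}

final class InMemoryNameMapping implements NameMappingService {
  private final IdGenerator ids;
  private final Map<String, Long> nameToId = new ConcurrentHashMap<>();
  InMemoryNameMapping(IdGenerator ids) { this.ids = ids; }
  public Long getIdByName(String name) { return nameToId.get(name); }
  public long getOrCreateId(String name) {
    return nameToId.computeIfAbsent(name, n -> ids.nextId());
  }
}

// Encodes "{level}_{id1}_{id2}..." using ids from the mapping service.
final class IdKeyEncoder implements EntityKeyEncoder {
  private final NameMappingService mapping;
  IdKeyEncoder(NameMappingService mapping) { this.mapping = mapping; }
  public byte[] encode(String... levels) {
    StringBuilder key = new StringBuilder().append(levels.length);
    StringBuilder path = new StringBuilder();
    for (String level : levels) {
      // Map the full path so equal names under different parents get different ids.
      path.append('/').append(level);
      key.append('_').append(mapping.getOrCreateId(path.toString()));
    }
    return key.toString().getBytes(StandardCharsets.UTF_8);
  }
}

final class InMemoryKvBackend implements KvBackend {
  private final Map<String, byte[]> data = new ConcurrentHashMap<>();
  public void put(byte[] key, byte[] value) {
    data.put(new String(key, StandardCharsets.UTF_8), value.clone());
  }
  public byte[] get(byte[] key) {
    byte[] value = data.get(new String(key, StandardCharsets.UTF_8));
    return value == null ? null : value.clone();
  }
}
```

With this split, an entity store only needs an `EntityKeyEncoder` and a `KvBackend`; the encoder hides the id generator and the name mapping behind it.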

yuqi1129 commented 10 months ago

Got it

yuqi1129 commented 10 months ago

Hi, this PR covers the first two of the four points you mentioned above. The third already exists and does not need big changes. As for the last one, once the first three are resolved, all we need to do is assemble them.

Please take another look, and if you have any thoughts or ideas, please let me know. Thank you.

jerryshao commented 10 months ago

I'm thinking that IdGenerator and NameMappingService are not specific to the storage package. Would it be better to move them to package util? This is a minor issue; everything else looks good to me. What's your opinion?

yuqi1129 commented 10 months ago

It seems weird to me to put an interface in package util, since that package mostly contains utility classes (classes that generally hold static functions). So I do not think this is reasonable. Maybe we can move them to another package or introduce a new one.

jerryshao commented 10 months ago

Package util is just an example. My point is that these two interfaces and their implementations are not storage specific. What do you think?

yuqi1129 commented 10 months ago

These two interfaces are indeed storage independent and can be moved to another package. But their implementations may involve storage. E.g., we might use a key-value backend to maintain name mappings, say KvNameMappingService.
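As a rough illustration of that split, the sketch below keeps the interface free of storage types while one implementation persists mappings through a key-value backend. Only the names `NameMappingService` and `KvNameMappingService` come from this discussion; the method signatures, the `KvBackend` interface, the `name_` key prefix, and `MapKvBackend` are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

// Storage-independent interface (signatures are assumptions).
interface NameMappingService {
  Long getIdByName(String name); // null when the name is unknown
  long getOrCreateId(String name);
}

// Hypothetical minimal key-value backend.
interface KvBackend {
  byte[] get(byte[] key);
  void put(byte[] key, byte[] value);
}

// Storage-dependent implementation: every mapping is a kv entry.
final class KvNameMappingService implements NameMappingService {
  private static final String PREFIX = "name_"; // assumed key prefix
  private final KvBackend backend;
  private final LongSupplier idGenerator; // stands in for an id generator

  KvNameMappingService(KvBackend backend, LongSupplier idGenerator) {
    this.backend = backend;
    this.idGenerator = idGenerator;
  }

  private static byte[] key(String name) {
    return (PREFIX + name).getBytes(StandardCharsets.UTF_8);
  }

  public synchronized Long getIdByName(String name) {
    byte[] value = backend.get(key(name));
    return value == null ? null : ByteBuffer.wrap(value).getLong();
  }

  // synchronized only guards this instance; a real backend would need
  // a transaction around the read-then-write.
  public synchronized long getOrCreateId(String name) {
    Long existing = getIdByName(name);
    if (existing != null) {
      return existing;
    }
    long id = idGenerator.getAsLong();
    backend.put(key(name), ByteBuffer.allocate(Long.BYTES).putLong(id).array());
    return id;
  }
}

// In-memory backend used only to exercise the sketch.
final class MapKvBackend implements KvBackend {
  private final Map<String, byte[]> data = new TreeMap<>();
  public byte[] get(byte[] key) {
    return data.get(new String(key, StandardCharsets.UTF_8));
  }
  public void put(byte[] key, byte[] value) {
    data.put(new String(key, StandardCharsets.UTF_8), value);
  }
}
```

Callers depend only on `NameMappingService`, so the interface could live outside the storage package even if this implementation stays inside it.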

jerryshao commented 10 months ago

Alright, let's put them here for now.