apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
1.09k stars 343 forks source link

[Improvement] How about modifying the interface of `HasIdentifier`? #151

Open jerryshao opened 1 year ago

jerryshao commented 1 year ago

What would you like to be improved?

Currently, the definition of Entity has a mixin interface HasIdentifier which provides the name identifier of the entity.

public interface HasIdentifier {

  /** Return the name of the entity. */
  String name();

  /** Returns the namespace of the entity. */
  default Namespace namespace() {
    return Namespace.empty();
  }

  /** Returns the name identifier of the entity. */
  default NameIdentifier nameIdentifier() {
    return NameIdentifier.of(namespace(), name());
  }
}

The problem is that namespace() could be null when fetching from the underlying storage, developers use namespace() and nameIdentifier() will get an ambiguous value.

The reason why the namespace could be null is that we don't store namespace. And why we don't store namespace is that namespace could be modified, if we store the namespace, then we need to update all the related entities if the namespace is changed.

How should we improve?

So to avoid ambiguity, we'd better modify this HasIdentifier, one possible way is to remove namespace() and nameIdentifier() interface, since we can get this from code context.

public interface HasIdentifier {

  /** Return the name of the entity. */
  String name();
}

Another way is to change the interface to set namespace implicitly to make sure it's not empty.

jerryshao commented 1 year ago

@yuqi1129 what's your opinion?

yuqi1129 commented 1 year ago

@jerryshao

Another way is to change the interface to set namespace implicitly to make sure it's not empty.

I prefer the latter. If we remove the namespace information from the interface HasIdentifier, We could not get the absolute path of an entity like '/metalake_xxx/catalog_xxxx/schema_xxx' as we only have the name of a specific entity. What we need to do is to ensure that the namespace is not null I think.

jerryshao commented 1 year ago

We could not get the absolute path of an entity like '/metalake_xxx/catalog_xxxx/schema_xxx' as we only have the name of a specific entity.

From the current code, I think we could always get the full qualified name, because our typical code is to use a NameIdentifier to get/put/update/delete an entity, and this NameIdentifier can be gotten from REST API (or others).

A deep question is: is "NameIdentifier" really a property of entity, or it is just a path reference?

jerryshao commented 1 year ago
public interface HasIdentifier<T> {

  /**
   * Bind the class that implements this interface to a namespace. With this namespace binding,
   * we could get the name identifier of the entity.
   *
   * The implementation should also check the validity of the namespace. If the namespace is not
   * valid, this method should throw an IllegalArgumentException.
   *
   * @param namespace The namespace to bind to.
   * @return the class that implements this interface.
   * @throws IllegalArgumentException if the namespace is not legal.
   */
  T bindNamespace(Namespace namespace) throws IllegalArgumentException;

  /** Returns the name identifier of the entity. */
  NameIdentifier nameIdentifier();
}

@yuqi1129 how about this change of HasIdentifier?

yuqi1129 commented 1 year ago

@jerryshao Do you mean we should remove method namespace() and add method bindNameSpace()? Or just add the method bindNameSpace().

jerryshao commented 1 year ago

Do you mean we should remove method namespace() and add method bindNameSpace()? Or just add the method bindNameSpace().

Remove the namespace(), since namespace can be gotten from NameIdentifier, not so necessary to keep it.

yuqi1129 commented 1 year ago

If we call the method nameIdentifier before binding the real namespace on the entity, Then the value of namespace in nameIdentifier is NULL ?

jerryshao commented 1 year ago

We should check and throw exception before namespace is binding when calling nameIdentifier().

yuqi1129 commented 1 year ago

how about this change of HasIdentifier?

It's OK for me.