apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
[EPIC] Redefine and refactor the `metalake` concept in Gravitino #2418

Closed jerryshao closed 5 months ago

jerryshao commented 8 months ago

Describe the proposal

Metalake in Gravitino is a tenant concept that separates users and metadata for different groups. Currently, however, this concept is mixed with the catalog concept, so users keep asking what a metalake is, which creates a heavy learning burden for adoption. In addition, our current API design requires the metalake to be specified, which is also inconvenient.

CC @shaofengshi

Task list

So in this epic, we should rethink the concept of metalake and refactor the related APIs to:

  1. Weaken the concept of metalake, shifting it from the top level of the data catalog to a tenant.
  2. Refactor the API to simplify the concept of metalake, reducing the namespace from 4 levels to 3 levels.
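
To make the second point concrete, here is a minimal sketch of what going from a 4-level to a 3-level namespace could look like from a client's point of view. It is an illustration only, not Gravitino's actual API: the `TenantContext` class, its `resolve` method, and the assumption that the four levels are metalake, catalog, schema, and table are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical client-side context that fixes the metalake (tenant) once."""

    metalake: str

    def resolve(self, identifier: str) -> tuple:
        """Expand a 3-level name (catalog.schema.table) into a 4-level one.

        The metalake is taken from the context, so callers never spell it out.
        """
        parts = identifier.split(".")
        if len(parts) != 3:
            raise ValueError(
                f"expected 'catalog.schema.table', got {identifier!r}"
            )
        catalog, schema, table = parts
        return (self.metalake, catalog, schema, table)


# Assumed example names; none of them come from the Gravitino code base.
ctx = TenantContext(metalake="demo_metalake")
full_name = ctx.resolve("hive.sales.orders")
```

In this sketch the metalake still exists on the server side, but users only ever write 3-level names once the context is set up.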

Subtasks:

justinmclean commented 8 months ago

I actually quite like the concept of metalake, so I'd like to see a bit more information as to why this is being changed.

shaofengshi commented 8 months ago

Justin, the concept of metalake is still there; we will not remove or change it. We just want to make the clients and APIs easier to use for users and developers.

coolderli commented 8 months ago

@shaofengshi I came here from this comment: https://github.com/datastrato/gravitino/pull/1700#discussion_r1515897119. Why would one company have only one metalake? Do you think we should use one metalake across different regions? That would mean using the same server or the same backend. If we set up a server in each region, we would have to use the same metalake name on each server.

shaofengshi commented 8 months ago

Hi Peidian, of course one company can have multiple metalakes; each metalake is a container for metadata grouping, management, or isolation. I mean that for most companies (small to medium), one metalake is enough. A big company like Xiaomi can create multiple.

For the second question, I think yes, because a metalake is cross-region and cross-cloud, and should not be bound to a particular region. For the third question, I'm not sure I understand it well, but I guess it is related to how Gravitino servers work together as a cluster?

coolderli commented 8 months ago

@shaofengshi Thanks for your response. It helps a lot. For the third question, your understanding is right. Here is a case from our company: we have some jobs that copy data between different regions. If there are multiple Gravitino servers, a copying job has to access more than one server to get the metadata. But we only need one metalake, so the different Gravitino servers would all be configured with the same metalake name. So I would like some advice on best practices for deploying Gravitino servers.

shaofengshi commented 5 months ago

As all related issues are closed, I'm closing this issue.