Closed jerryshao closed 5 months ago
I actually quite like the concept of metalake, so I'd like to see a bit more information as to why this is being changed.
Justin, the concept of metalake is still there; we will not remove or change it. We only want to make the clients and APIs easier to use for users and developers.
@shaofengshi I came here from this comment: https://github.com/datastrato/gravitino/pull/1700#discussion_r1515897119. Why would one company have only one metalake? Do you think we should use one metalake across different regions? That would mean using the same server or the same backend. If we set up a server in each region, we would have to use the same metalake name on each server.
Hi Peidian, of course one company can have multiple metalakes; each metalake is a container for metadata grouping, management, or isolation. I mean that for most companies (small to medium), one metalake is enough. A big company like Xiaomi can create multiple.
For the second question, I think yes, because the metalake is cross-region and cross-cloud, so it should not be bound to a particular region. For the third question, I'm not sure I understand it well, but I guess it is related to how Gravitino servers work together as a cluster?
@shaofengshi Thanks for your response. It helps a lot. For the third question, your understanding is right. Here is a case from our company: we have some jobs that copy data between different regions. If there are multiple Gravitino servers, a copy job has to access more than one server to get the metadata. But we only need one metalake, so the metalake would be given the same name on each Gravitino server. So I would like some advice on best practices for deploying Gravitino servers.
As all related issues are closed, I am closing this issue.
Describe the proposal
Metalake in Gravitino is a tenant concept that separates users and metadata for different groups. But now this concept is mixed up with catalog, and users keep asking what a metalake is, which creates a high educational burden for adoption. Meanwhile, the concept of metalake is required by the APIs we designed, which is also inconvenient.
CC @shaofengshi
Task list
So in this epic, we should rethink the concept of metalake and refactor the related APIs so that metalake shifts from the top level of the data catalog to a tenant, and the namespace goes from 4 levels to 3 levels.
Subtasks:
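To make the namespace change in the proposal concrete, here is a minimal sketch. It assumes the four levels are metalake, catalog, schema, and table, and that the metalake becomes a tenant setting on the client rather than part of every name. All class names and sample values (`FourLevelName`, `ThreeLevelName`, `TenantClient`, `demo_metalake`, and so on) are hypothetical illustrations, not Gravitino's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourLevelName:
    """Four-level style: the metalake is part of every qualified name."""
    metalake: str
    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return ".".join((self.metalake, self.catalog, self.schema, self.table))


@dataclass(frozen=True)
class ThreeLevelName:
    """Three-level style: users only spell out catalog, schema, and table."""
    catalog: str
    schema: str
    table: str

    def __str__(self) -> str:
        return ".".join((self.catalog, self.schema, self.table))


@dataclass(frozen=True)
class TenantClient:
    """Hypothetical client bound to a single metalake (the tenant)."""
    metalake: str

    def resolve(self, name: ThreeLevelName) -> FourLevelName:
        # The tenant is supplied by the client, not by the caller's name.
        return FourLevelName(self.metalake, name.catalog, name.schema, name.table)


# Sample values are made up for illustration.
client = TenantClient("demo_metalake")
short_name = ThreeLevelName("hive_catalog", "sales", "orders")
full_name = client.resolve(short_name)
```

In this sketch the metalake still exists internally, which matches the comment that the concept itself is not being removed; only the name that users type gets shorter.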