Closed by mchades.
I suggest clarifying the dropping semantics of catalog and metalake as follows:
When dropping a metalake:
- `cascade`: to drop a non-empty metalake, which will then cascade-drop its catalogs.

cc @jerryshao @shaofengshi @FANNG1
Hi @mchades, I think there are some points we need to think about:
@jerryshao Thanks for your points!
Based on the above four points, I propose the following new drop rules for metalake:
- Add an `in-use` property to metalake with the default value `true`.
- Only metalakes with `in-use=false` can be dropped.
- When a metalake is dropped, its associated sub-entities, such as catalogs, users, roles, tags, and metrics, are also dropped together. (Note: we don't need `cascade` here because the `in-use` property serves the same purpose.)
- When `in-use=false`, all operations on the associated sub-entities of this metalake are rejected.
- Return false if the metalake does not exist.
- Return true if it is dropped successfully.
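The metalake rules above can be sketched as a small in-memory model in Python. This is a hypothetical illustration, not Gravitino's API: `Metalake`, `MetalakeStore`, `InUseError`, and the string-valued `properties` map are assumed names. Raising an error when an in-use metalake is dropped is also an assumption, since the proposal only says such metalakes cannot be dropped.

```python
from dataclasses import dataclass, field

SUB_ENTITY_KINDS = ("catalog", "user", "role", "tag", "metric")


class InUseError(Exception):
    """Hypothetical error for operations that violate the proposed in-use rules."""


@dataclass
class Metalake:
    name: str
    # Proposed property, defaulting to "true" when the metalake is created.
    properties: dict = field(default_factory=lambda: {"in-use": "true"})
    # Sub-entities owned by this metalake, grouped by kind.
    children: dict = field(default_factory=lambda: {k: set() for k in SUB_ENTITY_KINDS})

    def in_use(self) -> bool:
        return self.properties.get("in-use", "true") == "true"


class MetalakeStore:
    """Hypothetical in-memory store; not Gravitino's real storage layer."""

    def __init__(self):
        self._metalakes = {}

    def create(self, name: str) -> Metalake:
        self._metalakes[name] = Metalake(name)
        return self._metalakes[name]

    def exists(self, name: str) -> bool:
        return name in self._metalakes

    def set_in_use(self, name: str, value: bool) -> None:
        # Who may flip this flag is left to the privilege system.
        self._metalakes[name].properties["in-use"] = "true" if value else "false"

    def add_sub_entity(self, name: str, kind: str, entity: str) -> None:
        lake = self._metalakes[name]
        # When in-use=false, operations on sub-entities are rejected.
        if not lake.in_use():
            raise InUseError(f"metalake {name!r} is not in use")
        lake.children[kind].add(entity)

    def drop(self, name: str) -> bool:
        lake = self._metalakes.get(name)
        if lake is None:
            return False  # the metalake does not exist
        if lake.in_use():
            raise InUseError(f"set in-use=false on {name!r} before dropping it")
        # No cascade flag: the sub-entities live inside the metalake record,
        # so removing the record drops them together.
        del self._metalakes[name]
        return True


if __name__ == "__main__":
    store = MetalakeStore()
    store.create("lake")
    store.add_sub_entity("lake", "catalog", "hive_catalog")
    store.set_in_use("lake", False)
    print(store.drop("lake"))  # True
    print(store.drop("lake"))  # False: it no longer exists
```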
For dropping catalog:
- Also add an `in-use` property to catalog with the default value `true`.
- Only catalogs with `in-use=false` can be dropped.
- When a catalog is dropped, only its associated sub-entities in the Gravitino store, such as schemas and tables, are dropped together. (For example, when a Hive catalog with `in-use=false` is dropped, its sub-entities are also dropped, but the metadata in HMS is not.)
- Why not drop external metadata? Because I think that when we create a catalog, we just establish a connection (which can also be understood as a mapping relationship) between the external service and Gravitino, so when deleting it, we only need to cut off this connection (or remove this mapping relationship).
- When `in-use=false`, all operations on the associated sub-entities of this catalog are rejected.
- Return false if the catalog does not exist.
- Return true if it is dropped successfully.
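The catalog rules, together with the reasoning that a catalog is only a mapping to an external service, can be sketched the same way. Again this is a hypothetical model rather than Gravitino's API: `ExternalMetastore` stands in for a service such as HMS, and `CatalogRegistry`, `Catalog`, and `CatalogNotInUseError` are assumed names. Dropping removes only the entries kept in the Gravitino store and never touches the external object.

```python
from dataclasses import dataclass, field


class CatalogNotInUseError(Exception):
    """Hypothetical error for operations that violate the proposed in-use rules."""


@dataclass
class ExternalMetastore:
    """Stand-in for an external service such as HMS, which owns the real metadata."""
    tables: set = field(default_factory=set)


@dataclass
class Catalog:
    name: str
    external: ExternalMetastore
    in_use: bool = True  # proposed default: in-use=true
    # Entries mirrored in the Gravitino store: schema name -> table names.
    schemas: dict = field(default_factory=dict)


class CatalogRegistry:
    """Hypothetical stand-in for the catalog part of the Gravitino store."""

    def __init__(self):
        self.catalogs = {}

    def create_table(self, catalog: str, schema: str, table: str) -> None:
        cat = self.catalogs[catalog]
        # When in-use=false, operations on sub-entities are rejected.
        if not cat.in_use:
            raise CatalogNotInUseError(f"catalog {catalog!r} is not in use")
        cat.external.tables.add(f"{schema}.{table}")  # real metadata lives externally
        cat.schemas.setdefault(schema, set()).add(table)  # Gravitino keeps the mapping

    def drop_catalog(self, name: str) -> bool:
        cat = self.catalogs.get(name)
        if cat is None:
            return False  # the catalog does not exist
        if cat.in_use:
            raise CatalogNotInUseError(f"set in-use=false on {name!r} before dropping it")
        # Cut the mapping only: schemas and tables recorded in the store go away,
        # while cat.external is left untouched.
        del self.catalogs[name]
        return True


if __name__ == "__main__":
    hms = ExternalMetastore()
    registry = CatalogRegistry()
    registry.catalogs["hive"] = Catalog("hive", hms)
    registry.create_table("hive", "db1", "t1")
    registry.catalogs["hive"].in_use = False
    print(registry.drop_catalog("hive"))  # True
    print(hms.tables)  # {'db1.t1'}: the external metadata survives
```

Keeping `external` as a separate object makes the "cut off the connection" reading concrete: the catalog record only holds a reference, so deleting the record drops the reference and nothing else.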
Let me think a bit on this.
I have several questions:
- Is the `in-use` property set by the user manually?

Can you please investigate the behavior of Unity Catalog? Unity Catalog has similar concepts: its metastore is equivalent to our metalake, and its catalog maps to our catalog. Besides, you could also check Starburst Gravity.
- Is the `in-use` property set by the user manually?

Yes, and the privilege system should determine who can set this value.
- The default drop behavior is a cascading drop, right?

Yes. Once `in-use=false`, the entity will be dropped cascadingly, because I can't imagine a scenario where we need to delete a metalake or catalog and still keep its sub-entities.
- What about managed catalogs like the Hadoop catalog? Are we going to delete everything when the catalog is dropped?

It behaves the same as other catalogs. Note, however, that we only delete the data in the Gravitino store, not the data in Hadoop.
- Can you please investigate the behavior of Unity Catalog? Unity Catalog has similar concepts: its metastore is equivalent to our metalake, and its catalog maps to our catalog. Besides, you could also check Starburst Gravity.

See the investigation. The key conclusion is that when a catalog (or metalake) is deleted, the data in the external service is not deleted.
finish design
What would you like to be improved?
Metalakes and catalogs are completely managed by Gravitino, so there are some drop behaviors that need to be clarified:
How should we improve?
Answer the above questions.