apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

[#4022] improvement(core): Version management for H2 database #4054

Open lw-yang opened 4 days ago

lw-yang commented 4 days ago

What changes were proposed in this pull request?

Add version management for the H2 database, as is already done for MySQL.

Why are the changes needed?

Fix: #4022

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit tests.

lw-yang commented 4 days ago

@yuqi1129 please help to review it

jerryshao commented 4 days ago

I'm curious how you handle table schema upgrades with H2, for example when we add new columns. If you want to create the table schema automatically, you also need to handle schema upgrades; otherwise we cannot upgrade the Gravitino server on top of existing H2 storage.

lw-yang commented 4 days ago

@jerryshao We use a procedure to initialize data in MySQL because some metalakes already exist in MySQL.

However, H2 is only supported starting from version 0.6.0, so there are no existing metalakes in the database and no upgrade or initialization is needed.

When upgrading from 0.6.0 to 0.7.0, we will need to handle the upgrade scenario in H2.

WDYT?
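Whichever release first ships an H2 schema change, the server has to decide on startup whether the schema stored in H2 is older than the running server. A minimal sketch of that check follows, assuming (hypothetically) that the stored schema version is available as a plain dotted string such as "0.6.0"; the class and method names are illustrative, not part of Gravitino. Versions are compared numerically because a lexicographic comparison would wrongly treat "0.10.0" as older than "0.9.0".

```java
import java.util.Arrays;

/** Hypothetical helper: compares dotted release versions such as "0.6.0" and "0.7.0". */
class SchemaVersions {

  /** Negative, zero, or positive as a is older than, equal to, or newer than b. */
  static int compare(String a, String b) {
    int[] x = parse(a);
    int[] y = parse(b);
    for (int i = 0; i < Math.max(x.length, y.length); i++) {
      // Missing components count as 0, so "0.7" equals "0.7.0".
      int xi = i < x.length ? x[i] : 0;
      int yi = i < y.length ? y[i] : 0;
      if (xi != yi) {
        return Integer.compare(xi, yi);
      }
    }
    return 0;
  }

  /** An upgrade is needed only when the stored schema is older than the running server. */
  static boolean needsUpgrade(String storedSchemaVersion, String serverVersion) {
    return compare(storedSchemaVersion, serverVersion) < 0;
  }

  private static int[] parse(String version) {
    return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
  }

  public static void main(String[] args) {
    System.out.println(needsUpgrade("0.6.0", "0.7.0")); // prints true
    System.out.println(needsUpgrade("0.7.0", "0.7.0")); // prints false
    // Numeric, not lexicographic: "0.10.0" is newer than "0.9.0".
    System.out.println(compare("0.10.0", "0.9.0") > 0); // prints true
  }
}
```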

jerryshao commented 4 days ago

I think we should have a mechanism to handle this in code rather than defer it to 0.7; that's why I raised this issue.

The difference from MySQL is that, for MySQL, we create and upgrade tables manually outside of Gravitino, so we can handle the DB upgrade by hand and then restart the Gravitino server. For H2, however, the tables are created automatically in code, so H2 also needs an automatic DB upgrade mechanism. Otherwise, once we add a new table or column, how do we upgrade Gravitino in place?
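One way such an in-code mechanism could work is to chain versioned upgrade scripts on startup. The sketch below only plans the chain; it assumes (hypothetically) that the server knows the schema version currently stored in H2 and that upgrade scripts follow a `<from>-to-<to>-h2.sql` naming scheme like the one proposed later in this thread. `H2UpgradePlanner` and `plan` are illustrative names, not Gravitino's API, and actually executing the scripts is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: decide which H2 upgrade scripts to run, and in what order. */
class H2UpgradePlanner {

  // Assumed naming convention: "<from>-to-<to>-h2.sql".
  private static final Pattern SCRIPT_NAME = Pattern.compile("(.+)-to-(.+)-h2\\.sql");

  /** Returns the scripts that move the schema from storedVersion to targetVersion. */
  static List<String> plan(String storedVersion, String targetVersion, List<String> scriptNames) {
    // Index the available scripts by the version they upgrade from: from -> (to, file name).
    Map<String, Map.Entry<String, String>> byFrom = new HashMap<>();
    for (String name : scriptNames) {
      Matcher m = SCRIPT_NAME.matcher(name);
      if (m.matches()) {
        byFrom.put(m.group(1), Map.entry(m.group(2), name));
      }
    }
    List<String> steps = new ArrayList<>();
    String current = storedVersion;
    // Follow the chain, e.g. 0.6.0 -> 0.7.0 -> 0.8.0, until the target is reached.
    while (!current.equals(targetVersion)) {
      Map.Entry<String, String> next = byFrom.get(current);
      if (next == null || steps.size() > byFrom.size()) {
        // Missing script (or a cycle): refuse to start instead of running on a half-upgraded schema.
        throw new IllegalStateException("No upgrade path from " + current + " to " + targetVersion);
      }
      steps.add(next.getValue());
      current = next.getKey();
    }
    return steps;
  }

  public static void main(String[] args) {
    List<String> scripts = List.of("0.7.0-to-0.8.0-h2.sql", "0.6.0-to-0.7.0-h2.sql");
    System.out.println(plan("0.6.0", "0.8.0", scripts));
    // prints [0.6.0-to-0.7.0-h2.sql, 0.7.0-to-0.8.0-h2.sql]
  }
}
```

Failing fast when no path exists mirrors the concern above: without a complete chain of scripts, the server cannot safely upgrade existing H2 storage in place.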

lw-yang commented 4 days ago

Got it.

lw-yang commented 2 days ago

@yuqi1129 Could you please help to check if adding a README like this is okay?

This follows Apache Hive's handling of schema upgrades for Derby: https://github.com/apache/hive/blob/branch-3/metastore/scripts/upgrade/derby/README

yuqi1129 commented 2 days ago

@lw-yang The following point also needs to be included in the README

  1. After 0.6.0, any change to the H2 schema must come with a DDL change SQL file in the script folder, with the related issue ID recorded in an SQL comment.

Example content of 0.6.0-to-0.7.0-h2.sql:

-- issue xxx1
ddl here

-- issue xxx2
ddl here

Besides, could you rename the README file to Gravitino_upgrade_h2 or something similar, so that the word "upgrade" is visible?
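A rule like this is easy to enforce automatically. Below is a minimal sketch of a check, assuming (hypothetically) that issue comments are written as standard SQL `--` line comments containing the word "issue" and that every statement ends with `;`. It flags DDL that appears before any issue comment and issue comments with no DDL after them. `UpgradeScriptLint` and the table and column names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical check for the "issue ID comment before DDL" convention. */
class UpgradeScriptLint {

  /**
   * Returns 1-based line numbers that break the convention: a statement that appears
   * before any issue comment, or an issue comment that is not followed by any DDL.
   */
  static List<Integer> check(String script) {
    List<Integer> problems = new ArrayList<>();
    boolean sawIssue = false;    // has any "-- issue ..." comment appeared yet?
    int openIssueLine = -1;      // issue comment still waiting for its first statement
    boolean inStatement = false; // inside a multi-line statement (no ';' seen yet)
    String[] lines = script.split("\n");
    for (int i = 0; i < lines.length; i++) {
      String line = lines[i].trim();
      if (line.isEmpty()) {
        continue;
      }
      if (line.startsWith("--")) {
        if (line.toLowerCase(Locale.ROOT).contains("issue")) {
          if (openIssueLine != -1) {
            problems.add(openIssueLine); // previous issue section had no DDL
          }
          sawIssue = true;
          openIssueLine = i + 1;
        }
        continue;
      }
      if (!inStatement) {
        if (!sawIssue) {
          problems.add(i + 1); // DDL not attributed to any issue
        }
        openIssueLine = -1;
        inStatement = true;
      }
      if (line.endsWith(";")) {
        inStatement = false;
      }
    }
    if (openIssueLine != -1) {
      problems.add(openIssueLine); // last issue section had no DDL
    }
    return problems;
  }

  public static void main(String[] args) {
    String good = String.join("\n",
        "-- issue xxx1",
        "ALTER TABLE example_table ADD COLUMN example_col VARCHAR(64);",
        "",
        "-- issue xxx2",
        "CREATE TABLE IF NOT EXISTS example_log (",
        "  id BIGINT PRIMARY KEY",
        ");");
    System.out.println(check(good)); // prints []

    String bad = String.join("\n",
        "ALTER TABLE example_table ADD COLUMN example_col VARCHAR(64);",
        "-- issue xxx2");
    System.out.println(check(bad)); // prints [1, 2]
  }
}
```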

lw-yang commented 2 days ago


This README is intended for users, but the requirement to add change SQL with a detailed issue ID is aimed at developers, isn't it? I don't think it should be placed in the README.