apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://datastrato.ai/docs/
Apache License 2.0

[EPIC] Container management improvement for integration tests #2765

Open unknowntpo opened 3 months ago

unknowntpo commented 3 months ago

Describe the proposal

Currently, Gravitino's integration tests use many containers, and we want ContainerSuite to manage all of them. There are several points we can improve.

The current design of ContainerSuite is a singleton, which means that no matter how many tests we run, each kind of container (e.g. Hive) is meant to be started at most once.

Here's a bug: ContainerSuite.java implements a singleton, but startHiveContainer and startTrinoContainer don't check whether the container has already been started, so invoking methods like startTrinoContainer multiple times is unpredictable. These methods need to behave as singletons too: the container should be started only once, and any other test case that invokes the method while a start is in progress needs to wait until it is done.
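A minimal sketch of the "start once, others wait" behavior described above. This is not the actual ContainerSuite code: the class name `ContainerSuiteSketch`, the `hiveStarted` flag, and the `hiveStartCount` counter are hypothetical, and the real container start-up is replaced by a placeholder.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for ContainerSuite; all names here are illustrative only.
public class ContainerSuiteSketch {
  private static volatile ContainerSuiteSketch instance;

  // Counts real start-ups so the sketch can show the container starts only once.
  private final AtomicInteger hiveStartCount = new AtomicInteger();
  private volatile boolean hiveStarted = false;

  private ContainerSuiteSketch() {}

  // Double-checked locking so every test shares one suite instance.
  public static ContainerSuiteSketch getInstance() {
    if (instance == null) {
      synchronized (ContainerSuiteSketch.class) {
        if (instance == null) {
          instance = new ContainerSuiteSketch();
        }
      }
    }
    return instance;
  }

  // Idempotent start: the first caller starts the container while holding the
  // lock; concurrent callers block on the lock and return once it is started.
  public void startHiveContainer() {
    if (hiveStarted) {
      return; // fast path, container is already running
    }
    synchronized (this) {
      if (hiveStarted) {
        return; // another caller finished the start while we were waiting
      }
      // Placeholder for the real container start-up (assumption).
      hiveStartCount.incrementAndGet();
      hiveStarted = true;
    }
  }

  public int getHiveStartCount() {
    return hiveStartCount.get();
  }
}
```

The same pattern would apply to startTrinoContainer and the other start methods, each with its own flag.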

Another problem is that there are some test cases whose containers haven't been placed in ContainerSuite, e.g. MysqlContainer in AuditCatalogMysqlIT.java. We need to move them into ContainerSuite.

These modifications might cause a problem: because the tests would connect to the same database concurrently, they need to be separated into different databases, each named after the test case. To avoid a wrongly hard-coded name, we may need to get the class name from Class and extract this behavior into a method.
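A rough sketch of such a helper, assuming the database name is derived from the test class (and optionally the test method). The class `TestDatabaseNames` and its methods are hypothetical, not existing Gravitino code, and the sanitizing rule is only one possible choice.

```java
import java.util.Locale;

// Hypothetical helper; not part of Gravitino.
public class TestDatabaseNames {

  // Derives the database name from the test class, so the name is never hard-coded.
  public static String databaseNameFor(Class<?> testClass) {
    return sanitize(testClass.getSimpleName());
  }

  // Variant that also includes the test method name, for per-method isolation.
  public static String databaseNameFor(Class<?> testClass, String methodName) {
    return sanitize(testClass.getSimpleName() + "_" + methodName);
  }

  // Replaces anything other than letters, digits and underscores, then lowercases.
  private static String sanitize(String raw) {
    return raw.replaceAll("[^A-Za-z0-9_]", "_").toLowerCase(Locale.ROOT);
  }
}
```

A test could then call `TestDatabaseNames.databaseNameFor(getClass())` when creating its database, so renaming the test class automatically renames the database.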

Thanks @xunliu for pointing out these problems.

Task list

mchades commented 3 months ago

make methods in ContainerSuite singleton.

Do you have time to start on this one first?

mchades commented 3 months ago

I suspect that the failure of the Hive container to start in CI is related to it being started multiple times, so I prioritized fixing this issue in #2794.

You can move on to other fixes when you have time.

BTW, I think there is another CI improvement worth considering: uploading the process logs from the containers (such as Hive, HDFS, etc.) after a CI failure.

unknowntpo commented 3 months ago

I suspect that the failure of the Hive container to start in CI is related to it being started multiple times, so I prioritized fixing this issue by https://github.com/datastrato/gravitino/pull/2794

@mchades Sorry for the late reply. OK, I'll move on to the other issues.