apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement] Support env.sh and add HADOOP_CONF_DIR and HIVE_CONF_DIR to CLASSPATH #3222

Status: Closed (baiyangtx closed this 1 month ago)

baiyangtx commented 2 months ago

Why are the changes needed?

During initialization, HiveConf loads hive-site.xml if it finds one on the classpath.

Amoro should therefore support adding directories such as HADOOP_CONF_DIR and HIVE_CONF_DIR to the classpath at startup.

In addition, Amoro should support a user-defined environment variable configuration file (env.sh).
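
To illustrate the idea, here is a minimal sketch of how a startup script could source a user-provided env.sh and then append the Hadoop and Hive configuration directories to CLASSPATH. The `AMORO_CONF_DIR` variable, its `./conf` default, and the `append_classpath` helper are assumptions made for this example; they are not taken from the patch itself.

```shell
# Hypothetical startup fragment; names and paths are assumptions, not Amoro's actual script.

# Append a directory to CLASSPATH only if the argument is non-empty and the directory exists.
append_classpath() {
  if [ -n "$1" ] && [ -d "$1" ]; then
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$1"
  fi
}

# Load user-defined environment variables when an env.sh file is present.
AMORO_CONF_DIR="${AMORO_CONF_DIR:-./conf}"   # assumed location of env.sh
if [ -f "$AMORO_CONF_DIR/env.sh" ]; then
  . "$AMORO_CONF_DIR/env.sh"
fi

# Directories on the classpath expose their files (e.g. hive-site.xml) as classpath resources.
append_classpath "${HADOOP_CONF_DIR:-}"
append_classpath "${HIVE_CONF_DIR:-}"
export CLASSPATH
```

With something like this in place, a user could set `HIVE_CONF_DIR=/etc/hive/conf` in env.sh, and the hive-site.xml in that directory would become visible to HiveConf without copying it into the Amoro installation.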

Brief change log

How was this patch tested?

Documentation

huyuanfeng2018 commented 2 months ago

If we can export hadoop's classpath, do we still need to package the hadoop client into amoro?

czy006 commented 2 months ago

> If we can export hadoop's classpath, do we still need to package the hadoop client into amoro?

If the classpath is configured, we really don't need to bundle it. But users would then have to deploy the Hadoop and Hive dependencies themselves, so this might become an optional alternative rather than the default.

baiyangtx commented 2 months ago

> If we can export hadoop's classpath, do we still need to package the hadoop client into amoro?

I think we can release binary packages without Hadoop/Hive dependencies, just like Spark does.
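
For reference, Spark's Hadoop-free builds pick up Hadoop jars by setting `SPARK_DIST_CLASSPATH=$(hadoop classpath)`. A comparable env.sh fragment for Amoro might look like the sketch below. The `hadoop_provided_classpath` helper is hypothetical, and the sketch assumes the `hadoop` CLI is installed on the host; nothing here is taken from an actual Amoro release.

```shell
# Hypothetical env.sh fragment for a Hadoop-free package; not Amoro's actual script.

# Print the classpath reported by the `hadoop` CLI, or nothing if the CLI is not installed.
hadoop_provided_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    hadoop classpath
  fi
}

# Append the cluster's own Hadoop jars and config directories when available.
extra="$(hadoop_provided_classpath)"
if [ -n "$extra" ]; then
  CLASSPATH="${CLASSPATH:+$CLASSPATH:}$extra"
  export CLASSPATH
fi
```

The trade-off is the one raised above: the package gets smaller and follows the cluster's Hadoop version, but the host must already have Hadoop deployed.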

@zhoujinsong CC