ochanism opened this issue 5 months ago
Hey @ochanism
Thanks for reaching out. Hive 4.x supports Iceberg out of the box. Previously an external Iceberg dependency was needed, but Hive 4+ ships with Iceberg directly, so the following should work:
create external table tbl_ice stored by iceberg tblproperties ('format-version'='2') as
select * from source;
@Fokko Sorry for my ambiguous question.
I'm using Trino as the query engine with a hive-metastore catalog.
For data ingestion (streaming), I developed a Java server with the Iceberg 1.5.2 API.
To eliminate the Hive lock, I upgraded hive-metastore from 3.1.3 to 4.0.0
and set iceberg.engine.hive.lock-enabled=false as a Hive catalog property (HiveCatalog class).
My Java server still depends on org.apache.hive:hive-metastore:3.1.3.
So I wonder whether this setup is OK. Could the hive-metastore version mismatch (client library 3.1.3, actual server 4.0.0) cause any errors?
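For reference, here is a stdlib-only sketch of how we understand the lock flag to be resolved, based on the Hadoop configuration docs linked in the original question: a table property `engine.hive.lock-enabled` takes precedence; otherwise the Hadoop configuration key `iceberg.engine.hive.lock-enabled` is used, defaulting to `true`. Plain `Map`s stand in for table properties and Hadoop's `Configuration`. The class and method names are hypothetical, and the key names are an assumption to verify against the docs for your Iceberg version.

```java
import java.util.Map;

public class HiveLockFlag {
    // Key names as we understand them from the Iceberg docs (assumption: verify for your version).
    static final String TABLE_PROPERTY = "engine.hive.lock-enabled";
    static final String HADOOP_CONF_KEY = "iceberg.engine.hive.lock-enabled";

    // Table property wins if present; otherwise fall back to the Hadoop-level
    // setting; if neither is set, locking stays enabled (default true).
    static boolean hiveLockEnabled(Map<String, String> tableProperties, Map<String, String> hadoopConf) {
        String tableValue = tableProperties.get(TABLE_PROPERTY);
        if (tableValue != null) {
            return Boolean.parseBoolean(tableValue);
        }
        return Boolean.parseBoolean(hadoopConf.getOrDefault(HADOOP_CONF_KEY, "true"));
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(HADOOP_CONF_KEY, "false");
        System.out.println(hiveLockEnabled(Map.of(), conf));                       // false
        System.out.println(hiveLockEnabled(Map.of(TABLE_PROPERTY, "true"), conf)); // true
        System.out.println(hiveLockEnabled(Map.of(), Map.of()));                   // true
    }
}
```

In this model, a per-table override can re-enable locking even when the global flag is off.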
@ochanism Thanks for clearing that up, that helps. Can you share the compilation error that you're seeing?
@Fokko This error occurred while initializing the Hive catalog:
var catalog = new HiveCatalog();
catalog.initialize(this.catalogName, this.properties);
Caused by: java.lang.NoSuchFieldError: Class org.apache.hadoop.hive.conf.HiveConf$ConfVars does not have member field 'org.apache.hadoop.hive.conf.HiveConf$ConfVars METASTOREURIS'
at org.apache.iceberg.hive.HiveCatalog.initialize(HiveCatalog.java:95)
# dependencies
org.apache.iceberg:iceberg-hive-metastore:1.5.2
org.apache.hive:hive-metastore:4.0.0
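The `NoSuchFieldError` indicates that `iceberg-hive-metastore:1.5.2` references `HiveConf$ConfVars.METASTOREURIS`, which the hive-metastore 4.0.0 jar on the runtime classpath no longer declares. A quick way to check which naming a classpath provides is a small reflection probe. This is a sketch: the two stand-in enums are hypothetical and exist only so it runs without Hive. The real class is loaded without initialization, and only if it is present.

```java
public class ConfVarsProbe {
    // Hypothetical stand-ins so the probe runs without Hive on the classpath.
    enum Hive3StyleConfVars { METASTOREURIS }
    enum Hive4StyleConfVars { METASTORE_URIS }

    // Reports which metastore-URI constant a ConfVars-like class declares.
    // Enum constants are public static fields, so getField() can find them.
    static String describe(Class<?> confVars) {
        if (hasField(confVars, "METASTORE_URIS")) {
            return "METASTORE_URIS (Hive 4 naming)";
        }
        if (hasField(confVars, "METASTOREURIS")) {
            return "METASTOREURIS (Hive 3 naming)";
        }
        return "neither constant found";
    }

    static boolean hasField(Class<?> type, String name) {
        try {
            type.getField(name);
            return true;
        } catch (NoSuchFieldException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Hive3StyleConfVars.class));
        System.out.println(describe(Hive4StyleConfVars.class));
        try {
            // Load without running static initializers; getField() does not need them.
            Class<?> real = Class.forName("org.apache.hadoop.hive.conf.HiveConf$ConfVars",
                    false, ConfVarsProbe.class.getClassLoader());
            System.out.println(describe(real));
        } catch (ClassNotFoundException e) {
            System.out.println("HiveConf$ConfVars is not on the classpath");
        }
    }
}
```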
I see, the constant was renamed in Hive 4: https://github.com/apache/hive/commit/b33b3d3454cc9c65a1879c68679f33f207f21c0e#diff-b7bbe8545a21ec7d7e9cfe40ef66444789e332996aaa9e7f1430dbe4822a2c9cR270
The Hive community suggests using the shaded dependency: https://github.com/apache/hive/pull/4919#issuecomment-2085197509
Thanks for the information. Do you mean that Hive 4.0 with Iceberg is maintained by the Hive community? I want to use the latest Iceberg version, but the shaded jar uses Iceberg 1.4.3. Is there any plan to update the Iceberg library to support a hive-metastore 4.0 catalog without the shaded jar?
@ochanism The problem is that Hive is both a query engine and a metastore (a catalog, in Iceberg terms). As of Hive 4, the query engine side (the support for reading and writing Iceberg) is maintained by the Hive community. The catalog is still in the Iceberg codebase and will probably migrate to Hive 4 at some point as well, but I think that will take some time.
There is also another discussion going on in parallel. Since Iceberg has its own catalog (the REST catalog), the REST catalog might become the preferred catalog, with the other ones deprecated at some point. You could easily put a Hive catalog behind a REST catalog interface, or even better, Hive itself could provide a native REST catalog interface (https://github.com/apache/hive/pull/5145).
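To make the "Hive catalog behind a REST catalog interface" idea concrete, here is a toy, stdlib-only sketch. A path modelled on the Iceberg REST catalog's load-table route, `/v1/namespaces/{namespace}/tables/{table}`, is parsed and delegated to a backing catalog. The sketch is simplified: there is no prefix segment, no multi-level namespaces, and no HTTP server. The `BackingCatalog` interface is a hypothetical stand-in for the Hive Metastore. A real load-table response carries the full table metadata; only `metadata-location` is shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class RestFacadeSketch {
    // Hypothetical backing catalog: resolves a table to its current metadata
    // file location, roughly what a Hive Metastore tracks per Iceberg table.
    interface BackingCatalog {
        Optional<String> metadataLocation(String namespace, String table);
    }

    // Minimal routing for GET /v1/namespaces/{namespace}/tables/{table}.
    static Optional<String> handleLoadTable(String path, BackingCatalog catalog) {
        String[] parts = path.split("/");
        // Expected shape: ["", "v1", "namespaces", namespace, "tables", table]
        if (parts.length != 6 || !parts[1].equals("v1")
                || !parts[2].equals("namespaces") || !parts[4].equals("tables")) {
            return Optional.empty();
        }
        return catalog.metadataLocation(parts[3], parts[5])
                .map(loc -> "{\"metadata-location\": \"" + loc + "\"}");
    }

    public static void main(String[] args) {
        Map<String, String> hiveTables = new HashMap<>();
        hiveTables.put("db.events", "s3://bucket/db/events/metadata/00001.metadata.json");
        BackingCatalog hiveBacked = (ns, t) -> Optional.ofNullable(hiveTables.get(ns + "." + t));

        System.out.println(handleLoadTable("/v1/namespaces/db/tables/events", hiveBacked).orElse("404"));
        System.out.println(handleLoadTable("/v1/namespaces/db/tables/missing", hiveBacked).orElse("404"));
    }
}
```

Because clients only see the REST contract, the backing store (Hive, JDBC, or something else) could be swapped without changing client code.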
@ochanism: If you are willing to take some risks, you might be able to create your own catalog implementation based on https://github.com/apache/hive/blob/master/iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveCatalog.java and the current Iceberg HiveCatalog implementation. It will not be supported by any of the communities, but the code changes could be simple, like changing
if (properties.containsKey(CatalogProperties.URI)) {
    this.conf.set(HiveConf.ConfVars.METASTOREURIS.varname, properties.get(CatalogProperties.URI));
}
to
if (properties.containsKey(CatalogProperties.URI)) {
    this.conf.set(HiveConf.ConfVars.METASTORE_URIS.varname, properties.get(CatalogProperties.URI));
}
notice the added _ (Hive 4 renamed METASTOREURIS to METASTORE_URIS)
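Another option, assuming the underlying key string is `hive.metastore.uris` in both Hive 3 and Hive 4 (only the enum constant name changed), is to set the key as a literal string. That way no `ConfVars` field is referenced at all. In this sketch, a `Map` stands in for Hadoop's `Configuration` so it runs without Hadoop. The `"uri"` key is the value of Iceberg's `CatalogProperties.URI`, and the class name is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionAgnosticUri {
    // Assumption: the same key in Hive 3.x and 4.x; only the ConfVars
    // constant was renamed (METASTOREURIS -> METASTORE_URIS).
    static final String METASTORE_URIS_KEY = "hive.metastore.uris";
    // Same string as Iceberg's CatalogProperties.URI.
    static final String URI = "uri";

    // Mirrors the HiveCatalog snippet above, but never touches HiveConf.ConfVars,
    // so there is no field reference that can fail with NoSuchFieldError at runtime.
    static void applyUri(Map<String, String> properties, Map<String, String> conf) {
        if (properties.containsKey(URI)) {
            conf.put(METASTORE_URIS_KEY, properties.get(URI));
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyUri(Map.of(URI, "thrift://metastore-host:9083"), conf);
        System.out.println(conf); // {hive.metastore.uris=thrift://metastore-host:9083}
    }
}
```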
@Fokko Thanks for your kind explanation. I understand the current situation now, and the plan to unify catalogs around the REST catalog looks amazing. I hope it will be available soon.
@pvary Thanks for your suggestion. I will try it and leave the result here after verifying it.
@pvary I tried it, but many classes were private or package-private, so I would have had to copy many class files to modify them. Instead, I decided to move to a REST catalog backed by the JDBC catalog, following @Fokko's point that REST will be the preferred catalog in the future. Thanks for helping me, guys!
HIVE-26882 and HIVE-28121 have landed in Hive 2.3.10. Although Hive 2.3 is EOL, this version is widely adopted, e.g. by Spark and Flink.
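Our understanding from the Iceberg Hadoop configuration docs is that disabling `iceberg.engine.hive.lock-enabled` is only safe when the metastore includes HIVE-26882. That change lets an alter-table call be conditioned on the current value of a table parameter. The toy model below shows the idea using stdlib only; it is not Hive's or Iceberg's code, and its class and method names are hypothetical. A commit swaps the `metadata_location` parameter only if the parameter still equals the location the writer started from. A concurrent writer therefore loses the race instead of silently overwriting.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConditionalCommitModel {
    // Toy metastore: table name -> current metadata_location parameter.
    private final ConcurrentMap<String, String> metadataLocation = new ConcurrentHashMap<>();

    void createTable(String table, String initialLocation) {
        metadataLocation.put(table, initialLocation);
    }

    // Lock-free commit: an atomic compare-and-set that succeeds only if the
    // parameter still holds the value the writer based its change on.
    boolean commit(String table, String expectedLocation, String newLocation) {
        return metadataLocation.replace(table, expectedLocation, newLocation);
    }

    String current(String table) {
        return metadataLocation.get(table);
    }

    public static void main(String[] args) {
        ConditionalCommitModel hms = new ConditionalCommitModel();
        hms.createTable("db.events", "v1.metadata.json");

        // Two writers both start from v1; only the first swap can succeed.
        boolean first = hms.commit("db.events", "v1.metadata.json", "v2-a.metadata.json");
        boolean second = hms.commit("db.events", "v1.metadata.json", "v2-b.metadata.json");

        System.out.println(first + " " + second + " " + hms.current("db.events"));
        // true false v2-a.metadata.json
    }
}
```

In this model, the losing writer would refresh to the new location and retry its commit, which is the optimistic-concurrency pattern that replaces the explicit lock.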
Query engine
No response
Question
https://iceberg.apache.org/docs/1.5.2/configuration/#hadoop-configuration
I've been implementing a data ingester with the Apache Iceberg 1.5.2 Java API. I ran into a problem with stale (garbage) Hive locks in a hive-metastore catalog, so I'm going to try disabling the Hive lock as described in the documentation linked above. To that end, I deployed a hive-metastore 4.0.0 server and tried to update the catalog configs and dependencies.
But iceberg-hive-metastore:1.5.2 couldn't be used with hive-metastore:4.0.0 (it only worked with 3.1.3). I confirmed that the data ingester works with the 3.1.3 dependencies against the hive-metastore 4.0.0 server. Is this setup OK, or could there be some issues?