Closed: TangYan-1 closed this issue 2 months ago.
My Hive version is 2.1.1-cdh6.3.0.
@TangYan-1: Based on the DESCRIBE TABLE output, I guess the problem is that you have to set the catalog configuration in Hive as well, as a session config, before executing the query:

```sql
set iceberg.mr.catalog=hive;
```
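For illustration, a minimal Beeline session showing where the setting goes (the table name is the one used later in this thread; the comment about the default reflects the 0.11.x behavior, where an unset `iceberg.mr.catalog` falls back to location-based table loading):

```sql
-- Session config: load Iceberg tables through the Hive catalog instead of
-- the default location-based (HadoopTables) loading.
set iceberg.mr.catalog=hive;

-- The table created by Spark with the Hive catalog should now resolve.
SELECT * FROM perftest.iceberg_table_hive_createdbyspark2;
```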
@pvary Thanks for your quick reply. The query works after adding `set iceberg.mr.catalog = hive;`. But I found a new issue: when I run `SELECT * from iceberg_table_hive_createdbyspark2 where id='1'`, the query fails with the error below. Could you please share any clues?

```
java.lang.NoSuchMethodError: org.apache.hadoop.hive.ql.io.sarg.ConvertAstToSearchArg.create(Lorg/apache/hadoop/conf/Configuration;Lorg/apache/hadoop/hive/ql/plan/ExprNodeGenericFuncDesc;)Lorg/apache/hadoop/hive/ql/io/sarg/SearchArgument;
    at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:59) ~[iceberg-hive-runtime-0.11.1.jar:?]
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:349) ~[hive-exec-2.1.1-cdh6.3.0.jar:2.1.1-cdh6.3.0]
```
`ConvertAstToSearchArg` does not have the method with the correct parameters available. https://issues.apache.org/jira/browse/HIVE-15269 introduced this signature, and it is only available since Hive 2.2.0 😢
The Apache Iceberg tests are run with Hive 2.3.8 and Hive 3.1.2, so you can try it out with those versions; we are also working on a CDH version.
Thanks, @pvary. It seems I can't find any info about `iceberg.mr.catalog` in the official documentation. Could you share some more details about this configuration, such as its default value?
The documentation has been updated for Iceberg 0.12, which is problematic since the released version is 0.11.1. I will ask around about this on the dev list.
@TangYan-1: @nastra suggested on the dev list that you can find the doc like this: https://github.com/apache/iceberg/blob/0.11.x/site/docs/hive.md
I hope this helps.
@pvary Thanks for your guidance. I'm now trying to use Iceberg with the Hadoop catalog. I created an Iceberg table using a Spark job and created an overlay in Hive. When I use Presto/Trino to query that table, it shows the error below. Do you have any ideas?
@TangYan-1: I guess that this is now a Presto question. I would guess that Presto is trying to handle the table as a `HadoopTable` instead of going through the `HadoopCatalog`. So again I suspect a catalog configuration issue. Sadly, I am not sure how Presto catalog handling works. What about trying to set the following properties in the Presto session?

```
'iceberg.mr.catalog'='hadoop',
'iceberg.mr.catalog.hadoop.warehouse.location'='/iceberg/warehouse'
```
@pvary Thanks, I'll check on the Presto side. Here I have two more issues with Iceberg.
NullPointerException in a Beeline aggregate query on Hadoop catalog Iceberg tables.

1). Create the table using a Spark job and load data:

```shell
spark3-shell --jars iceberg-spark3-runtime-0.11.1.jar \
  --conf spark.sql.catalog.hadoop_catalog=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hadoop_catalog.type=hadoop \
  --conf spark.sql.catalog.hadoop_catalog.warehouse=/iceberg/warehouse
```

```scala
spark.sql("CREATE TABLE hadoop_catalog.testdb.iceberg_table_in_hdfs (report string, reporttime string, reporttype string, reportid string, reportversion bigint, day string) USING iceberg PARTITIONED BY (day)")
```
2). Create the overlay in Hive Beeline:

```sql
CREATE EXTERNAL TABLE testdb.iceberg_table_in_hdfs (
  report string, reporttime string, reporttype string,
  reportid string, reportversion bigint, day string)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION '/iceberg/warehouse/testdb/iceberg_table_in_hdfs'
TBLPROPERTIES (
  'iceberg.mr.catalog'='hadoop',
  'iceberg.mr.catalog.hadoop.warehouse.location'='/iceberg/warehouse');
```

3). Query from Hive Beeline (`select count(*) from testdb.iceberg_table_in_hdfs`):

```
Error: java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
```
@TangYan-1: `reset` might work for you, or starting a new session. See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=34838882

FYI: I will have a busy day today, and after that I will be offline for 2 weeks, so I might not be able to come back to this issue for a while.
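As a sketch, the workaround mentioned above would look like this in Beeline (per the standard Hive behavior documented at the link above):

```sql
-- Clear all session-level overrides back to their defaults, including any
-- previously issued `set iceberg.mr.catalog=...;`
reset;
```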
The full stack is below. Iceberg version: 0.11.1. Hive version: 2.1.1-cdh6.3.0.
```
2021-08-06 05:27:31,014 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.IOException: java.lang.NullPointerException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:298)
    at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:685)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.
```
Thanks for your reply. @pvary
The issue is here: https://github.com/apache/iceberg/blob/apache-iceberg-0.11.1/data/src/main/java/org/apache/iceberg/data/DeleteFilter.java#L90
Maybe try selecting specific columns instead of `*`? Just a guess.
I have tried `count(col)`, but no luck, the same error.
What about `select col from testdb.iceberg_table_in_hdfs limit 10`, just for testing?
I have seen count working for our CDPD versions, and we also have unit tests which run against the 2.8 and 3.1 Hive versions. Maybe some other fixes in Hive are missing in the CDH version?
Yes, `select col from testdb.iceberg_table_in_hdfs limit 10` passes. It seems that using an aggregate function in the select fails.
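To summarize the behavior observed on this setup (Iceberg 0.11.1 on Hive 2.1.1-cdh6.3.0), as a sketch of the symptoms rather than a fix:

```sql
-- Works: plain column projection
select col from testdb.iceberg_table_in_hdfs limit 10;

-- Fails with java.io.IOException: java.lang.NullPointerException
select count(*) from testdb.iceberg_table_in_hdfs;
select count(col) from testdb.iceberg_table_in_hdfs;
```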
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.
I've enabled iceberg.engine.hive.enabled=true in hive-site.xml.
In Spark SQL, create Iceberg tables with the Hive catalog:

```shell
spark3-shell --jars iceberg-spark3-runtime-0.11.1.jar \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```
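For context, a hypothetical follow-up inside that spark3-shell session; the table name mirrors the one queried below, and the single `id` column is an assumption based on the `where id='1'` query elsewhere in this thread:

```scala
// Create and populate a table through the Hive catalog so Hive can see it
// (this assumes iceberg.engine.hive.enabled=true is set in hive-site.xml).
spark.sql("CREATE TABLE perftest.iceberg_table_hive_createdbyspark2 (id string) USING iceberg")
spark.sql("INSERT INTO perftest.iceberg_table_hive_createdbyspark2 VALUES ('1')")
```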
In Beeline: `select * from perftest.iceberg_table_hive_createdbyspark2;`
Errors:

```
Error while compiling statement: FAILED: SemanticException Table does not exist at location: hdfs://nameservice1/user/hive/warehouse/perftest.db/iceberg_table_hive_createdbyspark2
```