Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Can't write data when using Tez #7990

Open · BoomLee1024 opened this issue 1 year ago

BoomLee1024 commented 1 year ago

Query engine

tez

Question

I am using Iceberg 1.3.0, Tez 0.10.2, and Hive 3.1.3. When I create an Iceberg table using the MR engine, I can insert data and read it back, and I can also query that data with Tez. However, when I create an Iceberg table using Tez, the INSERT completes without any error and I can see the data files in HDFS, but a SELECT returns no rows.

When I run SHOW CREATE TABLE on that table, table properties such as the following are missing:

'current-snapshot-id'='8543337784714735606'
'current-snapshot-summary'='{"added-data-files":"1","added-records":"1","added-files-size":"404","changed-partition-count":"1","total-records":"3","total-files-size":"1212","total-data-files":"3","total-delete-files":"0","total-position-deletes":"0","total-equality-deletes":"0"}'
'current-snapshot-timestamp-ms'='1687932559706'

How can I resolve this issue?
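One way to check the symptom described above is to look for the snapshot properties in the table's TBLPROPERTIES. The sketch below is a minimal, hypothetical helper: the three key names and the example values are copied from this issue, while the function names, the abbreviated summary, and the `uncommitted` example are illustrative assumptions. It also assumes that the absence of these keys means Hive has not recorded a current snapshot for the table, which would be consistent with a SELECT returning no rows even though data files exist in HDFS.

```python
from __future__ import annotations

import json

# Snapshot-related table property keys, as quoted in this issue.
SNAPSHOT_KEYS = (
    "current-snapshot-id",
    "current-snapshot-summary",
    "current-snapshot-timestamp-ms",
)


def missing_snapshot_keys(props: dict[str, str]) -> list[str]:
    """Return the snapshot-related keys that are absent from the table properties."""
    return [key for key in SNAPSHOT_KEYS if key not in props]


def snapshot_record_counts(props: dict[str, str]) -> tuple[int, int] | None:
    """Return (added-records, total-records) from the snapshot summary, or None if absent."""
    raw = props.get("current-snapshot-summary")
    if raw is None:
        return None
    summary = json.loads(raw)  # the summary is stored as a JSON object of strings
    return int(summary["added-records"]), int(summary["total-records"])


# Values copied from this issue (summary abbreviated to a few fields).
committed = {
    "current-snapshot-id": "8543337784714735606",
    "current-snapshot-summary": '{"added-data-files":"1","added-records":"1",'
                                '"total-records":"3","total-data-files":"3"}',
    "current-snapshot-timestamp-ms": "1687932559706",
}
# Hypothetical properties of a table with no recorded snapshot.
uncommitted = {"table_type": "ICEBERG"}

print(missing_snapshot_keys(committed))    # []
print(snapshot_record_counts(committed))   # (1, 3)
print(missing_snapshot_keys(uncommitted))  # all three keys are reported
```

The properties dictionary would be filled in by hand from the SHOW CREATE TABLE output; the sketch does not connect to Hive.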

pvary commented 1 year ago

@BoomLee1024: This is documented here: https://iceberg.apache.org/docs/latest/hive/

DML operations work only with MapReduce execution engine.

The reason is that changes in Tez itself are needed to achieve the required results. If you want to use Hive on Tez with Iceberg, you might want to try Hive 4.0.0-alpha-2. It includes Iceberg support out of the box, along with many new features for the Iceberg integration.
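On Hive 3.1.3, the documented restriction can be worked around by running the INSERT with `hive.execution.engine=mr` and switching back to Tez for queries, since the reporter notes that Tez can read data written through MR. The sketch below only builds HiveQL text; `hive.execution.engine` is the standard Hive setting, but the `with_engine` helper, the table name `t`, and the assumption that INSERT is the only DML statement that needs MR are hypothetical.

```python
from __future__ import annotations

# Assumption: on Hive 3.x, INSERT is the DML statement used against Iceberg tables.
DML_KEYWORDS = ("INSERT",)


def with_engine(statements: list[str], read_engine: str = "tez") -> list[str]:
    """Insert SET hive.execution.engine=... lines so DML runs on MR.

    Hypothetical helper: it does not execute anything, it only returns
    the statements with engine switches added where the engine changes.
    """
    script = []
    current = None
    for stmt in statements:
        words = stmt.split()
        first_word = words[0].upper() if words else ""
        engine = "mr" if first_word in DML_KEYWORDS else read_engine
        if engine != current:  # emit a SET only when the engine actually changes
            script.append(f"SET hive.execution.engine={engine};")
            current = engine
        script.append(stmt)
    return script


script = with_engine([
    "INSERT INTO t VALUES (1, 'a');",
    "SELECT * FROM t;",
])
print("\n".join(script))
```

The resulting script could be run in Beeline; whether a Tez session picks up the engine switch cleanly depends on the Hive deployment, so this is a sketch rather than a verified procedure.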

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.