Closed lurnagao-dahua closed 2 months ago
@deniskuzZ: What do you think?
similar issues 2754
Hi, could you help me check this issue? @deniskuzZ @nastra I would be very grateful for any response!
@lurnagao-dahua, is this the thread pool in question: https://github.com/apache/hive/commit/45867be6cb5308566e4cf16c7b4cf8081085b58c? cc @zhangbutao
Is there an easy repro so we could try it in-house?
Thank you for your reply! The worker pool is defined in iceberg-core; the specific location is org.apache.iceberg.util.ThreadPools.
It is very easy to reproduce: just run queries in Hive as different users.
@lurnagao-dahua, what version of Hive are you using?
Thank you for your reply! Hive 3.1.3, and I have now added more information to the description.
@deniskuzZ I haven't looked into the UGI problem yet, but https://github.com/apache/hive/commit/45867be6cb5308566e4cf16c7b4cf8081085b58c has nothing to do with this problem. It just makes the thread pool size configurable; even without this change, iceberg-core still uses the thread pool when Hive calls the Iceberg method scan.planTasks().
https://github.com/apache/hive/commit/45867be6cb5308566e4cf16c7b4cf8081085b58c should fix the problem, since the pool is recreated for every scan. The same thing is proposed here.
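To illustrate why recreating the pool for every scan could help, here is a minimal, self-contained sketch. It is not Hive or Iceberg code: the class PerScanPoolDemo, the method planScanAs, and the InheritableThreadLocal named CURRENT_USER are all hypothetical, and the thread-local is only a stand-in for the caller identity, on the assumption that a worker thread picks up the caller's security context at the moment the thread is created.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PerScanPoolDemo {
    // Hypothetical stand-in for the caller identity (NOT Hadoop's UGI).
    // An InheritableThreadLocal is copied into a thread when the thread is created.
    static final InheritableThreadLocal<String> CURRENT_USER = new InheritableThreadLocal<>();

    // Simulates planning one scan on behalf of `caller` with a pool that
    // lives only for the duration of this scan.
    static String planScanAs(String caller) throws Exception {
        CURRENT_USER.set(caller);
        // The worker thread is created while `caller` is the current user,
        // so it inherits that identity.
        ExecutorService scanPool = Executors.newFixedThreadPool(1);
        try {
            Callable<String> task = CURRENT_USER::get;
            return scanPool.submit(task).get();
        } finally {
            scanPool.shutdown();
            CURRENT_USER.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(planScanAs("B")); // prints B
        System.out.println(planScanAs("A")); // prints A
    }
}
```

Because each call builds its own pool, no worker thread outlives the request that created it, so the second caller is not served by a thread that still carries the first caller's identity. The cost is thread creation on every scan.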
Apache Iceberg version
1.4.3
Query engine
Hive 3.1.3
Please describe the bug 🐞
1. User B executes the query select * from iceberg_tb_b. This is a simple fetch that does not launch a job.
2. User A executes the query select * from iceberg_tb_a. This is also a simple fetch that does not launch a job, and it fails with the following error log:

Caused by: org.apache.iceberg.exceptions.RuntimeIOException: Failed to open input stream for file: hdfs://hdfsHACluster/user/hive/warehouse/yc_iceberg.db/iceberg_tb_A/metadata/a72f8bf5-5d93-405b-953e-a8fed8bfa6b6-m0.avro
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:507)
...
Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=B, access=EXECUTE, inode="/user/hive/warehouse":hadoop:supergroup:drwx------

The iceberg-core module has a static global thread pool, WORKER_POOL, defined in org.apache.iceberg.util.ThreadPools.
DataTableScan.doPlanFiles:
Do the threads in WORKER_POOL keep their initial user information? Should we set the system property iceberg.scan.plan-in-worker-pool=false to disable the worker pool in HiveServer2?
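
The suspected behaviour can be reproduced in isolation with a small sketch. This is an analogy, not the actual Iceberg or Hadoop code path: the class SharedPoolIdentityDemo, the method userSeenByWorker, and the InheritableThreadLocal named CURRENT_USER are hypothetical, and the thread-local merely stands in for the user identity, on the assumption that a pooled thread captures the caller's security context once, when the thread is first created.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedPoolIdentityDemo {
    // Hypothetical stand-in for the caller identity (NOT Hadoop's UGI).
    // The value is copied into a new thread at thread-creation time only.
    static final InheritableThreadLocal<String> CURRENT_USER = new InheritableThreadLocal<>();

    // Static, process-wide pool, analogous in spirit to a global worker pool.
    // Daemon threads are used so the JVM can exit without an explicit shutdown.
    static final ExecutorService SHARED_POOL = Executors.newFixedThreadPool(1, r -> {
        Thread t = new Thread(r, "shared-worker");
        t.setDaemon(true);
        return t;
    });

    // Runs a task on the shared pool on behalf of `caller` and reports
    // which user the worker thread believes it is.
    static String userSeenByWorker(String caller) throws Exception {
        CURRENT_USER.set(caller);
        try {
            Callable<String> task = CURRENT_USER::get;
            return SHARED_POOL.submit(task).get();
        } finally {
            CURRENT_USER.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        // The first request creates the worker thread, which inherits "B".
        System.out.println(userSeenByWorker("B")); // prints B
        // The second request reuses that thread, which still carries "B".
        System.out.println(userSeenByWorker("A")); // prints B
    }
}
```

The second call mirrors the report above: user A issues the request, but the reused worker still acts as user B, which matches the user=B shown in the AccessControlException.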