AwasthiSomesh opened this issue 2 months ago
If this is a Hive4 issue, could you please try to talk to the Hive team, as the Hive4 integration is owned by them. Thanks, Peter
@pvary We are facing this issue with Iceberg on Hive and are not sure which team can help better with this.
Please let us know if you have any suggestions; we have also raised this with the Hive team.
@AwasthiSomesh: The issue name suggests that this problem happens with Hive4. That is why I suggested that the Apache Hive team could help you better. The Hive 4 integration is maintained by them. It is entirely possible that they could point out some issues with the Iceberg code, but there is some very Hive-specific code that runs before the Iceberg APIs are called.
@pvary Thanks for your update.
It looks like Hive issue discussions are not available through GitHub. Does anyone know how to reach the Hive 4 team?
You should create a Jira (https://issues.apache.org/jira/projects/HIVE/issues/HIVE-25351?filter=allopenissues), or use the dev/user list to communicate. See the github readme: https://github.com/apache/hive
@pvary Thanks a lot for your quick response.
I have two questions below; could you please help me with your comments?
Q1. As mentioned in the official Iceberg documentation, Hive 4 supports Iceberg without any extra dependency: https://iceberg.apache.org/docs/latest/hive/#feature-support
Is this supported only with HDFS storage, or can we use it with S3/ADLS Gen2 as well?
Q2. If Hive 4 does not support other external storage like S3/ADLS Gen2, what is the alternative? Is there any other option, such as Hive 3/2/1 with additional dependencies, to use Iceberg with the Hive catalog on S3/ADLS Gen2 storage?
Could you please help here?
Thanks, Somesh
@pvary If Iceberg supports ADLS Gen2, what configuration is required to use it seamlessly?
@pvary /all Can anyone help here?
@AwasthiSomesh: This should help: https://iceberg.apache.org/docs/nightly/kafka-connect/?h=adls#azure-adls-configuration-example
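For a rough idea, the ADLS-related properties in that example look something like the sketch below. The iceberg.catalog. prefix and the shared-key property names are taken from the Kafka Connect page linked above; the container, account, and key values are placeholders, so please verify the exact keys on that page for your own setup:

```properties
# Sketch of ADLS Gen2 settings for an Iceberg catalog (placeholder values)
iceberg.catalog.io-impl=org.apache.iceberg.azure.adlsv2.ADLSFileIO
iceberg.catalog.warehouse=abfss://<container>@<storage-account>.dfs.core.windows.net/warehouse
iceberg.catalog.adls.auth.shared-key.account.name=<storage-account>
iceberg.catalog.adls.auth.shared-key.account.key=<account-key>
```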
@pvary I am able to create an Iceberg table using the Hive 4 setup and able to insert data as well, but when we try to read the table it returns empty results.
When we look at the S3 location, all the data files are created there.
Could you please let me know if there is anything else we need to do?
@pvary We set hive.execution.engine=mr for the insert, because the insert was not working with the Tez engine; the session setting we use is shown below.
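For reference, this is the setting we switch before running the insert (a minimal sketch; the actual CREATE TABLE and INSERT statements are omitted):

```sql
-- switch the execution engine for this Beeline session (MR instead of Tez)
SET hive.execution.engine=mr;
-- then run the INSERT INTO ... statement as usual
```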
But with MR we are not able to read a single table with Hive 4.0.0-alpha-2.
With Tez we are facing the error below while inserting records.
Error:
.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$exists$34(S3AFileSystem.java:4636) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449) ~[hadoop-common-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2480) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2499) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4634) ~[hadoop-aws-3.3.6.jar:?]
at org.apache.tez.common.TezCommonUtils.getTezBaseStagingPath(TezCommonUtils.java:91) ~[tez-api-0.10.3.jar:0.10.3]
at org.apache.tez.common.TezCommonUtils.getTezSystemStagingPath(TezCommonUtils.java:149) ~[tez-api-0.10.3.jar:0.10.3]
at org.apache.tez.dag.app.DAGAppMaster.serviceInit(DAGAppMaster.java:492) ~[tez-dag-0.10.3.jar:0.10.3]
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164) ~[hadoop-common-3.3.6.jar:?]
at org.apache.tez.dag.app.DAGAppMaster$9.run(DAGAppMaster.java:2644) ~[tez-dag-0.10.3.jar:0.10.3]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_342]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_342]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) ~[hadoop-common-3.3.6.jar:?]
at org.apache.tez.dag.app.DAGAppMaster.initAndStartAppMaster(DAGAppMaster.java:2641) ~[tez-dag-0.10.3.jar:0.10.3]
at org.apache.tez.client.LocalClient$1.run(LocalClient.java:361) ~[tez-dag-0.10.3.jar:0.10.3]
... 1 more
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. java.io.IOException: org.apache.tez.dag.api.TezUncheckedException: java.nio.file.AccessDeniedException: s3a://com.anush/opt/hive/scratch_dir/hive/_tez_session_dir/0c1896fa-2b9d-4461-9ab4-ced0fd46ef48: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
INFO : Completed executing command(queryId=hive_20240919065346_a71fd349-e14c-4bfa-9fb7-0b1b396565e3); Time taken: 44.607 seconds
Error: Error while compiling statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. java.io.IOException: org.apache.tez.dag.api.TezUncheckedException: java.nio.file.AccessDeniedException: s3a://com.anush/opt/hive/scratch_dir/hive/_tez_session_dir/0c1896fa-2b9d-4461-9ab4-ced0fd46ef48: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)) (state=08S01,code=1)
0: jdbc:hive2://localhost:10000/>
Hi all, can anyone please help here?
@AwasthiSomesh Hello. Can you try applying a patch from me and try again?
@pvary Please tell me what to do and I will do it.
@AwasthiSomesh Check your email.
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. java.io.IOException: org.apache.tez.dag.api.TezUncheckedException: java.nio.file.AccessDeniedException: s3a://com.anush/opt/hive/scratch_dir/hive/_tez_session_dir/0c1896fa-2b9d-4461-9ab4-ced0fd46ef48:
You don't have write permission to that path.
Tez should handle it better.
If your bucket really is called "com.anush": no, S3A FS doesn't support that. Amazon says bucket names with dots are "exclusively for web sites", and with good reason.
Also, that AWS deprecation warning indicates you are using a later version of the AWS SDK than any Hadoop release. Your choice, but bear in mind it hasn't been qualified, and those SDKs can be fussy at times.
Apache Iceberg version
1.6.1 (latest release)
Query engine
Hive
Please describe the bug 🐞
I am trying to configure AWS S3 access with my Hadoop and Hive setup.
But while doing so, the following command:
hadoop fs -ls s3a://somesh.qa.bucket/
fails with this exception:
Fatal internal error java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
To resolve this I added hadoop-aws-3.3.6.jar and aws-java-sdk-bundle-1.12.770.jar to the Hadoop classpath,
i.e. under /usr/local/hadoop/share/hadoop/common/lib,
and added the S3-related configuration to the core-site.xml file under the /usr/local/hadoop/etc/hadoop directory.
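For reference, the S3A entries in our core-site.xml look roughly like the sketch below (standard Hadoop S3A property names; the access key, secret, and endpoint values are placeholders, not the real ones from this setup):

```xml
<!-- Sketch of S3A settings in core-site.xml; values are placeholders -->
<property>
  <name>fs.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.amazonaws.com</value>
</property>
```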
Now when we run hadoop fs -ls s3a://somesh.qa.bucket/
we observe the following exception:
2024-08-22 13:50:11,294 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2024-08-22 13:50:11,376 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-08-22 13:50:11,376 INFO impl.MetricsSystemImpl: s3a-file-system metrics system started
2024-08-22 13:50:11,434 WARN util.VersionInfoUtils: The AWS SDK for Java 1.x entered maintenance mode starting July 31, 2024 and will reach end of support on December 31, 2025. For more information, see https://aws.amazon.com/blogs/developer/the-aws-sdk-for-java-1-x-is-in-maintenance-mode-effective-july-31-2024/ You can print where on the file system the AWS SDK for Java 1.x core runtime is located by setting the AWS_JAVA_V1_PRINT_LOCATION environment variable or aws.java.v1.printLocation system property to 'true'. This message can be disabled by setting the AWS_JAVA_V1_DISABLE_DEPRECATION_ANNOUNCEMENT environment variable or aws.java.v1.disableDeprecationAnnouncement system property to 'true'. The AWS SDK for Java 1.x is being used here:
at java.lang.Thread.getStackTrace(Thread.java:1564)
at com.amazonaws.util.VersionInfoUtils.printDeprecationAnnouncement(VersionInfoUtils.java:81)
at com.amazonaws.util.VersionInfoUtils.<clinit>(VersionInfoUtils.java:59)
at com.amazonaws.internal.EC2ResourceFetcher.<init>(EC2ResourceFetcher.java:44)
at com.amazonaws.auth.InstanceMetadataServiceCredentialsFetcher.<init>(InstanceMetadataServiceCredentialsFetcher.java:38)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.<init>(InstanceProfileCredentialsProvider.java:111)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.<init>(InstanceProfileCredentialsProvider.java:91)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.<init>(InstanceProfileCredentialsProvider.java:75)
at com.amazonaws.auth.InstanceProfileCredentialsProvider.<init>(InstanceProfileCredentialsProvider.java:58)
at com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.initializeProvider(EC2ContainerCredentialsProviderWrapper.java:66)
at com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper.<init>(EC2ContainerCredentialsProviderWrapper.java:55)
at org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider.<init>(IAMInstanceCredentialsProvider.java:53)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProvider(S3AUtils.java:727)
at org.apache.hadoop.fs.s3a.S3AUtils.buildAWSProviderList(S3AUtils.java:659)
at org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:585)
at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:959)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:586)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3611)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3712)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3663)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:557)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:347)
at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:264)
at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:247)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:105)
at org.apache.hadoop.fs.shell.Command.run(Command.java:191)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:327)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:390)
ls: s3a://infa.qa.bucket/: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
2024-08-22 13:50:14,248 INFO impl.MetricsSystemImpl: Stopping s3a-file-system metrics system...
2024-08-22 13:50:14,248 INFO impl.MetricsSystemImpl: s3a-file-system metrics system stopped.
2024-08-22 13:50:14,248 INFO impl.MetricsSystemImpl: s3a-file-system metrics syst
Could you please help us resolve this issue as soon as possible?
Willingness to contribute