airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://airbyte.com

[destination-iceberg] (with hive metastore) connector configuration: bucket not found issue #38756

Open AITYOUB-Abdelmoughit opened 1 month ago

AITYOUB-Abdelmoughit commented 1 month ago

Connector Name

destination-iceberg

Connector Version

0.1.6

What step the error happened?

Configuring a new connector

Relevant information

I'm trying to create an Iceberg destination connector that uses the Hive metastore as its catalog. All services run in Docker: MinIO is the object store at localhost on port 9000, and the Hive metastore listens on port 9083. The connection check fails with status code 400 and reports a problem with the bucket, even though the bucket exists.

Relevant log output

Here is the full log output:
`2024-05-29 12:41:56 [platform] > 2024-05-29 12:41:56 [INFO] o.a.h.h.m.HiveMetaStoreClient(open):405 - Trying to connect to metastore with URI thrift://localhost:9083
2024-05-29 12:41:56 [platform] > 2024-05-29 12:41:56 [INFO] o.a.h.h.m.HiveMetaStoreClient(open):479 - Opened a connection to metastore, current connections: 1
2024-05-29 12:41:56 [platform] > 2024-05-29 12:41:56 [INFO] o.a.h.h.m.HiveMetaStoreClient(open):532 - Connected to metastore.
2024-05-29 12:41:59 [platform] > 2024-05-29 12:41:59 ERROR i.a.i.d.i.IcebergDestination(check):54 - Exception attempting to access the Iceberg catalog: 
2024-05-29 12:41:59 [platform] > software.amazon.awssdk.services.s3.model.S3Exception: The specified bucket is not valid. (Service: S3, Status Code: 400, Request ID: 17D3F6245D7D05A4, Extended Request ID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8)
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[aws-xml-protocol-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) ~[aws-xml-protocol-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) ~[aws-xml-protocol-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43) ~[aws-xml-protocol-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95) ~[aws-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:270) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:171) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:179) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[sdk-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[aws-core-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at software.amazon.awssdk.services.s3.DefaultS3Client.putObject(DefaultS3Client.java:9321) ~[s3-2.20.18.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.aws.s3.S3OutputStream.completeUploads(S3OutputStream.java:435) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.aws.s3.S3OutputStream.close(S3OutputStream.java:269) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at java.base/sun.nio.cs.StreamEncoder.implClose(StreamEncoder.java:439) ~[?:?]
2024-05-29 12:41:59 [platform] >    at java.base/sun.nio.cs.StreamEncoder.lockedClose(StreamEncoder.java:237) ~[?:?]
2024-05-29 12:41:59 [platform] >    at java.base/sun.nio.cs.StreamEncoder.close(StreamEncoder.java:222) ~[?:?]
2024-05-29 12:41:59 [platform] >    at java.base/java.io.OutputStreamWriter.close(OutputStreamWriter.java:266) ~[?:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.TableMetadataParser.$closeResource(TableMetadataParser.java:131) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.TableMetadataParser.internalWrite(TableMetadataParser.java:131) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.TableMetadataParser.overwrite(TableMetadataParser.java:114) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:170) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadataIfRequired(BaseMetastoreTableOperations.java:160) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:185) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:135) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:199) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:75) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at org.apache.iceberg.catalog.Catalog.createTable(Catalog.java:118) ~[iceberg-spark-runtime-3.3_2.13-1.3.0.jar:?]
2024-05-29 12:41:59 [platform] >    at io.airbyte.integrations.destination.iceberg.config.catalog.IcebergCatalogConfig.check(IcebergCatalogConfig.java:50) ~[io.airbyte.airbyte-integrations.connectors-destination-iceberg.jar:?]
2024-05-29 12:41:59 [platform] >    at io.airbyte.integrations.destination.iceberg.IcebergDestination.check(IcebergDestination.java:49) [io.airbyte.airbyte-integrations.connectors-destination-iceberg.jar:?]
2024-05-29 12:41:59 [platform] >    at io.airbyte.cdk.integrations.base.IntegrationRunner.runInternal(IntegrationRunner.java:150) [airbyte-cdk-core-0.2.0.jar:?]
2024-05-29 12:41:59 [platform] >    at io.airbyte.cdk.integrations.base.IntegrationRunner.run(IntegrationRunner.java:125) [airbyte-cdk-core-0.2.0.jar:?]
2024-05-29 12:41:59 [platform] >    at io.airbyte.integrations.destination.iceberg.IcebergDestination.main(IcebergDestination.java:42) [io.airbyte.airbyte-integrations.connectors-destination-iceberg.jar:?]
2024-05-29 12:41:59 [platform] > 2024-05-29 12:41:59 [INFO] i.a.c.i.b.IntegrationRunner(runInternal):228 - Completed integration: io.airbyte.integrations.destination.iceberg.IcebergDestination
2024-05-29 12:42:00 [platform] > Check connection job received output: io.airbyte.config.StandardCheckConnectionOutput@6246206e[status=failed,message=Could not connect to the Iceberg catalog with the provided configuration. 
The specified bucket is not valid. (Service: S3, Status Code: 400, Request ID: 17D3F6245D7D05A4, Extended Request ID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8), root cause: S3Exception(The specified bucket is not valid. (Service: S3, Status Code: 400, Request ID: 17D3F6245D7D05A4, Extended Request ID: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8)),additionalProperties={}]`
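
Note on the error above: the message is "The specified bucket is not valid" with status 400, which is not the same as a missing bucket. To my knowledge, MinIO returns this text when the bucket name it parsed from the request fails its naming rules, so it may be worth ruling out a malformed bucket value in the connector configuration (for example a URI scheme, a path, uppercase letters, or underscores). The sketch below is only a local sanity check under that assumption. The function name `bucket_name_problems` is hypothetical, and the rules are a simplified version of the AWS S3 bucket naming rules, not the exact checks MinIO or the connector performs.

```python
import ipaddress
import re
from typing import List

# Simplified S3 bucket naming rules (assumption: the object store enforces
# roughly these; the exact server-side validation may differ in details).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def bucket_name_problems(name: str) -> List[str]:
    """Return human-readable reasons why `name` looks like an invalid bucket name."""
    problems = []
    if "/" in name or ":" in name:
        problems.append("contains a scheme or path; pass only the bare bucket name")
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _BUCKET_RE.match(name):
        problems.append(
            "only lowercase letters, digits, '.' and '-' are allowed, and the "
            "name must start and end with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    try:
        ipaddress.IPv4Address(name)
        problems.append("must not be formatted as an IP address")
    except ValueError:
        pass  # not an IP address, which is what we want
    return problems


if __name__ == "__main__":
    # Hypothetical candidate values; substitute the value from your own config.
    for candidate in ["warehouse", "s3a://warehouse", "My_Bucket"]:
        print(candidate, "->", bucket_name_problems(candidate) or "looks valid")
```

If the name passes this check, the cause is more likely elsewhere, for instance in how the S3 client addresses the MinIO endpoint, which this sketch does not cover.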

marcosmarxm commented 1 month ago

Hello @AITYOUB-Abdelmoughit! Destination Iceberg is a vital part of our community, but it's not currently on our roadmap for updates. This means it might take some time before it gets prioritized by the Airbyte Team. However, we encourage community involvement to improve it, and your contributions are welcome! If you're interested, please reach out to me on Slack so we can discuss how you can help. Thanks for your support!