
Following tabular.io tutorial. Getting error with metadata read. #24845

Closed. alberttwong closed this issue 1 year ago.

alberttwong commented 1 year ago
  1. Create a warehouse in tabular.io. Also use the Tabular wizard to automatically deploy a new S3 bucket using CloudFormation.
  2. Follow https://tabular.io/blog/how-to-create-a-table-in-tabular/ to create a table and load in data.
  3. Once the data is loaded, add the Tabular warehouse as an external catalog.

I then get an error on the metadata read.

used "aws.s3.use_instance_profile" = "true" and "aws.s3.use_instance_profile" = "false",

StarRocks > show databases from tabular;
+----------+
| Database |
+----------+
| default  |
| examples |
| system   |
+----------+
3 rows in set (1.85 sec)

StarRocks > set catalog tabular;
Query OK, 0 rows affected (0.00 sec)

StarRocks > select count(*) from `default`.movies;
ERROR 1064 (HY000): Failed to open input stream for file: s3://celerdata-4864/a3076908-c737-4d42-b104-581b6b170254/e3730253-692e-474d-a473-4dfb23f537e1/metadata/snap-2346046352581096739-1-3eac4786-95da-483a-8f72-2565b17f78ee.avro
StarRocks > drop catalog tabular;
Query OK, 0 rows affected (0.01 sec)
Smith-Cruise commented 1 year ago

Can you post your fe.log here?

alberttwong commented 1 year ago

duplicate of https://github.com/StarRocks/starrocks/issues/24569

alberttwong commented 1 year ago
drop catalog tabular;
create external catalog 'tabular'
PROPERTIES
(
    "header.x-tabular-s3-access" = "vended_credentials",   
    "uri" = "https://api.tabular.io/ws",
    "warehouse" = "test301",
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO",
    "credential" = "t-Zzzzzzzzzzz"
);
show databases from tabular;
set catalog tabular;
select count(*) from `default`.movies;

Then I hit an AWS region error, because no region is set in the default AWS profile:

ERROR 1064 (HY000): Unable to load region from any of the providers in the chain software.amazon.awssdk.regions.providers.DefaultAwsRegionProviderChain@6b27437: [software.amazon.awssdk.regions.providers.SystemSettingsRegionProvider@7da35c1a: region must not be blank or empty., software.amazon.awssdk.regions.providers.AwsProfileRegionProvider@50d2e48b: No region provided in profile: default, software.amazon.awssdk.regions.providers.InstanceProfileRegionProvider@b3fb45b: Unable to contact EC2 metadata service.]

So I modified the container startup to pass the region:

docker run -p 9030:9030 -p 8030:8030 -p 8040:8040 -itd --name=starrocks -v starrocks-storage-be:/data/deploy/starrocks/be/storage -v starrocks-storage-fe:/data/deploy/starrocks/fe/meta -e "AWS_REGION=us-west-2"  registry.starrocks.io/starrocks/allin1-ubuntu

After restarting with AWS_REGION set, the same query now fails with an UnknownHostException; fe.log shows:
2023-06-08 22:36:58,673 WARN (starrocks-mysql-nio-pool-1|119) [StmtExecutor.execute():558] execute Exception, sql select count(*) from `default`.movies
software.amazon.awssdk.core.exception.SdkClientException: Received an UnknownHostException when attempting to interact with a service. See cause for the exact endpoint that is failing to resolve. If this is happening on an endpoint that previously worked, there may be a network connectivity issue or your DNS cache could be storing endpoints for too long.
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.awscore.interceptor.HelpfulUnknownHostExceptionInterceptor.modifyException(HelpfulUnknownHostExceptionInterceptor.java:59) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.interceptor.ExecutionInterceptorChain.modifyException(ExecutionInterceptorChain.java:199) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.ExceptionReportingUtils.runModifyException(ExceptionReportingUtils.java:54) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.ExceptionReportingUtils.reportFailureToInterceptors(ExceptionReportingUtils.java:38) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:39) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:167) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:175) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5219) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at org.apache.iceberg.aws.s3.BaseS3File.getObjectMetadata(BaseS3File.java:85) ~[iceberg-aws-1.1.0.jar:?]
        at org.apache.iceberg.aws.s3.S3InputFile.getLength(S3InputFile.java:75) ~[iceberg-aws-1.1.0.jar:?]
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:36) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:337) ~[iceberg-bundled-guava-1.1.0.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:241) ~[iceberg-bundled-guava-1.1.0.jar:?]
        at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:148) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.BaseSnapshot.dataManifests(BaseSnapshot.java:174) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.DataTableScan.doPlanFiles(DataTableScan.java:82) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.BaseTableScan.planFiles(BaseTableScan.java:152) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.DataTableScan.planFiles(DataTableScan.java:27) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.BaseTableScan.planTasks(BaseTableScan.java:178) ~[iceberg-core-1.1.0.jar:?]
        at org.apache.iceberg.DataTableScan.planTasks(DataTableScan.java:27) ~[iceberg-core-1.1.0.jar:?]
        at com.starrocks.connector.iceberg.IcebergMetadata.getRemoteFileInfos(IcebergMetadata.java:178) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.IcebergMetadata.getRemoteFileInfos(IcebergMetadata.java:157) ~[starrocks-fe.jar:?]
        at com.starrocks.server.MetadataMgr.getRemoteFileInfos(MetadataMgr.java:183) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.cost.IcebergStatisticProvider.generateIcebergFileStats(IcebergStatisticProvider.java:80) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.cost.IcebergStatisticProvider.getTableStatistics(IcebergStatisticProvider.java:60) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.IcebergMetadata.getTableStatistics(IcebergMetadata.java:198) ~[starrocks-fe.jar:?]
        at com.starrocks.server.MetadataMgr.lambda$getTableStatistics$3(MetadataMgr.java:170) ~[starrocks-fe.jar:?]
        at java.util.Optional.map(Optional.java:265) ~[?:?]
        at com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:169) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeIcebergScanNode(StatisticsCalculator.java:284) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalIcebergScan(StatisticsCalculator.java:272) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalIcebergScan(StatisticsCalculator.java:153) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.operator.logical.LogicalIcebergScanOperator.accept(LogicalIcebergScanOperator.java:72) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.estimatorStats(StatisticsCalculator.java:169) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.task.DeriveStatsTask.execute(DeriveStatsTask.java:57) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.task.SeriallyTaskScheduler.executeTasks(SeriallyTaskScheduler.java:68) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.memoOptimize(Optimizer.java:461) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.optimizeByCost(Optimizer.java:168) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.optimize(Optimizer.java:110) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.createQueryPlan(StatementPlanner.java:140) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.planQuery(StatementPlanner.java:115) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:90) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:55) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:396) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:349) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:463) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:729) ~[starrocks-fe.jar:?]
        at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: software.amazon.awssdk.core.exception.SdkClientException: Unable to execute HTTP request: celerdata-4864.us-west-2.amazonaws.com
        at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:98) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.utils.RetryableStageHelper.setLastException(RetryableStageHelper.java:204) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:83) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[cloudfs-hadoop-with-dependencies-1.1.21.jar:?]
        ... 56 more
Caused by: java.net.UnknownHostException: celerdata-4864.us-west-2.amazonaws.com
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:229) ~[?:?]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:?]
        at java.net.Socket.connect(Socket.java:609) ~[?:?]
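Side note: the unresolvable host celerdata-4864.us-west-2.amazonaws.com is a virtual-hosted-style S3 name missing its s3. service segment, which is presumably why enabling path-style access in the next comment changes the error. Also, instead of baking AWS_REGION into the container, the region can be set on the catalog itself via the aws.s3.region property (used later in this thread); a minimal sketch with placeholder warehouse/credential values:

-- Sketch only: same catalog, with the region pinned as a catalog property
-- instead of an environment variable. "test301" and "t-ZZZZ" are placeholders.
create external catalog 'tabular'
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "uri" = "https://api.tabular.io/ws",
    "warehouse" = "test301",
    "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO",
    "credential" = "t-ZZZZ",
    "aws.s3.region" = "us-west-2"
);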
alberttwong commented 1 year ago
drop catalog tabular;
create external catalog 'tabular'
PROPERTIES
(
    "header.x-tabular-s3-access" = "vended_credentials",   
    "s3.path-style-access"="true",
    "uri" = "https://api.tabular.io/ws",
    "warehouse" = "test301",
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO",
    "credential" = "t-ZZZZ"
);
show databases from tabular;
set catalog tabular;
select count(*) from `default`.movies;
ERROR 1064 (HY000): code=403(SdkErrorType:15), message=Access Denied:file = s3://celerdata-4864/a3076908-c737-4d42-b104-581b6b170254/e3730253-692e-474d-a473-4dfb23f537e1/data/73a03975/00000-0-7e72edcc-88df-4414-888f-aa20ddf31e07-00001.parquet
alberttwong commented 1 year ago

It works when I do all of this, adding static AWS keys alongside the vended-credentials header:

drop catalog tabular;
create external catalog 'tabular'
PROPERTIES
(
    "aws.s3.use_instance_profile" = "false",
    "aws.s3.access_key" = "XXXXXX",
    "aws.s3.secret_key" = "YYYYY",    
    "header.x-tabular-s3-access" = "vended_credentials",   
    "s3.path-style-access"="true",
    "uri" = "https://api.tabular.io/ws",
    "warehouse" = "test301",
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO",
    "credential" = "t-ZZZZ"
);
show databases from tabular;
set catalog tabular;
select count(*) from `default`.movies;
alberttwong commented 1 year ago

After talking with the Iceberg folks: they said we need S3 request signing. https://github.com/apache/iceberg/tree/master/aws/src/main/java/org/apache/iceberg/aws/s3/signer
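If remote signing is the route, Iceberg's S3FileIO exposes an s3.remote-signing-enabled property for it; whether StarRocks passes that property through to the catalog is an assumption here, so treat this as a sketch only:

-- Hypothetical sketch: enabling Iceberg's remote S3 signing on the catalog.
-- "s3.remote-signing-enabled" is an Iceberg S3FileIO property; StarRocks
-- pass-through of it is assumed, not confirmed in this thread.
create external catalog 'tabular'
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "uri" = "https://api.tabular.io/ws",
    "warehouse" = "test301",
    "io-impl" = "org.apache.iceberg.aws.s3.S3FileIO",
    "credential" = "t-ZZZZ",
    "s3.remote-signing-enabled" = "true"
);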

alberttwong commented 1 year ago

I also hit an issue when using the Iceberg quickstart.

StarRocks > select current_version();
+-------------------+
| current_version() |
+-------------------+
| 3.0.2 c833698b93  |
+-------------------+
1 row in set (0.06 sec)

StarRocks > set catalog iceberg;
ERROR 1064 (HY000): Unknown catalog 'iceberg'
StarRocks > create external catalog 'iceberg'
    -> PROPERTIES
    -> (
    ->     "aws.s3.use_instance_profile" = "true",
    ->     "aws.s3.region" = "us-east-1",
    ->     "uri" = "http://iceberg-rest:8181",
    ->     "warehouse" = "starrocks",
    ->     "type" = "iceberg",
    ->     "iceberg.catalog.type" = "rest",
    ->     "security" = "oauth2",
    ->     "session" = "admin",
    ->     "credential" = "password"
    -> );
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    10
Current database: *** NONE ***

Query OK, 0 rows affected (0.10 sec)

StarRocks > set catalog iceberg;
Query OK, 0 rows affected (0.01 sec)

StarRocks > use nyc
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
StarRocks > use nyc;
Database changed
StarRocks > select * from taxis;
ERROR 1064 (HY000): Failed to open input stream for file: s3://warehouse/nyc/taxis/metadata/snap-2756738665959055809-1-6c09f2b7-78f2-4f4d-9ff0-a887bae565fc.avro
Smith-Cruise commented 1 year ago

Can you try testing with Tabular's official API URL, not http://iceberg-rest:8181?

alberttwong commented 1 year ago

It works fine with Tabular. The issue is only with the community Apache Iceberg REST catalog.
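For what it's worth: in the Iceberg quickstart the warehouse bucket is served by MinIO rather than real S3, so the catalog presumably needs an explicit endpoint and path-style access before s3://warehouse/... can resolve. A sketch assuming the quickstart compose file's service names and credentials (minio:9000, admin/password), which are not confirmed anywhere in this thread:

-- Sketch under assumptions: the endpoint, access key, and secret key come from
-- the Iceberg quickstart docker-compose, not from this thread.
create external catalog 'iceberg'
PROPERTIES
(
    "type" = "iceberg",
    "iceberg.catalog.type" = "rest",
    "uri" = "http://iceberg-rest:8181",
    "warehouse" = "starrocks",
    "aws.s3.region" = "us-east-1",
    "aws.s3.endpoint" = "http://minio:9000",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.access_key" = "admin",
    "aws.s3.secret_key" = "password"
);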

alberttwong commented 1 year ago

I'm going to close this and continue the discussion in the new GitHub issue: https://github.com/StarRocks/starrocks/issues/25801