StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

Issue with Apache Iceberg 1.4.2 and StarRocks 3.1.5 #36287

Closed alberttwong closed 11 months ago

alberttwong commented 11 months ago

I am following https://iceberg.apache.org/spark-quickstart/ and https://github.com/StarRocks/starrocks/discussions/23427

docker-compose.yml

atwong@Albert-CelerData apacheiceberg % cat docker-compose.yml
version: "3"

services:
  starrocks:
    image: registry.starrocks.io/starrocks/allin1-ubuntu:latest
    hostname: starrocks-fe
    container_name: allin1-ubuntu-iceberg
    ports:
      - 8030:8030
      - 8040:8040
      - 9030:9030
    networks:
      iceberg_net:
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1

  spark-iceberg:
    image: tabulario/spark-iceberg
    container_name: spark-iceberg
    build: spark/
    networks:
      iceberg_net:
    depends_on:
      - rest
      - minio
    volumes:
      - ./warehouse:/home/iceberg/warehouse
      - ./notebooks:/home/iceberg/notebooks/notebooks
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    ports:
      - 8888:8888
      - 8080:8080
      - 10000:10000
      - 10001:10001
  rest:
    image: tabulario/iceberg-rest
    container_name: iceberg-rest
    networks:
      iceberg_net:
        aliases:
          - iceberg-rest.minio
    ports:
      - 8181:8181
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
      - CATALOG_WAREHOUSE=s3://warehouse/
      - CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
      - CATALOG_S3_ENDPOINT=http://minio:9000
  minio:
    image: minio/minio
    container_name: minio
    environment:
      - MINIO_ROOT_USER=admin
      - MINIO_ROOT_PASSWORD=password
      - MINIO_DOMAIN=minio
    networks:
      iceberg_net:
        aliases:
          - warehouse.minio
    ports:
      - 9001:9001
      - 9000:9000
    command: ["server", "/data", "--console-address", ":9001"]
  mc:
    depends_on:
      - minio
    image: minio/mc
    container_name: mc
    networks:
      iceberg_net:
    environment:
      - AWS_ACCESS_KEY_ID=admin
      - AWS_SECRET_ACCESS_KEY=password
      - AWS_REGION=us-east-1
    entrypoint: >
      /bin/sh -c "
      until (/usr/bin/mc config host add minio http://minio:9000 admin password) do echo '...waiting...' && sleep 1; done;
      /usr/bin/mc rm -r --force minio/warehouse;
      /usr/bin/mc mb minio/warehouse;
      /usr/bin/mc policy set public minio/warehouse;
      tail -f /dev/null
      "
networks:
  iceberg_net:

Then run docker-compose up.

Create the table and insert data, then query it from StarRocks.

atwong@Albert-CelerData ~ % docker exec -it spark-iceberg spark-sql
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/12/01 23:09:04 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/12/01 23:09:04 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Spark Web UI available at http://bd621c5ca91e:4041
Spark master: local[*], Application Id: local-1701472144880
spark-sql ()> CREATE TABLE demo.nyc.taxis
            > (
            >   vendor_id bigint,
            >   trip_id bigint,
            >   trip_distance float,
            >   fare_amount double,
            >   store_and_fwd_flag string
            > )
            > PARTITIONED BY (vendor_id);
Time taken: 0.903 seconds
spark-sql ()> INSERT INTO demo.nyc.taxis
            > VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');
Time taken: 1.762 seconds
spark-sql ()>
atwong@Albert-CelerData ~ % mysql -P9030 -h127.0.0.1 -uroot --prompt="StarRocks > "
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.1.0 3.1.5-5d8438a

Copyright (c) 2000, 2023, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

StarRocks > create external catalog 'iceberg'
    -> PROPERTIES
    -> (
    ->   "type"="iceberg",
    ->   "iceberg.catalog.type"="rest",
    ->   "iceberg.catalog.uri"="http://iceberg-rest:8181",
    ->   "iceberg.catalog.warehouse"="starrocks",
    ->   "aws.s3.access_key"="admin",
    ->   "aws.s3.secret_key"="password",
    ->   "aws.s3.endpoint"="http://minio:9000",
    ->   "aws.s3.enable_path_style_access"="true"
    -> );
Query OK, 0 rows affected (0.09 sec)

StarRocks > show databases from iceberg;
+----------+
| Database |
+----------+
| nyc      |
+----------+
1 row in set (0.36 sec)

StarRocks > set catalog iceberg;
Query OK, 0 rows affected (0.00 sec)

StarRocks > use nyc;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
StarRocks > select * from taxis;
ERROR 1064 (HY000): null (Service: S3, Status Code: 400, Request ID: HEJY92GD290QQX6B, Extended Request ID: eP7dLfsGHTTYHmlX1iDINtltLgdpnuHg2v9YLjpigYYFPzi4KW2RWOa0wNUuf12npmiU6qDkB1E=)
StarRocks > select current_version();
+-------------------+
| current_version() |
+-------------------+
| 3.1.5-5d8438a     |
+-------------------+
1 row in set (0.07 sec)

FE error log

root@starrocks-fe:/data/deploy/starrocks/fe/log# pwd
/data/deploy/starrocks/fe/log
root@starrocks-fe:/data/deploy/starrocks/fe/log# tail -300 fe.log
2023-12-01 23:11:13,632 INFO (tablet scheduler|35) [ClusterLoadStatistic.classifyBackendByLoad():163] classify backend by load. medium: HDD, avg load score: 0.5, low/mid/high: 0/1/0
2023-12-01 23:11:14,854 INFO (starrocks-mysql-nio-pool-1|152) [MetadataMgr$QueryMetadatas.getConnectorMetadata():389] Succeed to register query level connector metadata [catalog:iceberg, queryId: ece1b006-909e-11ee-9f26-0242ac140003]
2023-12-01 23:11:14,883 INFO (starrocks-mysql-nio-pool-1|152) [SnapshotScan.planFiles():116] Scanning table iceberg.nyc.taxis snapshot 921517448090585827 created at 2023-12-01T23:09:17.600+00:00 with filter true
2023-12-01 23:11:15,610 WARN (starrocks-mysql-nio-pool-1|152) [StmtExecutor.execute():665] execute Exception, sql select * from taxis
software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 400, Request ID: B245VCPXRN9CBS8Y, Extended Request ID: EZrBnztMahJarPrzfpSyk3cI9HIBh+Ma/EqwxKCSr+Y8spth3HNtPz4pHcMOPMi8gSLxf6NmoPY=)
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:245) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:167) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:175) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56) ~[bundle-2.17.257.jar:?]
        at software.amazon.awssdk.services.s3.DefaultS3Client.headObject(DefaultS3Client.java:5438) ~[bundle-2.17.257.jar:?]
        at org.apache.iceberg.aws.s3.BaseS3File.getObjectMetadata(BaseS3File.java:85) ~[iceberg-aws-1.2.1.jar:?]
        at org.apache.iceberg.aws.s3.S3InputFile.getLength(S3InputFile.java:75) ~[iceberg-aws-1.2.1.jar:?]
        at com.starrocks.connector.iceberg.io.IcebergCachingFileIO$CachingInputFile.getLength(IcebergCachingFileIO.java:530) ~[starrocks-fe.jar:?]
        at com.starrocks.connector.iceberg.io.IcebergCachingFileIO$CachingInputFile.newStream(IcebergCachingFileIO.java:537) ~[starrocks-fe.jar:?]
        at org.apache.iceberg.avro.AvroIterable.newFileReader(AvroIterable.java:100) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:76) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.avro.AvroIterable.iterator(AvroIterable.java:36) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.Iterables.addAll(Iterables.java:337) ~[iceberg-bundled-guava-1.2.1.jar:?]
        at org.apache.iceberg.relocated.com.google.common.collect.Lists.newLinkedList(Lists.java:241) ~[iceberg-bundled-guava-1.2.1.jar:?]
        at org.apache.iceberg.ManifestLists.read(ManifestLists.java:45) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.BaseSnapshot.cacheManifests(BaseSnapshot.java:148) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.BaseSnapshot.dataManifests(BaseSnapshot.java:174) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.DataTableScan.doPlanFiles(DataTableScan.java:92) ~[iceberg-core-1.2.1.jar:?]
        at org.apache.iceberg.SnapshotScan.planFiles(SnapshotScan.java:131) ~[iceberg-core-1.2.1.jar:?]
        at com.starrocks.connector.iceberg.IcebergMetadata.getTableStatistics(IcebergMetadata.java:357) ~[starrocks-fe.jar:?]
        at com.starrocks.server.MetadataMgr.lambda$getTableStatistics$5(MetadataMgr.java:316) ~[starrocks-fe.jar:?]
        at java.util.Optional.map(Optional.java:265) ~[?:?]
        at com.starrocks.server.MetadataMgr.getTableStatistics(MetadataMgr.java:315) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.computeIcebergScanNode(StatisticsCalculator.java:295) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalIcebergScan(StatisticsCalculator.java:283) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.visitLogicalIcebergScan(StatisticsCalculator.java:159) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.operator.logical.LogicalIcebergScanOperator.accept(LogicalIcebergScanOperator.java:73) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.statistics.StatisticsCalculator.estimatorStats(StatisticsCalculator.java:175) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.task.DeriveStatsTask.execute(DeriveStatsTask.java:57) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.task.SeriallyTaskScheduler.executeTasks(SeriallyTaskScheduler.java:69) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.memoOptimize(Optimizer.java:586) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.optimizeByCost(Optimizer.java:194) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.optimizer.Optimizer.optimize(Optimizer.java:131) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.createQueryPlan(StatementPlanner.java:142) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.planQuery(StatementPlanner.java:117) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:92) ~[starrocks-fe.jar:?]
        at com.starrocks.sql.StatementPlanner.plan(StatementPlanner.java:57) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.StmtExecutor.execute(StmtExecutor.java:432) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:363) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.dispatch(ConnectProcessor.java:477) ~[starrocks-fe.jar:?]
        at com.starrocks.qe.ConnectProcessor.processOnce(ConnectProcessor.java:753) ~[starrocks-fe.jar:?]
        at com.starrocks.mysql.nio.ReadListener.lambda$handleEvent$0(ReadListener.java:69) ~[starrocks-fe.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
        at java.lang.Thread.run(Thread.java:829) ~[?:?]
2023-12-01 23:11:15,615 INFO (starrocks-mysql-nio-pool-1|152) [MetadataMgr.removeQueryMetadata():141] Succeed to deregister query level connector metadata on query id: ece1b006-909e-11ee-9f26-0242ac140003
2023-12-01 23:11:33,347 INFO (colocate group clone checker|102) [ColocateTableBalancer.matchGroups():901] finished to match colocate group. cost: 0 ms, in lock time: 0 ms
2023-12-01 23:11:33,370 INFO (tablet checker|36) [TabletChecker.doCheck():419] finished to check tablets. isUrgent: true, unhealthy/total/added/in_sched/not_ready: 0/0/0/0/0, cost: 0 ms, in lock time: 0 ms, wait time: 0ms
2023-12-01 23:11:33,372 INFO (tablet checker|36) [TabletChecker.doCheck():419] finished to check tablets. isUrgent: false, unhealthy/total/added/in_sched/not_ready: 0/40/0/0/0, cost: 1 ms, in lock time: 1 ms, wait time: 0ms

Smith-Cruise commented 11 months ago

Try creating the catalog as follows:

create external catalog 'iceberg'
PROPERTIES
(
    "type"="iceberg",
    "iceberg.catalog.type"="rest",
    "iceberg.catalog.uri"="http://iceberg-rest:8181",
    "iceberg.catalog.warehouse"="starrocks",
    "aws.s3.access_key"="admin",
    "aws.s3.secret_key"="password",
    "aws.s3.endpoint"="http://minio:9000",
    "aws.s3.enable_path_style_access"="true",
    "client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory"
);

This forces the REST catalog to use StarRocks' own AWS client factory.

Smith-Cruise commented 11 months ago

To support Tabular seamlessly, StarRocks uses Iceberg's official AWS client factory for REST catalogs, which causes the aws.s3.xxx parameters to be ignored. To make the aws.s3.xxx parameters take effect, you need to explicitly set "client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory".
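For anyone scripting this setup, here is a minimal sketch of how the statement could be generated, adding client.factory only for a self-hosted REST catalog. The function name build_catalog_sql and the self_hosted_rest flag are made up for illustration and are not part of StarRocks; the property names and values are the ones from this thread. It does no escaping of names or values, so treat it as a sketch only.

```python
# Sketch: render a CREATE EXTERNAL CATALOG statement for an Iceberg REST catalog.
# build_catalog_sql and self_hosted_rest are hypothetical names for this example.
STARROCKS_CLIENT_FACTORY = "com.starrocks.connector.iceberg.IcebergAwsClientFactory"


def build_catalog_sql(name, properties, self_hosted_rest=True):
    """Return the CREATE EXTERNAL CATALOG statement as a string.

    When self_hosted_rest is True, the client.factory property from this
    thread is added (unless the caller already set it).
    """
    props = dict(properties)  # copy so the caller's dict is not mutated
    if self_hosted_rest:
        props.setdefault("client.factory", STARROCKS_CLIENT_FACTORY)
    body = ",\n".join(f'  "{key}"="{value}"' for key, value in props.items())
    return f"CREATE EXTERNAL CATALOG '{name}'\nPROPERTIES\n(\n{body}\n);"


# Property values copied from the catalog definition in this issue.
minio_properties = {
    "type": "iceberg",
    "iceberg.catalog.type": "rest",
    "iceberg.catalog.uri": "http://iceberg-rest:8181",
    "iceberg.catalog.warehouse": "starrocks",
    "aws.s3.access_key": "admin",
    "aws.s3.secret_key": "password",
    "aws.s3.endpoint": "http://minio:9000",
    "aws.s3.enable_path_style_access": "true",
}

print(build_catalog_sql("iceberg", minio_properties))
```

The printed statement can be pasted into the mysql client session shown earlier. Passing self_hosted_rest=False leaves client.factory out.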

alberttwong commented 11 months ago

validated the fix.

DanRoscigno commented 11 months ago

@Smith-Cruise Can you please help figure out what to write in the docs about "client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory"? I don't know what to add to https://github.com/StarRocks/starrocks/edit/main/docs/en/data_source/catalog/iceberg_catalog.md

Is this property used only for Tabular?

Smith-Cruise commented 11 months ago

No, this property is only needed for a self-hosted Iceberg REST catalog. If you are using Tabular, you don't need to add this property.

DanRoscigno commented 11 months ago

OK, I am using the REST catalog container provided by Tabular (the same as Albert was using). I will try it with and without the property.

alberttwong commented 11 months ago

That REST catalog container is provided by Apache Iceberg.

DanRoscigno commented 11 months ago

This one, @alberttwong? image: tabulario/iceberg-rest

DanRoscigno commented 11 months ago

Here is what I am thinking for the section of the Lakehouse quickstart that describes the external catalog:

Create an external catalog

The external catalog is the configuration that allows StarRocks to operate on the Iceberg data as if it were in StarRocks databases and tables. The individual configuration properties are described after the command.

CREATE EXTERNAL CATALOG 'iceberg'
PROPERTIES
(
  "type"="iceberg",
  "iceberg.catalog.type"="rest",
  "iceberg.catalog.uri"="http://iceberg-rest:8181",
  "iceberg.catalog.warehouse"="warehouse",
  "aws.s3.access_key"="admin",
  "aws.s3.secret_key"="password",
  "aws.s3.endpoint"="http://minio:9000",
  "aws.s3.enable_path_style_access"="true",
  "client.factory"="com.starrocks.connector.iceberg.IcebergAwsClientFactory"
);

PROPERTIES

| Property | Description |
| --- | --- |
| type | In this example the type is iceberg. Other options include Hive, Hudi, Delta Lake, and JDBC. |
| iceberg.catalog.type | In this example rest is used. Tabular provides the Docker image used and Tabular uses REST. |
| iceberg.catalog.uri | The REST server endpoint. |
| iceberg.catalog.warehouse | The identifier of the Iceberg catalog. In this case the warehouse name specified in the compose file is warehouse. |
| aws.s3.access_key | The MinIO access key. In this case it is set in the compose file to admin. |
| aws.s3.secret_key | The MinIO secret key. In this case it is set in the compose file to password. |
| aws.s3.endpoint | The MinIO endpoint. |
| aws.s3.enable_path_style_access | Required when using MinIO for object storage. MinIO expects the format http://host:port/<bucket_name>/<key_name>. |
| client.factory | Setting this property to com.starrocks.connector.iceberg.IcebergAwsClientFactory makes StarRocks use the aws.s3.access_key and aws.s3.secret_key parameters for authentication. |
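To make the aws.s3.enable_path_style_access row more concrete, here is a small sketch of the two S3 addressing styles. The helper s3_object_url and the object key are invented for illustration; only the endpoint and bucket name come from the compose file in this thread.

```python
# Sketch: how the same object is addressed in path style vs virtual-hosted style.
# s3_object_url and the example key below are hypothetical, for illustration only.
def s3_object_url(endpoint, bucket, key, path_style):
    """Build the request URL for an object under the given addressing style."""
    scheme, _, authority = endpoint.partition("://")
    if path_style:
        # Path style: the bucket is the first path segment.
        return f"{scheme}://{authority}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a subdomain of the endpoint host.
    return f"{scheme}://{bucket}.{authority}/{key}"


example_key = "nyc/taxis/metadata/example.json"  # made-up object key

print(s3_object_url("http://minio:9000", "warehouse", example_key, path_style=True))
print(s3_object_url("http://minio:9000", "warehouse", example_key, path_style=False))
```

With path style the host stays minio:9000, which resolves on the compose network as-is. The virtual-hosted form targets warehouse.minio:9000, which only resolves because the compose file declares that network alias for the minio service.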