cloudcheflabs / iceberg-rest-catalog

Server error: RuntimeIOException: Failed to get file system for path: s3://warehouse/db/table/metadata/00000-8c5d0bdc-8cb4-428e-9c14-b69515f50c1e.metadata.json #2

Open ming12713 opened 1 year ago

ming12713 commented 1 year ago

Hi, I followed https://itnext.io/easy-way-to-move-iceberg-data-using-iceberg-rest-catalog-8fb74e630a43 to install the REST server, then used Spark SQL to create a table and encountered the following problem. Could you please help me take a look?

[Screenshot from 2023-05-25 13-27-43]

REST server conf

restCatalog:
  token: "123456"
  warehouse: "s3://warehouse/"
  s3: 
    accessKey: "minio"
    secretKey: "minio"
    endpoint: "http://10.0.162.26:39999/api/v1/s3"
  jdbc:
    url: "jdbc:mysql://mysql:3306/rest_catalog?useSSL=false&createDatabaseIfNotExist=true"
    user: "root"
    password: "4XVQRr9h0L"

Spark configuration

# spark hadoop fs
spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.impl  org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint.region cn-east-1
spark.hadoop.fs.s3a.access.key  minio
spark.hadoop.fs.s3a.secret.key  minio
spark.hadoop.fs.s3a.endpoint    http://10.0.162.26:39999/api/v1/s3
spark.hadoop.fs.s3a.bucket.probe  0
spark.hadoop.fs.s3a.change.detection.version.required  false
spark.hadoop.fs.s3a.change.detection.mode  none
spark.hadoop.fs.s3a.path.style.access  true
# spark rest
spark.sql.extensions               org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.catalog.rest             org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest.catalog-impl    org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.rest.io-impl    org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.rest.uri        http://10.0.162.5:30299
spark.sql.catalog.rest.warehouse   s3://warehouse/
spark.sql.catalog.rest.token       123456
spark.sql.catalog.rest.s3.endpoint    http://10.0.162.26:39999/api/v1/s3
spark.sql.catalog.rest.s3.path-style-access    true
spark.sql.defaultCatalog     rest
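
For reference, this is roughly the step that fails (a sketch; the namespace and table names are taken from the metadata path in the error, s3://warehouse/db/table/...):

# create a namespace and a table through the "rest" catalog configured above
spark-sql -e "CREATE NAMESPACE IF NOT EXISTS rest.db"
spark-sql -e "CREATE TABLE rest.db.table (id BIGINT, data STRING) USING iceberg"
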
cloudcheflabs commented 1 year ago

Hi,

The S3 credentials used in the REST catalog server do not seem to match the ones in your Spark Hadoop configuration.

...

s3:
  accessKey: "minio"
  secretKey: "minio"

...

spark.hadoop.fs.s3a.access.key root
spark.hadoop.fs.s3a.secret.key root

...

Please make sure the S3 credentials are correct.
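
One way to verify them against the MinIO endpoint independently of Spark is a plain listing (a sketch assuming the AWS CLI is installed; the key, secret, and endpoint are the ones from the configs above):

# list the warehouse bucket with the same credentials the catalog server uses
AWS_ACCESS_KEY_ID=minio AWS_SECRET_ACCESS_KEY=minio \
aws --endpoint-url http://10.0.162.26:39999/api/v1/s3 s3 ls s3://warehouse/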

ming12713 commented 1 year ago

Yes, the S3 access key / secret key is correct. I had changed the values in my earlier post for security reasons.

cloudcheflabs commented 1 year ago

You are using MinIO, which I have not tested with the REST catalog server.

I think the S3 region property fs.s3a.endpoint.region may need to be configured in both the REST catalog server and the Spark Hadoop configuration. I have created an issue for this:

To use the S3 region property fs.s3a.endpoint.region, you need Spark 3.4.0, Hadoop 3.3.4, and Iceberg 1.2.1 as dependencies. I have already used this functionality to set the S3 region property in another project.

At the moment, the current REST catalog server does not support setting the S3 region property fs.s3a.endpoint.region.
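
For what it is worth, the region can already be supplied on the client side in both layers; a sketch (fs.s3a.endpoint.region is a Hadoop 3.3.x property and client.region comes from Iceberg's AWS module; neither is specific to this repo):

# S3A side, as already present in the Spark config above
spark.hadoop.fs.s3a.endpoint.region cn-east-1
# Iceberg S3FileIO side
spark.sql.catalog.rest.client.region cn-east-1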

ming12713 commented 1 year ago

My env:

spark 3.3.2 
iceberg 1.2.1 

I am not using Hadoop; Spark runs in local mode and connects to the REST catalog server. The problem may be related to the "spark.sql.catalog.rest.s3.path-style-access" = "true" parameter. I noticed that you also configure this parameter on the catalog. Does it take effect?
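
For context, with MinIO path-style addressing typically needs to be enabled in both layers, because the Hadoop S3A filesystem and the catalog's S3FileIO are configured independently (both properties already appear in my config above):

# Hadoop S3A filesystem, used for s3a:// paths
spark.hadoop.fs.s3a.path.style.access true
# Iceberg S3FileIO, used by the catalog's io-impl
spark.sql.catalog.rest.s3.path-style-access true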

cloudcheflabs commented 1 year ago

I have upgraded the REST catalog server to Iceberg 1.3.0 (iceberg-spark-runtime-3.4_xxx, etc.), Hadoop 3.3.4, and Spark 3.4.0. Please pull the latest sources.

I have installed MinIO locally, and I have run the REST catalog server like this:

# run rest catalog.
cd rest-catalog;

export REST_CATALOG_ACCESS_TOKEN=restCatalogTestToken;
export CATALOG_WAREHOUSE=s3a://mykidong/warehouse;
export S3_ACCESS_KEY=xxx;
export S3_SECRET_KEY=xxx;
export S3_ENDPOINT=http://localhost:9000;
export JDBC_URL=jdbc:mysql://localhost:3306/rest_catalog?useSSL=false\&createDatabaseIfNotExist=true;
export JDBC_USER=xxx;
export JDBC_PASSWORD=xxx;

mvn -e spring-boot:run \
-Dspring.profiles.active=dev \
;
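
Before running the job, a quick sanity check that the server is up and the token is accepted can be made against the standard REST config endpoint (a hypothetical check; GET /v1/config is part of the Iceberg REST catalog spec, and port 8181 matches the restUrl used below):

curl -H "Authorization: Bearer restCatalogTestToken" http://localhost:8181/v1/config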

Then I ran the Spark job test case like this:

# run spark job.
cd spark;

mvn -e -Dtest=RunSparkWithIcebergRestCatalog \
-Ds3AccessKey=xxx \
-Ds3SecretKey=xxx \
-Ds3Endpoint=http://localhost:9000 \
-DrestUrl=http://localhost:8181 \
-Dwarehouse=s3a://mykidong/warehouse \
-Dtoken=restCatalogTestToken \
test;

This Spark job works fine.

You may want to update the dependencies in your Spark env likewise: Iceberg 1.3.0 (iceberg-spark-runtime-3.4_xxx, etc.), Hadoop 3.3.4, and Spark 3.4.0.
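
If your Spark env resolves dependencies via --packages rather than bundled jars, the matching set would look roughly like this (a sketch; the AWS SDK version is an example of what Iceberg 1.3.x docs pair with S3FileIO, not something stated in this thread):

# Iceberg Spark runtime plus the AWS SDK modules that S3FileIO needs
spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,software.amazon.awssdk:bundle:2.20.18,software.amazon.awssdk:url-connection-client:2.20.18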

ming12713 commented 1 year ago

Nice, I will try again.