Open animer3009 opened 1 year ago
It's actually very simple to do, even for yourself if you need it right now: just add implementation "org.apache.iceberg:iceberg-gcp:${icebergVersion}" to build.gradle and build the image yourself.
Hi @CrawX, thanks for your reply. What about environment variables?
For s3 we have:
environment:
- AWS_ACCESS_KEY_ID=admin
- AWS_SECRET_ACCESS_KEY=password
- AWS_REGION=us-east-1
- CATALOG_WAREHOUSE=s3://warehouse/
- CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
- CATALOG_S3_ENDPOINT=http://minio:9000
P.S. What exact command do I need to use to build? Just gradle build?
Hi @CrawX, looks like you missed my reply. Can you help, please? :)
I just added the mentioned dependency in build.gradle and then rebuilt the image using docker build. You can check the Dockerfile to see how this project is built if you want to do that outside of Docker.
I'm using it locally with fake-gcs-server; this is the env I'm setting:
- CATALOG_WAREHOUSE=gs://warehouse/
- CATALOG_IO__IMPL=org.apache.iceberg.gcp.gcs.GCSFileIO
- CATALOG_GCS_SERVICE_HOST=http://gcs:4443
If you're actually using gcs, it will probably be different (auth etc). I suggest taking a look at GCPProperties.java.
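The variable names above appear to follow a naming convention. As far as I can tell, the image strips the CATALOG_ prefix, turns a double underscore into a dash and a single underscore into a dot, and lowercases the rest, so CATALOG_IO__IMPL would become the io-impl catalog property. That mapping is an assumption, not something stated in this thread, and the function name below is made up for illustration. A minimal Python sketch of the idea:

```python
def env_to_catalog_property(name, prefix="CATALOG_"):
    """Translate an environment variable name into a catalog property key.

    Assumed convention (not confirmed in this thread):
      - only variables starting with `prefix` are considered
      - "__" becomes "-", then "_" becomes "."
      - the result is lowercased
    Returns None for variables that do not carry the prefix.
    """
    if not name.startswith(prefix):
        return None
    key = name[len(prefix):]
    # Replace the double underscore first so it is not split into two dots.
    return key.replace("__", "-").replace("_", ".").lower()


# The variables quoted in this thread, mapped under the assumed convention.
for var in ("CATALOG_WAREHOUSE", "CATALOG_IO__IMPL", "CATALOG_GCS_SERVICE_HOST"):
    print(var, "->", env_to_catalog_property(var))
```

If the convention holds, a property listed in GCPProperties.java can be turned into an environment variable by applying the same rules in reverse.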
Hi @CrawX, thank you for your help! I did all of that, and it seems to work because I am able to create tables. But I have trouble storing data in them and reading from them. I'm getting an error like:
scala> spark.sql("INSERT INTO prod.db.sample VALUES (1, 'John'), (2, 'Jane')")
23/07/26 23:48:48 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 2)
org.apache.iceberg.exceptions.RuntimeIOException: Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet
Here is my spark.conf:
spark.jars.packages=org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-gcp:1.3.0
spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.sql.defaultCatalog=rest_prod
spark.sql.catalog.rest_prod=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.rest_prod.type=rest
spark.sql.catalog.rest_prod.uri=http://localhost:8181
It creates metadata in GCS, but it seems the data folders are missing.
create log of rest API:
iceberg-rest | 2023-07-26T23:59:07.700 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db.sample) iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db.sample iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53) iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) iceberg-rest | at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833) 
iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) iceberg-rest | at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833) iceberg-rest | 2023-07-26T23:59:07.715 ERROR [org.apache.iceberg.rest.RESTCatalogServlet] - Error processing REST request iceberg-rest | org.apache.iceberg.exceptions.RESTException: Unhandled error: ErrorResponse(code=404, type=NoSuchTableException, message=Table does not exist: prod.db) iceberg-rest | org.apache.iceberg.exceptions.NoSuchTableException: Table does not exist: prod.db iceberg-rest | at org.apache.iceberg.BaseMetastoreCatalog.loadTable(BaseMetastoreCatalog.java:53) iceberg-rest | at org.apache.iceberg.rest.CatalogHandlers.loadTable(CatalogHandlers.java:240) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.handleRequest(RESTCatalogAdapter.java:336) iceberg-rest | at 
org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:384) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) iceberg-rest | at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) iceberg-rest | at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogAdapter.execute(RESTCatalogAdapter.java:401) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.execute(RESTCatalogServlet.java:100) iceberg-rest | at org.apache.iceberg.rest.RESTCatalogServlet.doGet(RESTCatalogServlet.java:66) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:687) iceberg-rest | at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) iceberg-rest | at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) iceberg-rest | at 
org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1434) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) iceberg-rest | at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) iceberg-rest | at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1349) iceberg-rest | at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) iceberg-rest | at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) iceberg-rest | at org.eclipse.jetty.server.Server.handle(Server.java:516) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) iceberg-rest | at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) iceberg-rest | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) iceberg-rest | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) iceberg-rest | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) iceberg-rest | at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) 
iceberg-rest | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) iceberg-rest | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:386) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) iceberg-rest | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) iceberg-rest | at java.base/java.lang.Thread.run(Thread.java:833) iceberg-rest | 2023-07-26T23:59:08.237 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties set at catalog level through catalog properties: {} iceberg-rest | 2023-07-26T23:59:08.239 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table properties enforced at catalog level through catalog properties: {} iceberg-rest | 2023-07-26T23:59:08.417 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Successfully committed to table prod.db.sample in 174 ms iceberg-rest | 2023-07-26T23:59:08.418 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json
insert log of rest API:
iceberg-rest | 2023-07-26T23:59:56.970 INFO [org.apache.iceberg.BaseMetastoreTableOperations] - Refreshing table metadata from new version: gs://warehouse-iceberg/prod/db/sample/metadata/00000-3e40b56b-aa8c-4b36-a8fa-f0de6368f487.metadata.json
iceberg-rest | 2023-07-26T23:59:57.121 INFO [org.apache.iceberg.BaseMetastoreCatalog] - Table loaded by catalog: rest_backend.prod.db.sample
How can I solve this?
@animer3009 the NoSuchTableException, message=Table does not exist: prod.db error does not necessarily indicate that something went wrong and could be from a Catalog#tableExists() check. You'll see the same stack trace when running through the https://iceberg.apache.org/spark-quickstart/ example when creating the table.
The important part is Successfully committed to table prod.db.sample, meaning that everything looks as it should during table creation.
However, Failed to get file system for path: gs://warehouse-iceberg/prod/db/sample/data/00000-2-759b4512-1ef6-4a0a-be07-235ca0329324-00001.parquet indicates that you're most likely missing GCS-related jars on the Spark side that understand the gs scheme.
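One thing that may be worth trying on the Spark side, though nobody in this thread confirms it, is telling the Spark catalog explicitly which FileIO to use through the io-impl catalog property, so that it does not fall back to a Hadoop file system lookup for gs:// paths. The Python sketch below only renders the property lines for a spark.conf like the one posted above. The helper name is hypothetical, and whether this resolves the error in a given setup is an assumption.

```python
def rest_catalog_conf(name, uri, io_impl=None):
    """Build Spark properties for an Iceberg REST catalog called `name`.

    `io_impl` is optional; when given, it is set as the catalog's
    io-impl property (assumed here to be what the Spark side needs).
    """
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }
    if io_impl:
        conf[f"{prefix}.io-impl"] = io_impl
    return conf


# Catalog name, URI and FileIO class are the ones quoted in this thread.
conf = rest_catalog_conf(
    "rest_prod",
    "http://localhost:8181",
    io_impl="org.apache.iceberg.gcp.gcs.GCSFileIO",
)
for key, value in conf.items():
    print(f"{key}={value}")
```

The printed lines can be pasted into spark.conf next to the existing spark.jars.packages entry, which already pulls in iceberg-gcp. Credentials for a real GCS bucket are a separate matter that this sketch does not cover.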
Hi guys, Are you going to add GCS support? Any ETA?