apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
1.1k stars 345 forks source link

[Bug report] Create fileset schema, it won't use schema location, but use catalog location and miss the file separator #4582

Closed shaofengshi closed 4 weeks ago

shaofengshi commented 3 months ago

Version

0.5.1

Describe what's wrong

Firstly, create a hadoop catalog, for example:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "schema2", "comment": "comment", "properties": { "location": "hdfs://hive:9000" } }' http://localhost:8090/api/metalakes/metalake_demo/catalogs/hadoop2/schemas

Then create a schema with location:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "schema2", "comment": "comment", "properties": { "location": "hdfs://hive:9000/user2" } }' http://localhost:8090/api/metalakes/metalake_demo/catalogs/hadoop2/schemas

It reports an error: {"code":1001,"type":"IllegalArgumentException","message":"Failed to operate schema(s) [schema2] operation [CREATE] under catalog [hadoop2], reason [Relative path in absolute URI: hdfs://hive:9000schema2]","stack":["java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://hive:9000schema2","\tat org.apache.hadoop.fs.Path.initialize(Path.java:259)","\tat org.apache.hadoop.fs.Path.<init>(Path.java:157)","\tat org.apache.hadoop.fs.Path.<init>(Path.java:125)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.lambda$getSchemaPath$8(HadoopCatalogOperations.java:619)","\tat java.util.Optional.map(Optional.java:215)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.getSchemaPath(HadoopCatalogOperations.java:619)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.createSchema(HadoopCatalogOperations.java:384)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.lambda$createSchema$4(SchemaOperationDispatcher.java:100)","\tat com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithSchemaOps$0(CatalogManager.java:103)","\tat com.datastrato.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:72)","\tat com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.doWithSchemaOps(CatalogManager.java:98)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.lambda$createSchema$5(SchemaOperationDispatcher.java:100)","\tat com.datastrato.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:107)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.createSc

Please note here: " [Relative path in absolute URI: hdfs://hive:9000schema2]", which misses a "/" separator.

Here has two issues: 1) The system should automatically add the separator if missing; 2) As the location was specified the schema creation request, seems it doesn't use this property, but use its catalog's storage location, and then contact the schema name as the location.

Error message and/or stacktrace

{"code":1001,"type":"IllegalArgumentException","message":"Failed to operate schema(s) [schema2] operation [CREATE] under catalog [hadoop2], reason [Relative path in absolute URI: hdfs://hive:9000schema2]","stack":["java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://hive:9000schema2","\tat org.apache.hadoop.fs.Path.initialize(Path.java:259)","\tat org.apache.hadoop.fs.Path.(Path.java:157)","\tat org.apache.hadoop.fs.Path.(Path.java:125)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.lambda$getSchemaPath$8(HadoopCatalogOperations.java:619)","\tat java.util.Optional.map(Optional.java:215)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.getSchemaPath(HadoopCatalogOperations.java:619)","\tat com.datastrato.gravitino.catalog.hadoop.HadoopCatalogOperations.createSchema(HadoopCatalogOperations.java:384)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.lambda$createSchema$4(SchemaOperationDispatcher.java:100)","\tat com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithSchemaOps$0(CatalogManager.java:103)","\tat com.datastrato.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:72)","\tat com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.doWithSchemaOps(CatalogManager.java:98)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.lambda$createSchema$5(SchemaOperationDispatcher.java:100)","\tat com.datastrato.gravitino.catalog.OperationDispatcher.doWithCatalog(OperationDispatcher.java:107)","\tat com.datastrato.gravitino.catalog.SchemaOperationDispatcher.createSchema(SchemaOperationDispatcher.java:98)","\tat com.datastrato.gravitino.catalog.SchemaNormalizeDispatcher.createSchema(SchemaNormalizeDispatcher.java:47)","\tat com.datastrato.gravitino.listener.SchemaEventDispatcher.createSchema(SchemaEventDispatcher.java:76)","\tat com.datastrato.gravitino.server.web.rest.SchemaOperations.lambda$createSchema$2(SchemaOperations.java:105)","\tat com.datastrato.gravitino.lock.TreeLockUtils.doWithTreeLock(TreeLockUtils.java:35)","\tat com.datastrato.gravitino.server.web.rest.SchemaOperations.lambda$createSchema$3(SchemaOperations.java:101)","\tat java.security.AccessController.doPrivileged(Native Method)","\tat javax.security.auth.Subject.doAs(Subject.java:422)","\tat com.datastrato.gravitino.utils.PrincipalUtils.doAs(PrincipalUtils.java:25)","\tat com.datastrato.gravitino.server.web.Utils.doAs(Utils.java:121)","\tat com.datastrato.gravitino.server.web.rest.SchemaOperations.createSchema(SchemaOperations.java:95)","\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)","\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)","\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)","\tat java.lang.reflect.Method.invoke(Method.java:498)","\tat org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)","\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146)","\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189)","\tat org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176)","\tat org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93)","\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478)","\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400)","\tat org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)","\tat org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256)","\tat org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)","\tat org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)","\tat org.glassfish.jersey.internal.Errors.process(Errors.java:292)","\tat org.glassfish.jersey.internal.Errors.process(Errors.java:274)","\tat org.glassfish.jersey.internal.Errors.process(Errors.java:244)","\tat org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)","\tat org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235)","\tat org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684)","\tat org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)","\tat org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)","\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358)","\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311)","\tat org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)","\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)","\tat org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)","\tat com.datastrato.gravitino.server.authentication.AuthenticationFilter.doFilter(AuthenticationFilter.java:59)","\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)","\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)","\tat com.datastrato.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:97)","\tat org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)","\tat org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)","\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)","\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)","\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600)","\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)","\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)","\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)","\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)","\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)","\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)","\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)","\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)","\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)","\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)","\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)","\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)","\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)","\tat org.eclipse.jetty.server.Server.handle(Server.java:516)","\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)","\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)","\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)","\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)","\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)","\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)","\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)","\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)","\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)","\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)","\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)","\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)","\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)","\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)","\tat java.lang.Thread.run(Thread.java:750)","Caused by: java.net.URISyntaxException: Relative path in absolute URI: hdfs://hive:9000schema2","\tat java.net.URI.checkPath(URI.java:1823)","\tat java.net.URI.(URI.java:745)","\tat org.apache.hadoop.fs.Path.initialize(Path.java:256)","\t... 89 more"]}%

How to reproduce

Use gravitino playground, v0.5.1:

Firstly, create a hadoop catalog, for example:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "schema2", "comment": "comment", "properties": { "location": "hdfs://hive:9000" } }' http://localhost:8090/api/metalakes/metalake_demo/catalogs/hadoop2/schemas

Then create a schema with location:

curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \ -H "Content-Type: application/json" -d '{ "name": "schema2", "comment": "comment", "properties": { "location": "hdfs://hive:9000/user2" } }' http://localhost:8090/api/metalakes/metalake_demo/catalogs/hadoop2/schemas

Additional context

No response

jerryshao commented 4 weeks ago

The problem here is that "hdfs://hive:9000" is not a full qualified path, so using this to create a sub-path will lead to en error. To handle this scenario, we need to add "/" to the end of the path.