apache / gravitino

World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
https://gravitino.apache.org
Apache License 2.0
1.09k stars 343 forks source link

[Bug report] Fail to load Postgres table on AWS #1169

Closed justinmclean closed 9 months ago

justinmclean commented 11 months ago

Describe what's wrong

Trino query fails on some tables in AWS Postgres (version 14). Using the HR/Sales playground SQL the sales tables product and sales fail to load with an error when they are queried, all other tables work. One common thing these two tables have is a numeric field for prices, but I'm not sure that is the issue.

Error message and/or stacktrace

SQL Error [65536]: Query failed (#20231215_005905_00267_qr8xb): Failed to operate object [products] operation [LOAD] under [sales], reason [RuntimeException] java.lang.RuntimeException: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at com.datastrato.gravitino.catalog.CatalogOperationDispatcher.doWithCatalog(CatalogOperationDispatcher.java:686) at com.datastrato.gravitino.catalog.CatalogOperationDispatcher.loadTable(CatalogOperationDispatcher.java:332) at com.datastrato.gravitino.server.web.rest.TableOperations.loadTable(TableOperations.java:116) at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:134) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:177) at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:176) at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:81) at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400) at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:256) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) at org.glassfish.jersey.internal.Errors.process(Errors.java:292) at org.glassfish.jersey.internal.Errors.process(Errors.java:274) at org.glassfish.jersey.internal.Errors.process(Errors.java:244) at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:235) at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:684) at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:358) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:311) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) at com.datastrato.gravitino.server.auth.AuthenticationFilter.doFilter(AuthenticationFilter.java:57) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at com.datastrato.gravitino.server.web.VersioningFilter.doFilter(VersioningFilter.java:94) at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.eclipse.jetty.server.Server.handle(Server.java:516) at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) at java.lang.Thread.run(Thread.java:750) Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:659) at java.util.ArrayList.get(ArrayList.java:435) at com.datastrato.gravitino.catalog.postgresql.converter.PostgreSqlTypeConverter.toGravitinoType(PostgreSqlTypeConverter.java:42) at com.datastrato.gravitino.catalog.postgresql.converter.PostgreSqlTypeConverter.toGravitinoType(PostgreSqlTypeConverter.java:13) at com.datastrato.gravitino.catalog.postgresql.operation.PostgreSqlTableOperations.selectColumnInfoAndExecute(PostgreSqlTableOperations.java:123) at com.datastrato.gravitino.catalog.postgresql.operation.PostgreSqlTableOperations.load(PostgreSqlTableOperations.java:87) at com.datastrato.gravitino.catalog.jdbc.JdbcCatalogOperations.loadTable(JdbcCatalogOperations.java:269) at com.datastrato.gravitino.catalog.CatalogOperationDispatcher.lambda$null$21(CatalogOperationDispatcher.java:334) at com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.lambda$doWithTableOps$1(CatalogManager.java:96) at com.datastrato.gravitino.utils.IsolatedClassLoader.withClassLoader(IsolatedClassLoader.java:69) at com.datastrato.gravitino.catalog.CatalogManager$CatalogWrapper.doWithTableOps(CatalogManager.java:91) at com.datastrato.gravitino.catalog.CatalogOperationDispatcher.lambda$loadTable$22(CatalogOperationDispatcher.java:334) at com.datastrato.gravitino.catalog.CatalogOperationDispatcher.doWithCatalog(CatalogOperationDispatcher.java:681) ... 67 more

How to reproduce

  1. Install Postgres on AWS (or use AMI).
  2. Load HR and Sales schemas.
  3. Create catalog for AWS.
  4. Query tables in sales schema via Trino connection in dbeaver.

Additional context

Here the relevant SQL:

CREATE TABLE sales.sales (
    sale_id int4 NULL,
    employee_id int4 NULL,
    store_id int4 NULL,
    product_id int4 NULL,
    customer_id int4 NULL,
    sold date NULL,
    quantity int4 NULL,
    total_amount numeric NULL
);

CREATE TABLE sales.products (
    product_id int4 NULL,
    category_id int4 NULL,
    product_name varchar(100) NULL,
    price numeric NULL
);
justinmclean commented 11 months ago

Note a desc command e.g "desc "metalake_demo.aws2".sales.products" will also fail with the same error.

diqiu50 commented 11 months ago

@justinmclean
Where are these SQL queries executed, in PostgreSQL or Trino?

CREATE TABLE sales.sales (
    sale_id int4 NULL,
    employee_id int4 NULL,
    store_id int4 NULL,
    product_id int4 NULL,
    customer_id int4 NULL,
    sold date NULL,
    quantity int4 NULL,
    total_amount numeric NULL
);

CREATE TABLE sales.products (
    product_id int4 NULL,
    category_id int4 NULL,
    product_name varchar(100) NULL,
    price numeric NULL
);
diqiu50 commented 11 months ago

@Clearvive I think Justin executed these SQL queries on PostgreSQL. Please take a look to see if Gravitino can load these tables.

justinmclean commented 11 months ago

The tables were created in Postgres

justinmclean commented 11 months ago

I also tried on Postgres 13 on AWS and got the same error.

justinmclean commented 11 months ago

Looks like this is an issue with "numeric" changing it to "decimal(10,2)" gets rid of the error.

justinmclean commented 9 months ago

This issue is old and no longer relevant.