apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.52k stars 1.29k forks source link

[multistage] Pinot connection doesn't handle join routing parsing #10688

Closed walterddr closed 1 year ago

walterddr commented 1 year ago

see:

09:57:38.796 ERROR [Connection] [main] Cannot parse table name from query: select D_YEAR, S_NATION, P_CATEGORY, sum(LO_REVENUE - LO_SUPPLYCOST) as profit
from lineorder, dates, customer, supplier, part where LO_CUSTKEY = C_CUSTKEY
and LO_SUPPKEY = S_SUPPKEY and LO_PARTKEY = P_PARTKEY and LO_ORDERDATE = D_DATEKEY
and C_REGION = 'AMERICA' and S_REGION = 'AMERICA' and (D_YEAR = 1997 or D_YEAR = 1998)
and (P_MFGR = 'MFGR#1' or P_MFGR = 'MFGR#2') group by D_YEAR, S_NATION, P_CATEGORY
order by D_YEAR, S_NATION, P_CATEGORY;

java.lang.IllegalStateException: Unsupported join type: COMMA
    at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToJoin(CalciteSqlParser.java:552) ~[classes/:?]
    at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToDataSource(CalciteSqlParser.java:528) ~[classes/:?]
    at org.apache.pinot.sql.parsers.CalciteSqlParser.compileSqlNodeToPinotQuery(CalciteSqlParser.java:475) ~[classes/:?]
    at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToPinotQuery(CalciteSqlParser.java:244) ~[classes/:?]
    at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToPinotQuery(CalciteSqlParser.java:239) ~[classes/:?]
    at org.apache.pinot.sql.parsers.CalciteSqlCompiler.compileToBrokerRequest(CalciteSqlCompiler.java:32) ~[classes/:?]
    at org.apache.pinot.client.Connection.resolveTableName(Connection.java:189) [classes/:?]
    at org.apache.pinot.client.Connection.execute(Connection.java:119) [classes/:?]
    at org.apache.pinot.client.Connection.execute(Connection.java:94) [classes/:?]
    at org.apache.pinot.integration.tests.SSBQueryIntegrationTest.testQueriesValidateAgainstH2(SSBQueryIntegrationTest.java:112) [test-classes/:?]
    at org.apache.pinot.integration.tests.SSBQueryIntegrationTest.testSSBQueries(SSBQueryIntegrationTest.java:106) [test-classes/:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:108) [testng-6.11.jar:?]
    at org.testng.internal.Invoker.invokeMethod(Invoker.java:661) [testng-6.11.jar:?]
    at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:869) [testng-6.11.jar:?]
    at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1193) [testng-6.11.jar:?]
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:126) [testng-6.11.jar:?]
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:109) [testng-6.11.jar:?]
    at org.testng.TestRunner.privateRun(TestRunner.java:744) [testng-6.11.jar:?]
    at org.testng.TestRunner.run(TestRunner.java:602) [testng-6.11.jar:?]

this is b/c the Connection code path doesn't go through the v2 engine (which only calls CalciteSqlParser for leaf stages, not CalciteSqlCompiler for the overall query

walterddr commented 1 year ago

solution:

instead of making Connection.resolveTableName call into

  private static String resolveTableName(String query) {
    try {
      return CalciteSqlCompiler.compileToBrokerRequest(query).querySource.tableName;
      ...

we should switch to

    SqlNode sqlNode = CalciteSqlParser.extractSqlNodeAndOptions(query);
    return CalciteSqlParser.extractTableNamesFromNode(sqlNode);
abhioncbr commented 1 year ago

Please assign it to me. Thanks.

abhioncbr commented 1 year ago

Question: extractTableNamesFromNode returns the List of the table names; what should we do for _brokerSelector.selectBroker? Since the current implementation returns the broker address on which the table is hosted, multiple tables can be hosted on different brokers in the case of a JOIN query. Right?

For testing purposes, I passed the first table name to get the broker address, but later the query execution failed with the following error, got it working by just getting the broker for the first table name in the list.

walterddr commented 1 year ago

This is a good question for how the routing is handling I would suggest check out here the controller logic. In https://github.com/apache/pinot/pull/10336 .

In fact the logic is so generic I think we should factor it to a utility. CC @ankitsultana @tibrewalpratik17

abhioncbr commented 1 year ago

Yes, I was thinking the same about moving to some utility class.

Right now, the code is sitting in PinotQueryResource getCommonBrokerTenant, which requires tableConfigs and indirectly HelixResourceManger.

I can give it a shot to move the functionality to some util class.

ankitsultana commented 1 year ago

@walterddr : Any approach works for me. Long term though I wanted to move the parsing code out to a separate dedicated module, similar to how Presto has presto-parser.

cc: @Jackie-Jiang for his thoughts as well.

Jackie-Jiang commented 1 year ago

I think we need to make sure the broker picked is hosting all the tables within the JOIN, essentially do an intersection of multiple broker lists

abhioncbr commented 1 year ago

We can close this issue. Thanks