apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0

Query error on `select $segmentName, * from table limit 1` #7867

Closed: lgo closed this issue 2 years ago

lgo commented 2 years ago

While running some ad hoc queries for debugging, I found that the following query fails. It's not a significant issue, because a query that explicitly lists the columns to select works fine.

select $segmentName, * from table limit 1

It fails with the following exception:

[
  {
    "errorCode": 200,
    "message": "QueryExecutionError:\njava.lang.RuntimeException: Caught exception while running CombinePlanNode.\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:146)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:41)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:302)\n...\nCaused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException\n\tat java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)\n\tat java.base/java.util.concurrent.FutureTask.get(FutureTask.java:205)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:135)\n\t... 15 more\n...\nCaused by: java.lang.NullPointerException"
  }
]

Note: this was on roughly commit ce9fb572d157d82f8a014624152ddd53332372be, and I have not yet tried to reproduce it on the latest version.

Jackie-Jiang commented 2 years ago

Pinot currently doesn't support select * combined with extra columns in the select clause.

There are 2 ways to fix the issue:

  1. Re-write the query on the broker side with the actual columns
  2. Expand the * on each segment on the server side (current approach in SelectionOperatorUtils.extractExpressions())

IMO the first fix is cheaper, as the rewrite only needs to be performed once per query instead of once per segment.
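
As a rough illustration of option 1, here is a minimal sketch of expanding the * once, before the query is sent to the servers. It is not Pinot code: the class name, the expandStar helper, and the column names are made up for the example, and the select list is simplified to a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

final class StarExpansionSketch {

  // Hypothetical helper, not a Pinot API: replaces each "*" entry in the
  // select list with the table's columns and keeps other entries in place.
  static List<String> expandStar(List<String> selectList, List<String> schemaColumns) {
    List<String> expanded = new ArrayList<>();
    for (String expression : selectList) {
      if ("*".equals(expression)) {
        expanded.addAll(schemaColumns);
      } else {
        expanded.add(expression);
      }
    }
    return expanded;
  }

  public static void main(String[] args) {
    // Illustrative column names only.
    List<String> schemaColumns = List.of("playerID", "teamID", "runs");
    // Run once per query; each segment would then see only explicit columns.
    System.out.println(expandStar(List.of("$segmentName", "*"), schemaColumns));
    // prints [$segmentName, playerID, teamID, runs]
  }
}
```

With a rewrite like this, the per-segment code path would never see a * next to other expressions, which is the case that currently fails.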

Would you like to contribute a fix for this issue?

suddendust commented 2 years ago

@Jackie-Jiang Can I pick this up if lgo is not working on it?

suddendust commented 2 years ago

@Jackie-Jiang Thinking about the behaviour in the following cases:

  1. select playerID, * from baseballStats: Should we return the playerID column once or twice? IMO we should return it twice, as that is what the user asked for.
  2. What about the default virtual columns? They shouldn't be returned (just as select * currently doesn't return the virtual columns).

Jackie-Jiang commented 2 years ago

@suddendust Thanks for volunteering on this. I've assigned the issue to you.

For the questions:

  1. I think the standard SQL behavior returns each column only once
  2. We should not return the virtual columns unless they are explicitly queried, e.g. select $docId, * from ...
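
The two rules above could be sketched as follows. This is only an illustration, not the fix that was eventually merged: the class and the resolveColumns helper are hypothetical, the select list is simplified to strings, and virtual columns are assumed to be exactly those whose names start with $ (as with $docId and $segmentName).

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SelectStarSemanticsSketch {

  // Hypothetical helper, not a Pinot API. Assumes a column is virtual
  // if and only if its name starts with '$'.
  static List<String> resolveColumns(List<String> selectList, List<String> allColumns) {
    // LinkedHashSet keeps first-seen order and drops repeated columns (rule 1).
    Set<String> resolved = new LinkedHashSet<>();
    for (String expression : selectList) {
      if ("*".equals(expression)) {
        for (String column : allColumns) {
          // "*" contributes only non-virtual columns (rule 2).
          if (!column.startsWith("$")) {
            resolved.add(column);
          }
        }
      } else {
        // Explicitly listed columns are kept, virtual or not.
        resolved.add(expression);
      }
    }
    return new ArrayList<>(resolved);
  }

  public static void main(String[] args) {
    // Illustrative column names only.
    List<String> allColumns = List.of("playerID", "teamID", "$docId", "$segmentName");
    System.out.println(resolveColumns(List.of("playerID", "*"), allColumns));
    // prints [playerID, teamID]
    System.out.println(resolveColumns(List.of("$docId", "*"), allColumns));
    // prints [$docId, playerID, teamID]
  }
}
```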

suddendust commented 2 years ago

This can be closed, as the fix was released in 0.10.0. @Jackie-Jiang