Open walterddr opened 7 months ago
Hi @walterddr,
I'm just following the issue and I think applying the limit by default is a really good idea too. Although I'm quite new to the pinot code base, I think the starting point is the setLimit()
method https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java#L495-L499
I have a question. How do we know if the LIMIT
is set in the main query or the sub query as we only want to set default limit to the main query. Could you point me to the right place in the code base ?
I think I found the place where we parse the subquery but not sure how they all are connected
https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-common/src/main/java/org/apache/pinot/sql/parsers/CalciteSqlParser.java#L510-L519
Would love to create a PR to solve this if you permit :smile:
Hi @dttung2905 thank you for the interest.
yes this is indeed a relatively good task to get to know the pinot codebase (not necessarily beginner though)
a bit of a explanation: the code you linked are for the v1 engine not for the multi-stage engine. (e.g. the function you linked is to convert SqlNode
into PinotQuery
. In order to achieve similar effect in the multi-stage engine:
look for where both engine utilize the same logic to parse the SQL string into SqlNodeAndOptions
https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-broker/src/main/java/org/apache/pinot/broker/requesthandler/MultiStageBrokerRequestHandler.java#L105-L115
look for where the query is being converted into an execution plan (compileQuery
method)
https://github.com/apache/pinot/blob/0a4398634be81cdbbe891b3da249134ef98743e7/pinot-query-planner/src/main/java/org/apache/pinot/query/QueryEnvironment.java#L144-L148
you can see that the compilation runs several steps including validation, relational planning and optimizer. any one of these steps can be used to check the fetch limit
If this is too complicated, please feel free to check in again (i added the beginner task tag but decided to remove it as it is not as straightforward as I original thought of)
Thanks @walterddr for the detailed pointer.
I take a look at the code and I think validate
stage might be a good place to implement that check, something like
private SqlNode validate(SqlNode parsed, PlannerContext plannerContext)
throws Exception {
// 2. validator to validate.
SqlNode validated = plannerContext.getValidator().validate(parsed);
// Check for LIMIT clause in the main query
if (validated instanceof SqlSelect) {
SqlSelect sqlSelect = (SqlSelect) validated;
if (sqlSelect.getFetch() == null) {
// Add LIMIT 100 clause
SqlParserPos originalPos = sqlSelect.getParserPosition();
SqlNode limitNode = SqlLiteral.createExactNumeric("100", originalPos);
((SqlSelect) validated).setFetch(limitNode);
}
}
if (null == validated || !validated.getKind().belongsTo(SqlKind.QUERY)) {
throw new IllegalArgumentException(
String.format("unsupported SQL query, cannot validate out a valid sql from:\n%s", parsed));
}
return validated;
What do you think about this? Not sure if there are other places to change ( probably change test cases too )
this will work only in happy-path. it is not guarantee that the validated top is a always SqlSelect for normal queries. but it is a good starting point. ( for example it could be a SqlSetOperator if the top is a UNION ) i would suggest try out a whole bunch of queries and see if there's any query patterns that could evade these assumption (those could be a good test cases to be added)