apache / pinot

Apache Pinot - A realtime distributed OLAP datastore
https://pinot.apache.org/
Apache License 2.0
5.27k stars 1.23k forks source link

Fixing query options #8906

Closed walterddr closed 1 year ago

walterddr commented 2 years ago

This issue is created follow up #8880.

Background

Currently, Pinot SQL OPTION keyword is a REGEX match that is allow ANYWHERE in the SQL string. This causes SQLi issues.

Backward-Compatibility Issue

8880 proposed an alternative syntax of using OPTION similar to SQL setStatement (see: https://calcite.apache.org/docs/reference.html)

However, this creates a backward-incompatible change towards the Pinot SQL syntax.

Thus we propose to hold off the syntax change until we come to a consensus on which syntax to use.

SQLi Resolution

Alternative Syntax Change

Note, for the syntax code segment: <, > is used to quote reserved keywords or reserved operators.

As STATEMENT

Pinot query now only allows a single STATEMENT. So there's no difference setting OPTIONS as "clause" associated with the statement, or associates the OPTION with the entire SQL statement list context.

Potentially acceptable syntax listed below:

  1. Standard SET statement
    
    setStatement:
    <SET> identifier <=> literal <;>

identifier: simpleIdentifier | <">quoted.complex.identifier<">

literal: stringLiteral | integerLiteral | doubleLiteral | booleanLiteral

PRO: standard SET statement is default supported in Calcite, also supported by major SQL engines:
   - https://dev.mysql.com/doc/refman/8.0/en/set-statement.html, 
   - https://www.postgresql.org/docs/current/sql-set.html
CON: standard SET statement only allows a single key-value, for multiple key-value options, multiple set statement is needed

2. Special `OPTION` keyword

optionStatement: