confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io
Other
120 stars 1.04k forks source link

Support calculated predicates in pull queries #4484

Closed vinothchandar closed 3 years ago

vinothchandar commented 4 years ago

While fixed predicates work

ksql> SELECT TIMESTAMPTOSTRING(WINDOWSTART, 'yyyy-MM-dd HH:mm:ss') AS WINDOW_START_TS, FULL_NAME,                                            >       RATINGS_COUNT
>  FROM RATINGS_PER_CUSTOMER_PER_MINUTE
>  WHERE ROWKEY='Rica Blaisdell'
>    AND WINDOWSTART > '2020-02-07T16:57:00.000';
+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|WINDOW_START_TS                              |FULL_NAME                                    |RATINGS_COUNT                                |
+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|2020-02-07 16:58:00                          |Rica Blaisdell                               |10                                           |
|2020-02-07 16:59:00                          |Rica Blaisdell                               |7                                            |
|2020-02-07 17:00:00                          |Rica Blaisdell                               |7                                            |
|2020-02-07 17:01:00                          |Rica Blaisdell                               |10                                           |
|2020-02-07 17:02:00                          |Rica Blaisdell                               |5                                            |
|2020-02-07 17:03:00                          |Rica Blaisdell                               |8                                            |
|2020-02-07 17:04:00                          |Rica Blaisdell                               |4                                            |
Query terminated
ksql> SELECT TIMESTAMPTOSTRING(WINDOWSTART, 'yyyy-MM-dd HH:mm:ss') AS WINDOW_START_TS, FULL_NAME, 
>                                                 RATINGS_COUNT
>  FROM RATINGS_PER_CUSTOMER_PER_MINUTE
>  WHERE ROWKEY='Rica Blaisdell'
>    AND WINDOWSTART > 1581008904345;
+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|WINDOW_START_TS                              |FULL_NAME                                    |RATINGS_COUNT                                |
+---------------------------------------------+---------------------------------------------+---------------------------------------------+
|2020-02-07 16:35:00                          |Rica Blaisdell                               |4                                            |
|2020-02-07 16:36:00                          |Rica Blaisdell                               |7                                            |
|2020-02-07 16:37:00                          |Rica Blaisdell                               |8                                            |
|2020-02-07 16:38:00                          |Rica Blaisdell                               |6                                            |
|2020-02-07 16:39:00                          |Rica Blaisdell                               |9                                            |

Calculated predicate does n't

ksql> SELECT TIMESTAMPTOSTRING(WINDOWSTART, 'yyyy-MM-dd HH:mm:ss') AS WINDOW_START_TS, FULL_NAME,
>       RATINGS_COUNT
>  FROM RATINGS_PER_CUSTOMER_PER_MINUTE
>  WHERE ROWKEY='Rica Blaisdell'
>    AND WINDOWSTART > (UNIX_TIMESTAMP() - 86400000);
WINDOWSTART bounds must be BIGINT.
ksql> SELECT TIMESTAMPTOSTRING(WINDOWSTART, 'yyyy-MM-dd HH:mm:ss') AS WINDOW_START_TS, FULL_NAME,
>       RATINGS_COUNT
>  FROM RATINGS_PER_CUSTOMER_PER_MINUTE
>  WHERE ROWKEY='Rica Blaisdell'
>    AND WINDOWSTART > CAST((UNIX_TIMESTAMP() - 86400000) AS BIGINT);
WINDOWSTART bounds must be BIGINT.

Thanks @rmoff for reporting this!

rmoff commented 4 years ago

Confirmed as still not working in 0.11:

ksql> SELECT RECORD_CT
>  FROM MESSAGE_COUNT_BY_4HR
>  WHERE WINDOWSTART > (UNIX_TIMESTAMP()-(1000 * 60 * 60 * 4))
>    AND DUMMY_FIELD=1 ;
>
Window bounds must be an INT, BIGINT or STRING containing a datetime..  See https://cnfl.io/queries for more info.
Add EMIT CHANGES if you intended to issue a push query.
Pull queries require a WHERE clause that:
 - limits the query to a single key, e.g. `SELECT * FROM X WHERE <key-column>=Y;`.
 - (optionally) limits the time bounds of the windowed table.
         Bounds on `WINDOWSTART` and `WINDOWEND` are supported
         Supported operators are EQUAL, GREATER_THAN, GREATER_THAN_OR_EQUAL, LESS_THAN and LESS_THAN_OR_EQUAL

It would be really useful to support this - usecase being "give me the values from the last four hours"

rmoff commented 4 years ago

If we're not going to support this imminently then may I suggest an update to the error message, something like:

Only literal values are currently supported for pull query predicates.
agavra commented 3 years ago

This might have actually been fixed recently with the work we've done to support codegen in where clauses. cc @AlanConfluent can you confirm?

AlanConfluent commented 3 years ago

In theory we can support it now that we have codegen for the where clause. Unfortunately, we still make some assumptions about the structure of the clause, namely that if there's a comparison, one side of it is a column reference...

With table scans (or even an existing key reference), this assumption can be lifted. This requires a followup.

rmoff commented 3 years ago

Would be great if we can implement this suggestion in the meantime, it'll have users a lot of time :)

If we're not going to support this imminently then may I suggest an update to the error message, something like:

Only literal values are currently supported for pull query predicates.