confluentinc / ksql

The database purpose-built for stream processing applications.
https://ksqldb.io

ksqlDB should optimize pull queries for streams for time ranges #9181

Open abraham-leal opened 2 years ago

abraham-leal commented 2 years ago

Is your feature request related to a problem? Please describe.

Currently, ksqlDB performs a full topic scan whenever it runs a pull query over a stream. This is inefficient when looking up a specific subset of the data, but it is currently necessary given how pull queries are implemented over streams.

Describe the solution you'd like

Ideally, ksqlDB should optimize pull queries that specify a time range, so that only the part of the topic covered by that range is scanned. For example:

-- Should only scan from 1654618081 
SELECT * FROM STREAM WHERE ROWTIME > 1654618081;

-- Should only scan between 1654618080 and 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081 AND ROWTIME > 1654618080;

-- Should only scan to 1654618081
SELECT * FROM STREAM WHERE ROWTIME < 1654618081;

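This should be possible because Kafka can look up offsets by record timestamp, so a consumer can seek directly to the first offset at or after a given time. (This optimization may not be possible with user-defined custom ROWTIMEs, since those need not match the Kafka record timestamp.)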
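To make the idea concrete, here is a minimal sketch in plain Python that simulates one partition in memory. It is not ksqlDB code: `Record`, `LOG`, `offset_for_time` and `pull_query` are hypothetical names made up for illustration, and `offset_for_time` only imitates the "earliest offset whose timestamp is at or after the target" lookup that the Kafka consumer exposes as `offsetsForTimes`. The sketch assumes offsets are dense and start at 0, so an offset can be used as a list index.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    offset: int
    timestamp: int  # integer timestamp, standing in for ROWTIME


# Hypothetical partition contents; offset 2 is deliberately out of order.
LOG = [
    Record(0, 100),
    Record(1, 200),
    Record(2, 150),
    Record(3, 300),
    Record(4, 400),
]


def offset_for_time(log: List[Record], ts: int) -> Optional[int]:
    """Earliest offset whose timestamp is >= ts, or None if there is none.

    Stand-in for a broker-side timestamp lookup. A linear search keeps the
    sketch correct even when timestamps are not sorted by offset.
    """
    for rec in log:
        if rec.timestamp >= ts:
            return rec.offset
    return None


def pull_query(
    log: List[Record],
    lower: Optional[int] = None,
    upper: Optional[int] = None,
) -> Tuple[List[Record], int]:
    """Evaluate lower < ROWTIME < upper; either bound may be None.

    Returns the matching records and the number of records actually read.
    """
    start = 0
    if lower is not None:
        # ROWTIME > lower is the same as ROWTIME >= lower + 1 for integers.
        found = offset_for_time(log, lower + 1)
        if found is None:
            return [], 0  # nothing in the log is new enough
        start = found

    matches, read = [], 0
    # Assumes dense offsets starting at 0, so offset == list index.
    for rec in log[start:]:
        read += 1
        above = lower is None or rec.timestamp > lower
        below = upper is None or rec.timestamp < upper
        if above and below:
            matches.append(rec)
    return matches, read
```

With a lower bound, the scan skips every record before the looked-up offset, but it still filters each record it reads because later records can carry older timestamps. The sketch does not stop early at the upper bound for the same reason: that would only be safe if timestamps were guaranteed to be non-decreasing in offset order, which is an extra assumption this example does not make.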

Describe alternatives you've considered

No real alternative here.

Additional context

sujayopensource commented 1 year ago

Hi @agavra, I'm a newbie interested in KSQL and would like to work on this. May I assign it to myself? Thanks.