DataSQRL / sqrl

Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
https://www.datasqrl.com/
91 stars 12 forks source link

Global aggregation on unbounded file stream does not produce updates #460

Open mbroecheler opened 9 months ago

mbroecheler commented 9 months ago

For example, when running the following script (on the loan data example):

IMPORT source.Applications;
AppCount := SELECT customer_id, SUM(amount) AS total_amount FROM Applications GROUP BY customer_id;

where the Applications source reads from local file continuously (i.e. "monitorIntervalMs" : "0" is NOT set), AppCount will be empty in the database. However, Applications are written to the database.

I assume this is a watermarking issue since the watermark does not advance, the aggregation never fires.

mbroecheler commented 9 months ago

Not sure if this is something we can "fix" as it is Flink behavior, but we should investigate if we can detect this case and warn about it or use a different watermarking strategy for this in our examples so people don't run into it.