DataSQRL / sqrl

Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
https://www.datasqrl.com/
97 stars 14 forks source link

Global aggregation on unbounded file stream does not produce updates #460

Open mbroecheler opened 11 months ago

mbroecheler commented 11 months ago

For example, when running the following script (on the loan data example):

IMPORT source.Applications;
AppCount := SELECT customer_id, SUM(amount) AS total_amount FROM Applications GROUP BY customer_id;

where the Applications source reads from local file continuously (i.e. "monitorIntervalMs" : "0" is NOT set), AppCount will be empty in the database. However, Applications are written to the database.

I assume this is a watermarking issue since the watermark does not advance, the aggregation never fires.

mbroecheler commented 11 months ago

Not sure if this is something we can "fix" as it is Flink behavior, but we should investigate if we can detect this case and warn about it or use a different watermarking strategy for this in our examples so people don't run into it.

henneberger commented 1 month ago

Let's add an warning when mixing bounded with unbounded streams if the file monitor config value isn't set.