apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[feature][routine load] support window feature in routine load #4846

Open caoyang10 opened 4 years ago

caoyang10 commented 4 years ago

When one table is fed by both routine load and broker load, the broker load data is meant to overwrite the routine load data, because the routine load data is less reliable. We do not want the broker load data to change afterwards, but DIRTY data (late-arriving, earlier data) from routine load can still overwrite it. So it is necessary to set a receive window on routine load, so that broker load data is not changed by routine load data.

Compared to the Partition stmt, the routine load window feature selects partitions dynamically, whereas the partitions named in a Partition stmt are fixed. For example, suppose I only want to receive data from the last 6 hours. I could set properties like PROPERTIES ( "window_interval_sec" = "21600" ). The receive window is then [now - 6h, now], and all we have to do is check whether the range of each partition intersects the receive window.

For example, with now = "2020-11-05 07:01:23", the receive window is ["2020-11-05 01:01:23", "2020-11-05 07:01:23"]. With a partition unit of DAY, partition ["2020-11-05", "2020-11-06"] is hit while the other partitions are not.

Another example: with now = "2020-11-05 05:01:23", the receive window is ["2020-11-04 23:01:23", "2020-11-05 05:01:23"], and 2 partitions match the rule: ["2020-11-04", "2020-11-05"] and ["2020-11-05", "2020-11-06"].
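To make the intersection rule concrete, here is a minimal Python sketch of the check described above. It is not the code from the patch: the function name `partitions_in_window` and the `PARTITIONS` list are hypothetical, and it assumes each DAY partition covers a half-open range [start, end) while the receive window [now - window_interval_sec, now] is closed.

```python
from datetime import datetime, timedelta


def partitions_in_window(partitions, now, window_interval_sec):
    """Return the partitions whose range intersects [now - window, now].

    Assumption: each partition is a (start, end) pair covering [start, end).
    """
    window_start = now - timedelta(seconds=window_interval_sec)
    window_end = now
    hits = []
    for start, end in partitions:
        # Overlap test between half-open [start, end) and closed window.
        if start <= window_end and window_start < end:
            hits.append((start, end))
    return hits


def day(text):
    """Parse a 'YYYY-MM-DD' partition bound."""
    return datetime.strptime(text, "%Y-%m-%d")


# Hypothetical DAY partitions around the dates used in the examples.
PARTITIONS = [
    (day("2020-11-03"), day("2020-11-04")),
    (day("2020-11-04"), day("2020-11-05")),
    (day("2020-11-05"), day("2020-11-06")),
    (day("2020-11-06"), day("2020-11-07")),
]

if __name__ == "__main__":
    for now_text in ("2020-11-05 07:01:23", "2020-11-05 05:01:23"):
        now = datetime.strptime(now_text, "%Y-%m-%d %H:%M:%S")
        hit = partitions_in_window(PARTITIONS, now, 21600)
        print(now_text, [(s.date().isoformat(), e.date().isoformat()) for s, e in hit])
```

With "window_interval_sec" = "21600", the first timestamp selects only the 2020-11-05 partition and the second selects both the 2020-11-04 and 2020-11-05 partitions, matching the two examples.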

I have already implemented the feature; please review the code.

caoyang10 commented 4 years ago

I found a bug: the end of the window should be later than now. For example, with now = "2020-11-05 23:59:50", partition [2020-11-06, 2020-11-07] does not intersect [now - 6h, now]. But in routine load I set the batch interval to 30s, so some rows with a time of 2020-11-06 00:00:0x cannot find their partition. So the window should be corrected to [now - 6h, now + sometime]. For now I just set "sometime" to a constant: 2 min.
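The boundary case can be reproduced with a small Python sketch. The names `receive_window`, `intersects`, and `slack_sec` are hypothetical and not taken from the patch; partitions are again assumed to cover half-open ranges [start, end).

```python
from datetime import datetime, timedelta


def receive_window(now, window_interval_sec, slack_sec=0):
    """Return (start, end) of the window [now - interval, now + slack]."""
    start = now - timedelta(seconds=window_interval_sec)
    end = now + timedelta(seconds=slack_sec)
    return start, end


def intersects(part_start, part_end, win_start, win_end):
    """True if partition [part_start, part_end) overlaps [win_start, win_end]."""
    return part_start <= win_end and win_start < part_end


now = datetime(2020, 11, 5, 23, 59, 50)
next_day = (datetime(2020, 11, 6), datetime(2020, 11, 7))

# Without slack the next day's partition is missed just before midnight.
no_slack = receive_window(now, 21600)
# With a 2 min slack the window end moves to 2020-11-06 00:01:50.
with_slack = receive_window(now, 21600, slack_sec=120)

if __name__ == "__main__":
    print(intersects(*next_day, *no_slack))    # False
    print(intersects(*next_day, *with_slack))  # True
```

In this sketch a slack of 120 s is longer than the 30 s batch interval mentioned above, so rows that cross midnight during one batch still find their partition.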