TimelyDataflow / differential-dataflow

An implementation of differential dataflow using timely dataflow on Rust.
MIT License

Question: how to change the data timestamp for late-arriving data #400

Open cetsupport opened 1 year ago

cetsupport commented 1 year ago

Reason to ask:

In our IoT (Internet of Things) scenario, data is likely to arrive after an unknown delay. We need to assign each record the timestamp at which the device collected it, not the time the data arrives.

Question:

InputSession.insert normally stamps each update with the session's current timestamp, which is set by the advance_to function. Timely dataflow does not allow advancing to an earlier timestamp. How should this case be handled in differential dataflow?
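The following is a minimal model that makes the restriction in the question concrete. It uses only std. `ToySession` and its fields are hypothetical stand-ins and are not differential dataflow's `InputSession`. The sketch assumes only the behaviour described in this thread: `insert` stamps updates with the current session time, `advance_to` can only move time forward, and `update_at` asserts that its time is not earlier than the session time.

```rust
/// Hypothetical stand-in for an input session (NOT the real `InputSession`).
struct ToySession {
    time: u64,                        // current session time: a lower bound promised downstream
    pending: Vec<(String, u64, i64)>, // recorded (data, time, diff) updates
}

impl ToySession {
    fn new() -> Self {
        ToySession { time: 0, pending: Vec::new() }
    }

    /// Like `insert`: the update is stamped with the session's current time.
    fn insert(&mut self, data: &str) {
        let t = self.time;
        self.update_at(data, t, 1);
    }

    /// Like `update_at`: any time is accepted as long as it is not earlier
    /// than the session time (modelled on `assert!(self.time.less_equal(&time))`).
    fn update_at(&mut self, data: &str, time: u64, diff: i64) {
        assert!(self.time <= time, "update at {} precedes session time {}", time, self.time);
        self.pending.push((data.to_string(), time, diff));
    }

    /// Like `advance_to`: the session time may only move forward.
    fn advance_to(&mut self, time: u64) {
        assert!(self.time <= time, "cannot move session time back from {} to {}", self.time, time);
        self.time = time;
    }
}

fn main() {
    let mut session = ToySession::new();
    session.advance_to(10);
    session.insert("reading-a");           // stamped with time 10
    session.update_at("reading-b", 12, 1); // fine: 12 >= 10
    // A reading collected at device time 7 can no longer be recorded at 7.
    let late = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        session.update_at("reading-late", 7, 1);
    }));
    assert!(late.is_err());
    println!("{:?}", session.pending);
}
```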

Thanks a lot.

nooberfsh commented 1 year ago

Hi, check out UnorderedInput
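`UnorderedInput` is built around capabilities rather than a single session time. The sketch below models that idea using only std and does not call timely's API. The names `ToyUnorderedInput`, `retain`, `send_at`, and `release` are hypothetical. The input holds capabilities for one or more times. Data may be sent at any time at or after a held capability, and the frontier advances only when the earliest capability is released. Keeping an early capability open leaves room for late IoT readings.

```rust
use std::collections::BTreeMap;

/// Hypothetical capability-based input, loosely in the spirit of `UnorderedInput`.
struct ToyUnorderedInput {
    caps: BTreeMap<u64, usize>, // held capability times, with a count per time
    sent: Vec<(String, u64)>,   // (data, time) records sent so far
}

impl ToyUnorderedInput {
    fn new(initial: u64) -> Self {
        let mut caps = BTreeMap::new();
        caps.insert(initial, 1);
        ToyUnorderedInput { caps, sent: Vec::new() }
    }

    /// The input frontier: the earliest time at which data may still be sent.
    fn frontier(&self) -> Option<u64> {
        self.caps.keys().next().copied()
    }

    /// Mint a capability at `time`; requires holding one at or before it.
    fn retain(&mut self, time: u64) -> Result<(), String> {
        match self.frontier() {
            Some(f) if f <= time => {
                *self.caps.entry(time).or_insert(0) += 1;
                Ok(())
            }
            _ => Err(format!("cannot mint a capability for time {}", time)),
        }
    }

    /// Send data at `time`; allowed if some held capability is <= `time`.
    fn send_at(&mut self, data: &str, time: u64) -> Result<(), String> {
        match self.frontier() {
            Some(f) if f <= time => {
                self.sent.push((data.to_string(), time));
                Ok(())
            }
            _ => Err(format!("no capability for time {}", time)),
        }
    }

    /// Release one capability at `time`, which may advance the frontier.
    fn release(&mut self, time: u64) {
        let now_empty = match self.caps.get_mut(&time) {
            Some(count) => {
                *count -= 1;
                *count == 0
            }
            None => false,
        };
        if now_empty {
            self.caps.remove(&time);
        }
    }
}

fn main() {
    let mut input = ToyUnorderedInput::new(0);
    input.retain(100).unwrap();             // capability for "current" data
    input.send_at("on-time", 100).unwrap();
    input.send_at("late", 3).unwrap();      // allowed: the time-0 capability is still held
    input.release(0);                       // stop accepting anything before time 100
    assert_eq!(input.frontier(), Some(100));
    assert!(input.send_at("too-late", 50).is_err());
    println!("{:?}", input.sent);
}
```

In timely, capabilities are values that are released when they are dropped. The explicit `release` method here stands in for that.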

cetsupport commented 1 year ago

Thanks @nooberfsh, I will check it out. I also found the update_at function on InputSession. Could I use update_at instead?

Update: it looks like I shouldn't use update_at, because it requires the time parameter to be greater than or equal to the session's current time: assert!(self.time.less_equal(&time));
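Because of that assertion, a common workaround with a single timestamp is to keep the session's time deliberately behind the newest device time. The sketch below assumes a fixed lateness bound. `slack` is a hypothetical parameter. Readings no more than `slack` behind the newest event time seen so far are recorded at their true event time. Older readings are set aside. `LatenessBuffer` is illustrative only. In real code, its `updates` would feed `update_at` and `session_time` would feed `advance_to`.

```rust
/// Hypothetical helper: holds the input time `slack` units behind the
/// largest device (event) time seen so far.
struct LatenessBuffer {
    slack: u64,
    max_seen: u64,                    // largest event time observed
    session_time: u64,                // value we would pass to `advance_to`
    updates: Vec<(String, u64, i64)>, // values we would pass to `update_at`
    dropped: Vec<(String, u64)>,      // readings that arrived too late
}

impl LatenessBuffer {
    fn new(slack: u64) -> Self {
        LatenessBuffer { slack, max_seen: 0, session_time: 0, updates: Vec::new(), dropped: Vec::new() }
    }

    fn on_reading(&mut self, data: &str, event_time: u64) {
        if event_time >= self.session_time {
            // Still allowed: record at the device's own timestamp.
            self.updates.push((data.to_string(), event_time, 1));
        } else {
            // Too late: we already promised no updates before `session_time`.
            self.dropped.push((data.to_string(), event_time));
        }
        self.max_seen = self.max_seen.max(event_time);
        // Advance the session time, but never backwards.
        let target = self.max_seen.saturating_sub(self.slack);
        if target > self.session_time {
            self.session_time = target;
        }
    }
}

fn main() {
    let mut buf = LatenessBuffer::new(5);
    buf.on_reading("a", 10); // session time -> 5
    buf.on_reading("b", 7);  // 7 >= 5: kept at time 7
    buf.on_reading("c", 20); // session time -> 15
    buf.on_reading("d", 12); // 12 < 15: too late
    println!("updates = {:?}", buf.updates);
    println!("dropped = {:?}", buf.dropped);
}
```

A larger `slack` drops fewer readings but holds the input frontier back longer, which delays outputs. The choice trades completeness against timeliness.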

frankmcsherry commented 1 year ago

You are able to use update_at, but you must maintain the property that you do not advance the input session's time past the times you would like to use. The input session's time is a promise to the rest of the system that your times will be at least whatever that time is. This can be a problem if you can receive arbitrarily delayed inputs, but there isn't too much to do if you want "correct" outputs that are not indefinitely delayed. Another option is to look into multi-temporal timestamps, which allow you to track both event time and system time at the same time.
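A multi-temporal timestamp pairs event time with system time and orders the pairs component-wise, so some pairs are incomparable. The standalone `Pair` type below is a hypothetical sketch of that order. Timely provides a similar `Product` type in `timely::order`. The example assumes one possible usage: the input advances system time while holding event time at 0. Under that assumption, a reading collected at an early device time can still be entered at the current system time.

```rust
/// Hypothetical two-dimensional timestamp: (event time, system time).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pair {
    event: u64,
    system: u64,
}

impl Pair {
    fn new(event: u64, system: u64) -> Self {
        Pair { event, system }
    }

    /// Component-wise partial order: self <= other iff both coordinates are <=.
    fn less_equal(&self, other: &Pair) -> bool {
        self.event <= other.event && self.system <= other.system
    }
}

fn main() {
    // System time has advanced to 9; event time is held open at 0.
    let session_time = Pair::new(0, 9);
    // A reading collected at device time 3 that arrives at system time 9.
    let late = Pair::new(3, 9);
    assert!(session_time.less_equal(&late)); // still a valid update time

    // Neither pair is <= the other, so they are incomparable.
    let a = Pair::new(5, 10);
    let b = Pair::new(7, 8);
    assert!(!a.less_equal(&b) && !b.less_equal(&a));
    println!("{:?} <= {:?}: {}", session_time, late, session_time.less_equal(&late));
}
```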

cetsupport commented 1 year ago

Thanks @frankmcsherry for the comments. I will study the post first. It is very useful for understanding some concepts that confused me before. I had read some of the blog posts but had not noticed this one.

Thanks again.