We resample each row of location data forward in time into the next minute bin, in 60000 ms (1 minute) steps, until we reach either 1 ms before the next sensed location timestamp or the last sensed timestamp plus the consecutive threshold buffer, whichever comes first. As a result, the time difference between a final resampled row and the subsequent sensed location row can be <1 minute (60000 ms), and as small as a few ms:

| resample_group | limit | timestamp | provider | id | diff_bw_curr_and_next_row_ms |
| --- | --- | --- | --- | --- | --- |
| 830 | 1660842237896 | 1660842117894 | fused | 0 | 60000 |
| 830 | 1660842237896 | 1660842177894 | resampled | 1 | 60000 |
| 830 | 1660842237896 | 1660842237894 | resampled | 2 | 3 |
| 831 | 1660842357892 | 1660842237897 | fused | 0 | 60000 |
| 831 | 1660842357892 | 1660842297897 | resampled | 1 | 59996 |
| 832 | 1660842477893 | 1660842357893 | fused | 0 | 60000 |
| 832 | 1660842477893 | 1660842417893 | resampled | 1 | 60001 |

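The resampling rule described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline code: the function name `resample_forward`, the `consecutive_threshold_ms` parameter, and the 30-minute value passed to it are assumptions made for the example. Whether the limit itself may be emitted as a resampled timestamp is not stated here; the sketch treats the bound as exclusive, which is consistent with the rows shown for groups 830 to 832.

```python
MINUTE_MS = 60_000  # width of one resampling step in milliseconds


def resample_forward(sensed_ts, next_sensed_ts, consecutive_threshold_ms):
    """Return (limit, resampled timestamps) for one sensed location row.

    Hypothetical helper: the limit is the earlier of
    (next sensed timestamp - 1 ms) and
    (sensed timestamp + consecutive threshold buffer).
    """
    limit = sensed_ts + consecutive_threshold_ms
    if next_sensed_ts is not None:
        limit = min(limit, next_sensed_ts - 1)

    resampled = []
    ts = sensed_ts + MINUTE_MS
    # Assumption: the bound is exclusive (see the note above).
    while ts < limit:
        resampled.append(ts)
        ts += MINUTE_MS
    return limit, resampled


# Group 830: sensed at ...117894, next sensed row at ...237897.
# The 30-minute threshold is an illustrative value only.
limit, rows = resample_forward(1660842117894, 1660842237897, 30 * MINUTE_MS)
gap_to_next_sensed = 1660842237897 - rows[-1]
print(limit, rows, gap_to_next_sensed)
```

For group 830 this yields the limit 1660842237896 and two resampled timestamps, the last of which sits only 3 ms before the next sensed row, matching the `diff_bw_curr_and_next_row_ms` value of 3 in the table.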
Including such rows in the processed locations data can produce unexpected negative values downstream for features like `varspeed` (which should always be non-negative) in processed data from the PHONE_LOCATIONS DORYAB provider.
We therefore add a condition that drops a row from the processed locations data when its provider is `resampled` and the difference between that row's timestamp and the next (leading) timestamp is <60000 ms:

| resample_group | limit | timestamp | provider | id | diff_bw_curr_and_next_row_ms |
| --- | --- | --- | --- | --- | --- |
| 830 | 1660842237896 | 1660842117894 | fused | 0 | 60000 |
| 830 | 1660842237896 | 1660842177894 | resampled | 1 | 60003 |
| 831 | 1660842357892 | 1660842237897 | fused | 0 | 119996 |
| 832 | 1660842477893 | 1660842357893 | fused | 0 | 60000 |
| 832 | 1660842477893 | 1660842417893 | resampled | 1 | 60001 |

Note that this change still allows the time difference between two sensed location timestamps to be <60000 ms. It only ensures that the time difference between a resampled timestamp and the subsequent sensed location timestamp is $\ge$ 60000 ms. The time difference between a sensed location timestamp and the subsequent resampled timestamp, or between two consecutive resampled timestamps, is always exactly 60000 ms.
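The drop condition can be illustrated with a short, self-contained sketch. It is not the code added by this change: the function name `drop_short_resampled_rows` and the list-of-dicts representation are assumptions for the example, and the gap to the next row is computed on the unfiltered data. Keeping the last row, which has no following timestamp, is also an assumption.

```python
MIN_GAP_MS = 60_000  # minimum allowed gap after a resampled row


def drop_short_resampled_rows(rows):
    """Drop resampled rows that are < MIN_GAP_MS before the next row.

    `rows` is a list of dicts with 'timestamp' (ms) and 'provider',
    sorted by timestamp. Hypothetical helper for illustration only.
    """
    kept = []
    for i, row in enumerate(rows):
        has_next = i + 1 < len(rows)
        # Gap to the next (leading) row in the original, unfiltered data.
        gap = rows[i + 1]["timestamp"] - row["timestamp"] if has_next else None
        if row["provider"] == "resampled" and gap is not None and gap < MIN_GAP_MS:
            continue  # resampled row too close to the next row: drop it
        kept.append(row)
    return kept


# Timestamps and providers taken from the example rows in this description.
example = [
    {"timestamp": ts, "provider": provider}
    for ts, provider in [
        (1660842117894, "fused"),
        (1660842177894, "resampled"),
        (1660842237894, "resampled"),
        (1660842237897, "fused"),
        (1660842297897, "resampled"),
        (1660842357893, "fused"),
        (1660842417893, "resampled"),
    ]
]

filtered = drop_short_resampled_rows(example)
kept_timestamps = [row["timestamp"] for row in filtered]
print(kept_timestamps)
```

On the example rows, the two resampled rows with gaps of 3 ms and 59996 ms are dropped, while sensed (`fused`) rows are never removed regardless of their gap.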