API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation.
Motivation
It is often necessary to merge two time-series tables on the closest timestamp in either direction, rather than with an AS OF join, which matches each row only to the most recent row at or before it. This scenario arises, for instance, when the data comes from two sensors operating simultaneously and transmitting at the same interval. We would like to use Tempo to join the data so that points are matched by their smallest time delta, because in this case there is no guarantee that the timestamps of table A will always precede those of table B (or vice versa).
Example
Table A
| event_ts | a_data |
|----------|--------|
| 10       | x      |
| 21       | y      |
| 29       | z      |
Table B
| event_ts | b_data |
|----------|--------|
| 10       | i      |
| 20       | ii     |
| 31       | iii    |
table_a.nearest_of_join(table_b)
| event_ts | a_data | b_data |
|----------|--------|--------|
| 10       | x      | i      |
| 20       | y      | ii     |
| 31       | z      | iii    |
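
To make the intended matching rule concrete, here is a minimal plain-Python sketch of nearest-timestamp matching on the two tables above. It is not Tempo code and not a proposed implementation: the function name `nearest_join`, the tuple layout, and the tie-breaking rule (ties go to the earlier right-hand row) are assumptions made for illustration. The sketch keeps both timestamps in each output row; the expected output above reports the matched timestamp from table B in `event_ts`.

```python
from bisect import bisect_left


def nearest_join(left, right):
    """Match each left row to the right row with the smallest absolute time delta.

    Both inputs are lists of (event_ts, value) tuples.
    Ties go to the earlier right row (an arbitrary choice for this sketch).
    Returns (left_ts, left_value, right_ts, right_value) tuples.
    """
    right = sorted(right, key=lambda row: row[0])
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        pos = bisect_left(right_ts, ts)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(right)]
        if not candidates:
            joined.append((ts, value, None, None))
            continue
        best = min(candidates, key=lambda i: abs(right_ts[i] - ts))
        joined.append((ts, value, right_ts[best], right[best][1]))
    return joined


table_a = [(10, "x"), (21, "y"), (29, "z")]
table_b = [(10, "i"), (20, "ii"), (31, "iii")]

for row in nearest_join(table_a, table_b):
    print(row)
```

Each row of table A is paired with the table B row whose timestamp is closest, whether that row comes earlier or later, which is the behaviour an AS OF join cannot provide.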
Edge cases and considerations
If an efficient implementation needs it, tsPartitionVal and/or tolerance could be made required parameters.
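
One way to see why a required tolerance could help efficiency: once the maximum allowed time delta is known, each row only has to be compared against rows in its own time bucket and the two adjacent ones. The plain-Python sketch below illustrates that idea under stated assumptions. The name `nearest_join_bucketed`, the `bucket_size` parameter, and the tie-breaking rule are hypothetical, and the bucketing is only loosely analogous to time-based partitioning; it does not describe how Tempo works internally.

```python
from collections import defaultdict


def nearest_join_bucketed(left, right, tolerance, bucket_size=None):
    """Nearest-timestamp match that only searches neighbouring time buckets.

    bucket_size must be >= tolerance, so that any right row within
    `tolerance` of a left row falls in the same bucket or an adjacent one.
    Left rows with no right row inside the tolerance get (None, None).
    """
    bucket_size = tolerance if bucket_size is None else bucket_size
    if tolerance <= 0 or bucket_size < tolerance:
        raise ValueError("need 0 < tolerance <= bucket_size")

    # Group right rows by time bucket (hypothetical stand-in for partitioning).
    buckets = defaultdict(list)
    for ts, value in right:
        buckets[ts // bucket_size].append((ts, value))

    joined = []
    for ts, value in left:
        b = ts // bucket_size
        nearby = [row for k in (b - 1, b, b + 1) for row in buckets.get(k, [])]
        in_range = [row for row in nearby if abs(row[0] - ts) <= tolerance]
        if in_range:
            # Smallest delta wins; ties go to the earlier right timestamp.
            match = min(in_range, key=lambda row: (abs(row[0] - ts), row[0]))
            joined.append((ts, value, match[0], match[1]))
        else:
            joined.append((ts, value, None, None))
    return joined


table_a = [(10, "x"), (21, "y"), (29, "z")]
table_b = [(10, "i"), (20, "ii"), (31, "iii")]

tight = nearest_join_bucketed(table_a, table_b, tolerance=1)
loose = nearest_join_bucketed(table_a, table_b, tolerance=5)
```

With `tolerance=1` the row at timestamp 29 finds no partner, since its nearest neighbour in table B is 2 units away; with `tolerance=5` all three rows are matched as in the example above. Without a tolerance there is no bound on how far away the nearest row may be, so the search cannot be restricted to neighbouring buckets.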