nearest_of_join: Nearest timestamp join between two time-series tables

Motivation

It is often necessary to merge two time-series tables on the closest timestamp rather than with an AS OF join. This arises, for instance, when two sensors operate simultaneously and transmit data at the same interval. We would like to use Tempo to join the data so that points are matched by their smallest time delta, because in this case there is no guarantee that the timestamps of table A will always precede those of table B (or vice versa).

Example

Table A
event_ts	a_data
10	x
21	y
29	z

Table B
event_ts	b_data
10	i
20	ii
31	iii

table_a.nearest_of_join(table_b)
event_ts	a_data	b_data
10	x	i
20	y	ii
31	z	iii
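
To make the intended matching rule concrete, here is a minimal pure-Python sketch of the semantics on the example tables. It is not Tempo code: the function name nearest_join, the tuple layout, and the tie-breaking rule (ties go to the earlier right-hand timestamp) are assumptions made for illustration. It returns both timestamps so that either one could be exposed as event_ts.

```python
from bisect import bisect_left


def nearest_join(left, right):
    """Match each left row to the right row with the smallest |timestamp delta|.

    Both inputs are lists of (timestamp, value) tuples, and `right` must be
    sorted by timestamp. Ties go to the earlier right-hand timestamp
    (an assumption; the proposal does not specify tie-breaking).
    """
    right_ts = [ts for ts, _ in right]
    joined = []
    for ts, value in left:
        i = bisect_left(right_ts, ts)
        # Only the neighbour just before ts and the one at/after ts can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(right)]
        if not candidates:
            # Right table is empty: keep the left row unmatched.
            joined.append((ts, value, None, None))
            continue
        best = min(candidates, key=lambda j: abs(right_ts[j] - ts))
        joined.append((ts, value, right[best][0], right[best][1]))
    return joined


table_a = [(10, "x"), (21, "y"), (29, "z")]
table_b = [(10, "i"), (20, "ii"), (31, "iii")]

for row in nearest_join(table_a, table_b):
    print(row)  # (left_ts, a_data, right_ts, b_data)
```

Unlike an AS OF join, the row at 29 is matched forward to 31 and the row at 21 is matched backward to 20, since only the absolute delta matters.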

Edge cases and considerations

If an efficient implementation requires it, it makes sense to make tsPartitionVal and/or tolerance required parameters.
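
One possible reason a required tolerance helps: it bounds how far each row has to look, so the search can be restricted to fixed-width time buckets. The sketch below illustrates that idea in plain Python. It is an assumption about how such a bound could be used, not a description of how Tempo implements tsPartitionVal or tolerance, and the name nearest_join_bucketed is hypothetical.

```python
from collections import defaultdict


def nearest_join_bucketed(left, right, tolerance):
    """Nearest-timestamp match restricted to |delta| <= tolerance.

    Rows are (timestamp, value) tuples and `tolerance` must be positive.
    Right rows are grouped into buckets of width `tolerance`; a left row only
    inspects its own bucket and the two adjacent ones, which together cover
    every timestamp within the tolerance.
    """
    buckets = defaultdict(list)
    for ts, value in right:
        buckets[ts // tolerance].append((ts, value))

    joined = []
    for ts, value in left:
        key = ts // tolerance
        candidates = [
            row
            for k in (key - 1, key, key + 1)
            for row in buckets.get(k, [])
            if abs(row[0] - ts) <= tolerance
        ]
        # No right row within tolerance: keep the left row unmatched.
        # Ties go to the earlier right-hand timestamp (an assumption).
        match = min(
            candidates,
            key=lambda row: (abs(row[0] - ts), row[0]),
            default=(None, None),
        )
        joined.append((ts, value, match[0], match[1]))
    return joined


table_a = [(10, "x"), (21, "y"), (29, "z")]
table_b = [(10, "i"), (20, "ii"), (31, "iii")]

print(nearest_join_bucketed(table_a, table_b, tolerance=5))
print(nearest_join_bucketed(table_a, table_b, tolerance=1))
```

With a tolerance of 5 all three rows from the example are matched; with a tolerance of 1 the row at 29 is left unmatched because its nearest neighbour is 2 units away. In a distributed setting the bucket key could serve as an equi-join key, which is the kind of bound an unrestricted nearest search lacks.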
