databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
https://pypi.org/project/dbl-tempo

asofJoin() requires timestamp column names to match #390

Open ghormann opened 4 months ago

ghormann commented 4 months ago

When trying to use asofJoin() on two TSDF objects whose timestamp columns have different names, an exception is raised. I'm not sure whether this is considered a bug or a feature, but given that each TSDF object knows the name of its own timestamp column, it seems like asofJoin() should be smart enough to execute the join.

Example

image

Workaround

If I make sure the time column has the same name in both TSDF objects, it appears to work as expected. (Here, column 'ts' is created as a copy of EndTime before creating tsdfvalidate.)

image
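For readers without access to the screenshot, the workaround can be sketched roughly as follows. This is only an illustration, not the exact code from the screenshot: the helper names `with_common_ts` and `asof_join_aligned`, their parameters, and the `right_prefix` value are assumptions made for the example. Only `TSDF(...)`, `asofJoin()`, and the idea of copying EndTime into a shared 'ts' column come from tempo and the description above.

```python
def with_common_ts(df, src_col, common_col="ts"):
    """Copy src_col into common_col so both sides share one timestamp name.

    Hypothetical helper; relies only on DataFrame.withColumn and df[col].
    """
    return df.withColumn(common_col, df[src_col])


def asof_join_aligned(left_df, right_df, left_ts, right_ts,
                      partition_cols, common_col="ts"):
    """AS OF join two Spark DataFrames whose timestamp columns differ in name.

    Hypothetical wrapper around the workaround described above.
    """
    # Imported lazily so with_common_ts() can be used without tempo installed.
    from tempo import TSDF

    # Build both TSDFs on the same timestamp column name before joining.
    left = TSDF(with_common_ts(left_df, left_ts, common_col),
                ts_col=common_col, partition_cols=partition_cols)
    right = TSDF(with_common_ts(right_df, right_ts, common_col),
                 ts_col=common_col, partition_cols=partition_cols)

    # Right-hand columns are prefixed to avoid name clashes in the result.
    return left.asofJoin(right, right_prefix="right")
```

With hypothetical DataFrames `readings_df` (timestamp column `event_ts`) and `validate_df` (timestamp column `EndTime`), the call would look like `asof_join_aligned(readings_df, validate_df, "event_ts", "EndTime", ["device_id"])`. The original columns are kept, so nothing is lost by adding the shared 'ts' copy.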

tnixon commented 4 months ago

Thanks for raising this to us, @ghormann - definitely a frustrating issue. I believe this should be fixed in the upcoming v0.2 version, but we'll be sure to double check. Maybe it makes sense to get a quick fix in here before that comes out.