databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation
https://pypi.org/project/dbl-tempo

resample/interpolate doesn't handle columns with dot in name #361

Open ghormann opened 1 year ago

ghormann commented 1 year ago

ISSUE

While it may not be best practice, Databricks does allow column names to contain "." characters. Calling resample followed by interpolate on a TSDF with such columns fails with a "Cannot resolve column name" error.

How to reproduce

  1. Create a TSDF with columns whose names include a ".".

  2. Attempt to resample and interpolate with:

    resample_tsdf = base_tsdf.resample(freq="30 seconds", func="mean").interpolate(method="ffill")

The following error is produced:

AnalysisException: Cannot resolve column name "Bundler.Status.CurMachSpeed" among (site, line, ts, Bundler.Status.CurMachSpeed, Bundler.Status.MachSpeed, agg_key); did you mean to quote the Bundler.Status.CurMachSpeed column?
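Spark parses an unquoted name such as "Bundler.Status.CurMachSpeed" as a path into nested struct fields, which is why the lookup fails even though a column with that exact name exists. Wrapping the name in backticks makes Spark treat the whole string as a single identifier. The sketch below assumes a fix would quote column names before handing them to Spark; quote_column is a hypothetical helper, not tempo's actual code.

```python
def quote_column(name: str) -> str:
    """Wrap a column name in backticks so Spark treats it as one identifier.

    Unquoted, "a.b" is read as field "b" of struct column "a"; quoted, the dot
    is just part of the name. A literal backtick inside the name is escaped by
    doubling it, following Spark SQL's quoted-identifier rules.
    """
    return "`" + name.replace("`", "``") + "`"


# Hypothetical PySpark usage (not executed here):
#   from pyspark.sql import functions as F
#   df.select(F.col(quote_column("Bundler.Status.CurMachSpeed")))
print(quote_column("Bundler.Status.CurMachSpeed"))  # `Bundler.Status.CurMachSpeed`
```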

Workaround

Rename the columns so their names no longer contain "." before resampling and interpolating.
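A minimal sketch of this workaround, assuming you want the original names back afterwards: build a reversible mapping from dotted names to dot-free names, rename, run resample/interpolate, then rename back. dot_free_renames is a hypothetical helper; the commented PySpark/tempo lines assume ts_col="ts" and partition_cols=["site", "line"] (taken from the column list in the error) and that the interpolated result is a TSDF exposing its DataFrame as .df.

```python
def dot_free_renames(columns, replacement="_"):
    """Return (forward, backward) dicts mapping dotted names to safe names and back.

    Only names containing "." are remapped. If a sanitized name collides with an
    existing column (or an earlier sanitized name), a numeric suffix is appended.
    """
    forward = {}
    used = set(columns)
    for name in columns:
        if "." not in name:
            continue
        candidate = base = name.replace(".", replacement)
        i = 1
        while candidate in used:
            candidate = f"{base}_{i}"
            i += 1
        used.add(candidate)
        forward[name] = candidate
    backward = {new: old for old, new in forward.items()}
    return forward, backward


# Hypothetical PySpark/tempo usage (not executed here):
#   fwd, back = dot_free_renames(df.columns)
#   safe_df = df.toDF(*[fwd.get(c, c) for c in df.columns])  # positional rename
#   safe_tsdf = TSDF(safe_df, ts_col="ts", partition_cols=["site", "line"])
#   result = safe_tsdf.resample(freq="30 seconds", func="mean").interpolate(method="ffill")
#   restored = result.df.toDF(*[back.get(c, c) for c in result.df.columns])
fwd, back = dot_free_renames(["site", "line", "ts", "Bundler.Status.CurMachSpeed"])
print(fwd)  # {'Bundler.Status.CurMachSpeed': 'Bundler_Status_CurMachSpeed'}
```

Renaming positionally with toDF avoids having to reference the dotted names through Spark's name resolver at all, which is the step that fails in the original error.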

tnixon commented 1 year ago

Thanks for bringing this to our attention @ghormann - we'll look into it and see if we can get a fix out soon