Tempo causes collect() deprecation warning when used in DLT pipeline

databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation

https://pypi.org/project/dbl-tempo


Tempo causes collect() deprecation warning when used in DLT pipeline #408

Open BradLotsberg opened 5 months ago

BradLotsberg commented 5 months ago

The Tempo code contains a number of collect()[0][0] calls, which trigger deprecation warnings when Tempo is used in DLT pipelines. Consider replacing collect()[0][0] with head()[0]. It returns the same value while avoiding the deprecation warning, and it might even bring a marginal performance boost, since only the first row, rather than the whole DataFrame, would be moved to the driver node.
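A minimal sketch of the suggested replacement, not taken from the Tempo code base. The helper name first_value is hypothetical; it assumes df is a pyspark.sql.DataFrame and relies only on DataFrame.head(), which returns the first Row, or None when the DataFrame is empty.

```python
from typing import Any, Optional


def first_value(df: Any) -> Optional[Any]:
    """Return the first column of the first row, or None if df is empty.

    Hypothetical helper: df is assumed to be a pyspark.sql.DataFrame,
    and only its head() method is used.
    """
    # head() with no argument brings at most one row to the driver,
    # whereas collect() brings every row before indexing.
    row = df.head()
    # collect()[0][0] raises IndexError on an empty DataFrame;
    # this sketch returns None in that case instead.
    return None if row is None else row[0]
```

A call such as some_df.select(...).collect()[0][0] would then become first_value(some_df.select(...)). Returning None for an empty result is a design choice of this sketch. A plain head()[0] swap would raise a TypeError on an empty DataFrame, where collect()[0][0] raises an IndexError.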

tnixon commented 5 months ago

This is a good suggestion, thanks @BradLotsberg. @yuriymargulis-db - perhaps you can pick this one up?