databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc), AS OF joins, downsampling, and interpolation
https://pypi.org/project/dbl-tempo

Tempo causes collect() deprecation warning when used in DLT pipeline #408

Open · BradLotsberg opened this issue 1 month ago

BradLotsberg commented 1 month ago

There are a number of collect()[0][0] instances in the Tempo code that trigger deprecation warnings when used in DLT pipelines. Perhaps replace collect()[0][0] with head()[0]. It accomplishes the same thing while avoiding the deprecation warning, and it might even give a marginal performance boost, since only the first row, rather than the whole DataFrame, would be moved to the driver node.
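A minimal sketch of the suggested swap, assuming PySpark. `first_value` is a hypothetical helper name, not part of Tempo. With no argument, PySpark's `DataFrame.head()` returns a single `Row`, or `None` when the DataFrame is empty, so the sketch raises `IndexError` in that case to behave like `collect()[0][0]` does on empty data.

```python
from typing import Any


def first_value(df: Any) -> Any:
    """Return the first column of the first row of a Spark DataFrame.

    Hypothetical drop-in for the ``df.collect()[0][0]`` idiom. With no
    argument, ``DataFrame.head()`` fetches at most one row to the driver
    and returns it as a Row (a tuple subclass), or None if there are no rows.
    """
    row = df.head()
    if row is None:
        # collect()[0][0] raises IndexError on an empty DataFrame; keep that behavior.
        raise IndexError("DataFrame is empty")
    return row[0]
```

For example, `df.agg(F.max("event_ts")).collect()[0][0]` would become `first_value(df.agg(F.max("event_ts")))`, where `event_ts` is a placeholder column name. For a single-row aggregate like this one, collect() already returns only one row, so the main gain is avoiding the warning rather than reducing data movement.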

tnixon commented 1 month ago

This is a good suggestion, thanks @BradLotsberg. @yuriymargulis-db, perhaps you can pick this one up?