API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, avg, sum, count, etc.), AS OF joins, downsampling, and interpolation
**Issue:** When a Timestamp column is specified as the `ts_col` for a `tsdf`, it is not correctly interpreted as a Timestamp field in the logic that handles `rangeBackWindowSecs`.

**Root Cause:** In `tsdf.py`, in `def withRangeStats()`, the following check never evaluates to `True`, because the `str` representation of the `dataType` is `TimestampType()` while the code expects `TimestampType`. The line causing the error is line 1105:

```python
if str(self.df.schema[self.ts_col].dataType) == "TimestampType":
```
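Comparing against the type object itself sidesteps the brittle string comparison. A minimal sketch of the idea (the `is_timestamp_col` helper is illustrative, not tempo's API):

```python
from pyspark.sql import DataFrame
from pyspark.sql.types import TimestampType


def is_timestamp_col(df: DataFrame, ts_col: str) -> bool:
    # str(df.schema[ts_col].dataType) renders as "TimestampType()", so the
    # equality check against "TimestampType" in withRangeStats() can never
    # match. Comparing type objects does not depend on the repr at all.
    return isinstance(df.schema[ts_col].dataType, TimestampType)
```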
**Setup:**

```python
import tempo

# df is any Spark DataFrame with a timestamp column
tsdf = tempo.TSDF(df, ts_col='<timestamp_column>')
tsdf_2 = tsdf.withRangeStats("SIDE", rangeBackWindowSecs=300).df
```
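Filling in the placeholders, a self-contained reproduction might look like the sketch below (the sample rows and the `PRICE` column are invented; the column names reuse the identifiers from the call above and the error message):

```python
import datetime

from pyspark.sql import SparkSession
import tempo

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.createDataFrame(
    [
        ("BUY", datetime.datetime(2024, 1, 1, 9, 30), 100.0),
        ("BUY", datetime.datetime(2024, 1, 1, 9, 32), 101.0),
        ("SELL", datetime.datetime(2024, 1, 1, 9, 31), 99.5),
    ],
    "SIDE string, DATE_TIME timestamp, PRICE double",
)

tsdf = tempo.TSDF(df, ts_col="DATE_TIME")
# On affected versions this raises the AnalysisException quoted below,
# because DATE_TIME is never cast to a numeric type for the range window.
tsdf_2 = tsdf.withRangeStats("SIDE", rangeBackWindowSecs=300).df
```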
**Error:**

```
Cannot resolve "(PARTITION BY <partition_col> ORDER BY DATE_TIME ASC NULLS FIRST RANGE BETWEEN -300 FOLLOWING AND CURRENT ROW)" due to data type mismatch: The data type "TIMESTAMP" used in the order specification does not match the data type "BIGINT" which is used in the range frame. SQLSTATE: 42K09;
```
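The mismatch arises because a RANGE frame with a numeric offset such as `-300` requires a numeric ORDER BY expression. The branch quoted above is meant to cast the timestamp to epoch seconds before the window is built; since it never fires, the raw TIMESTAMP column reaches the window spec. A sketch of the kind of window definition that satisfies Spark once the cast is applied (`"partition_col"` is a placeholder, and this illustrates the Spark requirement rather than tempo's exact code):

```python
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Casting a timestamp to double yields seconds since the epoch, which is
# comparable with the numeric range offset below.
w = (
    Window.partitionBy("partition_col")
    .orderBy(F.col("DATE_TIME").cast("double"))
    .rangeBetween(-300, Window.currentRow)
)
```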