API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
[x] manually tested
[ ] added unit tests
[x] added integration tests
[ ] verified on staging environment (screenshot attached)
Changes
Created a subclass of the PySpark ML CrossValidator (see https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.ml.tuning.CrossValidator.html#pyspark.ml.tuning.CrossValidator) that replicates the time-series split method implemented by scikit-learn's TimeSeriesSplit (see https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html).
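For reviewers unfamiliar with the split scheme, here is a minimal plain-Python sketch of the expanding-window logic that TimeSeriesSplit uses with its default settings (no gap, no maximum training size). It only illustrates how fold boundaries are chosen and is not the code added in this PR: `time_series_split` is a hypothetical helper name, and it assumes rows are already ordered by time and addressed by integer position.

```python
from typing import Iterator, List, Tuple


def time_series_split(
    n_samples: int, n_splits: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for an expanding-window split.

    Hypothetical helper for illustration. Each test fold directly follows
    its training fold in time, so no future rows leak into training.
    """
    n_folds = n_splits + 1
    if n_folds > n_samples:
        raise ValueError(
            f"Cannot create {n_folds} folds from {n_samples} samples."
        )
    # Every test fold has the same size; leftover rows go to the first train fold.
    test_size = n_samples // n_folds
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train = list(range(test_start))  # all rows before the test window
        test = list(range(test_start, test_start + test_size))
        yield train, test


if __name__ == "__main__":
    for train_idx, test_idx in time_series_split(n_samples=10, n_splits=3):
        print("train:", train_idx, "test:", test_idx)
```

Unlike the random k-fold split in the stock CrossValidator, every training window here ends before its test window begins, which is the property a time-series validator needs to preserve.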
Linked issues
Resolves #409
Functionality
...
...
...
...
...
Tests
I created an example notebook that uses the new cross-validator to train a GBT regression model on sample data. The notebook is included as a reference example.