antoinecarme / pyaf

PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.
BSD 3-Clause "New" or "Revised" License
456 stars, 73 forks

Automate Prototyping Activities - R-based Models #217

Open antoinecarme opened 1 year ago

antoinecarme commented 1 year ago

It is useful to have a git branch that contains all the tooling needed for prototyping.

Make it possible to use R/forecast from inside pyaf, through "fake" pyaf models that call R to validate a specific implementation.

This branch is not to be merged.

First applications: Threshold AR models (#214) and TSMARS models (#215).

antoinecarme commented 1 year ago

All the required r-cran-XXXXXXXXX packages need to be installed on Debian.

Most needed: r-cran-forecast and r-cran-caret.
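
As a small illustration of the naming rule used above (Debian packages CRAN libraries as r-cran- followed by the lowercased package name), the install command could be assembled as follows. The helper names are hypothetical, and not every CRAN package is necessarily available in Debian, so treat this as a sketch rather than a complete installer.

```python
def debian_package_name(cran_name):
    """Map a CRAN package name to the Debian naming scheme r-cran-<lowercased name>."""
    return "r-cran-" + cran_name.lower()


def apt_install_command(cran_names):
    """Build (but do not run) an apt-get command line for the given CRAN packages."""
    return ["apt-get", "install", "--yes"] + [debian_package_name(n) for n in cran_names]


if __name__ == "__main__":
    # Prints the command for the two packages named above.
    print(" ".join(apt_install_command(["forecast", "caret"])))
```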

antoine@z600:~/dev/python/packages/timeseries/pyaf$ apt-cache show r-cran-forecast
Package: r-cran-forecast
Version: 8.17.0-1
Installed-Size: 1914
Maintainer: Debian R Packages Maintainers <r-pkg-team@alioth-lists.debian.net>
Architecture: amd64
Depends: r-base-core (>= 4.2.1-1), r-api-4.0, r-cran-colorspace, r-cran-fracdiff, r-cran-generics (>= 0.1.2), r-cran-ggplot2 (>= 2.2.1), r-cran-lmtest, r-cran-magrittr, r-cran-nnet, r-cran-rcpp (>= 0.11.0), r-cran-timedate, r-cran-tseries, r-cran-urca, r-cran-zoo, r-cran-rcpparmadillo (>= 0.2.35), libblas3 | libblas.so.3, libc6 (>= 2.29), libgcc-s1 (>= 3.0), libstdc++6 (>= 11)
Recommends: r-cran-testthat, r-cran-uroot
Suggests: r-cran-knitr, r-cran-rmarkdown
Description-en: GNU R forecasting functions for time series and linear models
 Methods and tools for displaying and analysing
 univariate time series forecasts including exponential smoothing
 via state space models and automatic ARIMA modelling.
Description-md5: fbe002920852e5d23ff950431c9f03c4
Homepage: https://cran.r-project.org/package=forecast
Section: gnu-r
Priority: optional
Filename: pool/main/r/r-cran-forecast/r-cran-forecast_8.17.0-1_amd64.deb
Size: 1540732
MD5sum: ad90255623ef7f6c6719b7befca32f49
antoine@z600:~/dev/python/packages/timeseries/pyaf$ apt-cache show r-cran-caret
Package: r-cran-caret
Version: 6.0-93+dfsg-1
Installed-Size: 3668
Maintainer: Debian R Packages Maintainers <r-pkg-team@alioth-lists.debian.net>
Architecture: amd64
Depends: r-base-core (>= 4.2.1-2), r-api-4.0, r-cran-ggplot2, r-cran-lattice (>= 0.20), r-cran-e1071, r-cran-foreach, r-cran-modelmetrics (>= 1.2.2.2), r-cran-nlme, r-cran-plyr, r-cran-proc, r-cran-recipes (>= 0.1.10), r-cran-reshape2, r-cran-withr (>= 2.0.0), libc6 (>= 2.4)
Recommends: r-cran-testthat (>= 0.9.1), r-cran-earth (>= 2.2-3), r-cran-mda, r-cran-mlmetrics, r-cran-fastica, r-cran-kernlab, r-cran-themis (>= 0.1.3)
Suggests: r-cran-bradleyterry2, r-cran-covr, r-cran-dplyr, r-cran-ellipse, r-cran-gam (>= 1.15), r-cran-ipred, r-cran-knitr, r-cran-mass, r-cran-matrix, r-cran-mgcv, r-cran-mlbench, r-cran-nnet, r-cran-party (>= 0.9-99992), r-cran-pls, r-cran-proxy, r-cran-randomforest, r-cran-rann, r-cran-rmarkdown, r-cran-rpart
Description-en: GNU R package for classification and regression training
 This GNU R package provides misc functions for training and plotting
 classification and regression models.
Description-md5: 568fff6316b184e50b859b0f39211d0d
Homepage: https://cran.r-project.org/package=caret
Section: gnu-r
Priority: optional
Filename: pool/main/r/r-cran-caret/r-cran-caret_6.0-93+dfsg-1_amd64.deb
Size: 3446832
MD5sum: d81b051a65be49cff8f69a1828f3bc3d
SHA256: 8225d86fd41959ba6c4314b0b3df39ff2f93fb5cd0218500bf4dc4f4d684151a
antoinecarme commented 1 year ago

A set of pyaf models is needed that generate custom R scripts, which in turn build the corresponding R forecasting models.

This is a prototyping environment; it can be slow, and that's OK.

All logs coming from R should be saved under /tmp/pyaf_prototyping/model_name_session/(train|predict).(err|log)

The training script is saved by Python (and used by R) under /tmp/pyaf_prototyping/model_name/train.R

The training dataset is saved by Python (and used by R) under /tmp/pyaf_prototyping/model_name/training.csv

R models are saved by R (and reloaded before each forecast/predict) under /tmp/pyaf_prototyping/model_name/model.rds

The forecasting/predict script is saved by Python (and used by R) under /tmp/pyaf_prototyping/model_name/predict.R

The forecast/predict dataset is saved by Python (and used by R) under /tmp/pyaf_prototyping/model_name/model_name_input.csv

model_name should contain the type of model (TAR, TSMARS, ...) and a unique string (date, process_id, etc.).

Output datasets are saved by R (and used by Python) under /tmp/pyaf_prototyping/model_name/model_name_output.csv
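
A minimal sketch of the layout described above, assuming a small Python helper on the pyaf side. `session_name` and `session_paths` are hypothetical names, not pyaf functions. The name pattern (model type, timestamp, unique integer) follows the description above; defaulting the unique integer to the process id is an assumption.

```python
import os
from datetime import datetime
from pathlib import Path
from typing import Optional

ROOT = Path("/tmp/pyaf_prototyping")  # root directory named in the comment above


def session_name(model_type: str, now: Optional[datetime] = None,
                 unique_id: Optional[int] = None) -> str:
    """Combine model type, timestamp and a unique integer into one directory name."""
    now = now or datetime.now()
    unique_id = os.getpid() if unique_id is None else unique_id  # assumption: pid by default
    return f"{model_type}_{now.strftime('%Y%m%d%H%M%S.%f')}_{unique_id}"


def session_paths(root: Path, name: str) -> dict:
    """Return the training-side files that live in one model directory."""
    directory = Path(root) / name
    return {
        "train_script": directory / "train.R",
        "training_data": directory / "training.csv",
        "model": directory / "model.rds",
        "train_log": directory / "train.log",
        "train_err": directory / "train.err",
    }


if __name__ == "__main__":
    for key, path in session_paths(ROOT, session_name("threshold_ar")).items():
        print(key, path)
```

Keeping every artifact of one model under a single directory makes it easy to inspect or delete a whole prototyping session at once.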

antoinecarme commented 1 year ago

Sample R training script for Threshold AR models (auto-generated by pyaf for each internal model)

write('', "/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/train.lock")

options(warn=1);
sink(file("/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/train.log" , open="wt"), type="output");
sink(file("/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/train.err" , open="wt"), type="message");
set.seed(1960)
paste("R_VERSION" , R.version.string)
df = read.csv("/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/training.csv", header=TRUE)
library(NTS, quietly = TRUE);
cat("R_PACKAGE_VERSION",  "NTS", toString(packageVersion("NTS")) , "\n");
# Estimate the threshold first, then fit the two-regime TAR model with that threshold.
thresholds.est = uTAR(y=df$TGT, p1=2, p2=2, d=2, thrQ=c(0,1), Trim=c(0.1,0.9), include.mean=TRUE, method="NeSS", k0=50);
model = uTAR.est(y=df$TGT, arorder=c(2,2), thr=thresholds.est$thr, d=2);
saveRDS(model, "/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/model.rds")

file.remove("/tmp/pyaf_prototyping/threshold_ar_20220905164142.004041_139800315743536/train.lock")

sink(type="output");
sink(type="message");
print('end')
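
A script like the one above could be produced from a template on the Python side. This is a sketch only: `TRAIN_TEMPLATE` and `render_train_script` are hypothetical names, the version-logging lines are left out for brevity, and the actual generator in the R_modeling branch may differ.

```python
from pathlib import Path
from string import Template

# In string.Template, "$$" is an escaped literal "$", needed for R's df$TGT syntax.
TRAIN_TEMPLATE = Template("""\
write('', "$dir/train.lock")
options(warn=1);
sink(file("$dir/train.log", open="wt"), type="output");
sink(file("$dir/train.err", open="wt"), type="message");
set.seed($seed)
df = read.csv("$dir/training.csv", header=TRUE)
library(NTS, quietly = TRUE);
thresholds.est = uTAR(y=df$$$target, p1=$p, p2=$p, d=$d, thrQ=c(0,1), Trim=c(0.1,0.9), include.mean=TRUE, method="NeSS", k0=50);
model = uTAR.est(y=df$$$target, arorder=c($p,$p), thr=thresholds.est$$thr, d=$d);
saveRDS(model, "$dir/model.rds")
file.remove("$dir/train.lock")
sink(type="output");
sink(type="message");
""")


def render_train_script(session_dir, target="TGT", p=2, d=2, seed=1960):
    """Fill the template with one model directory and the TAR hyperparameters."""
    return TRAIN_TEMPLATE.substitute(
        dir=Path(session_dir).as_posix(), target=target, p=p, d=d, seed=seed
    )


if __name__ == "__main__":
    print(render_train_script("/tmp/pyaf_prototyping/demo"))
```

The rendered text would then be written to train.R in the model directory before R is invoked.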
antoinecarme commented 1 year ago

Sample forecast/predict script for Threshold AR models (auto-generated by pyaf for each model forecast)

write('', "/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208.lock")

options(warn=1);
sink(file("/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208.log" , open="wt"), type="output");
sink(file("/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208.err" , open="wt"), type="message");
paste("R_VERSION" , R.version.string)
df = read.csv("/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208_input.csv", header=TRUE)
reloaded_model = readRDS("/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/model.rds")
library(NTS, quietly = TRUE);
cat("R_PACKAGE_VERSION",  "NTS", toString(packageVersion("NTS")) , "\n");
predicted = uTAR.pred(model=reloaded_model, orig=0, h=204 - sum(reloaded_model$nobs), iterations=100, ci=0.95, output=TRUE)
nempty = length(reloaded_model$data) -  length(reloaded_model$residuals)
residuals = rbind(matrix(0, nempty) , matrix(reloaded_model$residuals))
data = reloaded_model$data
fitted = data + residuals
predicted = rbind(fitted, predicted$pred)
write.csv(predicted, file = "/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208_output.csv")

file.remove("/tmp/pyaf_prototyping/threshold_ar_20220905164840.860942_140163095026208/predict_20220905164841.627680_140163095026208.lock")

sink(type="output");
sink(type="message");
print('end')
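
On the Python side, the generated script has to be run and its output CSV read back. The sketch below assumes `Rscript` is on the PATH and treats a lock file that is still present after R exits as a sign of failure; that reading of the lock file is inferred from the scripts above, not confirmed by the source. `run_r_script` and `read_forecast_output` are hypothetical names.

```python
import csv
import shutil
import subprocess
from pathlib import Path


def run_r_script(script_path, lock_path, rscript="Rscript"):
    """Run a generated script; True if R exited cleanly and removed its lock file."""
    if shutil.which(rscript) is None:
        raise RuntimeError(f"{rscript} not found on PATH")
    completed = subprocess.run([rscript, str(script_path)], check=False)
    return completed.returncode == 0 and not Path(lock_path).exists()


def read_forecast_output(csv_path):
    """Read the one-column matrix written by write.csv (row names first, values second)."""
    with open(csv_path, newline="") as handle:
        rows = list(csv.reader(handle))
    # rows[0] is the header; R writes missing values as the bare token NA.
    return [float("nan") if value == "NA" else float(value)
            for _row_name, value in rows[1:]]
```

Reading the file with the standard csv module keeps the sketch dependency-free; inside pyaf itself, pandas would be the more natural choice.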
antoinecarme commented 1 year ago

Sample MARS model prototyped with the R caret package.

script : https://github.com/antoinecarme/pyaf/blob/R_modeling/tests/caret_r_prototypes/test_ozone_exogenous_MARS_caret.py

antoinecarme commented 1 year ago

Sample TAR Model using R NTS package

script : https://github.com/antoinecarme/pyaf/blob/R_modeling/tests/caret_r_prototypes/test_ozone_exogenous_TAR_caret.py

antoinecarme commented 1 year ago

R_modeling branch :

https://github.com/antoinecarme/pyaf/tree/R_modeling/

Specific prototyping tests :

https://github.com/antoinecarme/pyaf/tree/R_modeling/tests/caret_r_prototypes