@spsanderson That's quite the dependency list. You should tell Dancho to set up his functions so that users don't have to load all the extra libraries; I remember a CRAN person yelling at me over something similar. Nonetheless, I couldn't get all of your functions to run, so I'm kind of at a loss. Can you let me know what the final function, tq_transmute(), is doing? That was the only one I couldn't figure out how to run. Here's the code I ended up running until the error:
install.load::install_load(
"tidyquant"
,"timetk"
, "tibbletime"
, "tsibble"
, "sweep"
, "anomalize"
, "caret"
, "forecast"
, "funModeling"
# , "xts"
# , "fpp"
, "lubridate"
, "tidyverse"
# , "urca"
# , "prophet"
, "fable"
, "feasts"
, "RemixAutoML"
)
# Data ----
url <- "https://cci30.com/ajax/getIndexHistory.php"
destfile <- "data/cci30_OHLCV.csv" # not used below; fread() reads straight from the URL
data <- data.table::fread(url)
class(data)
# Get month end of file - last day of previous month
# Format Date ####
# library(magrittr)
# df$Date <- lubridate::ymd(df$Date)
# df <- df %>% dplyr::mutate(month_start = floor_date(Date, unit = "month") - period(1, units = "day"))
data.table::set(data, j = "Date", value = as.Date(data$Date))
data[, month_start := lubridate::floor_date(x = Date, unit = "month")][, month_start := month_start - lubridate::days(1)]
# df_tbl <- tsibble::as_tsibble(df, index = Date) %>%
# filter(Date <= max(month_start)) %>%
# select(Date, Open, High, Low, Close, Volume)
data <- data[Date <= max(month_start), .SD, .SDcols = names(data)[1:(ncol(data) - 1L)]]
# Coerce df to tibble ####
# df_tbl <- as_tibble(df_tbl)
# featurePlot(
# x = df_tbl[,c("Open","High","Low","Volume")]
# , y = df_tbl$Close
# , plot = "pairs"
# , auto.key = list(columns = 4)
# , na.action(na.omit)
# )
caret::featurePlot(
  x = data[, .(Open, High, Low, Volume)] # predictors only; column 1 is Date
  , y = data$Close
  , plot = "pairs"
  , auto.key = list(columns = 4)
  # dropped the stray na.action(na.omit) argument; it just evaluates to NULL
)
# Time Parameter ----
time_param <- "weekly"
# Make a log-returns-of-Close object
# NOTE: df_tbl here comes from the commented-out as_tsibble()/as_tibble()
# steps above; in the executed code the equivalent object is `data`.
df.ts <- df_tbl %>%
  tq_transmute(
    select = Close
    , mutate_fun = periodReturn
    , period = time_param
    , type = "log"
    , col_rename = str_c(str_to_title(time_param), "Log_Returns", sep = "_")
  )
data <- dplyr::as_tibble(data)
#### I was forced to load these up to attempt to run the below function
library(zoo)
library(xts)
library(quantmod)
library(PerformanceAnalytics)
install.packages("time_param");library(time_param)
##### Warning in install.packages : package ‘time_param’ is not available (for R version 4.0.0)
#### Let me know what this does and maybe I can replicate it real quick
data <- data %>%
  tidyquant::tq_transmute(
    select = Close
    , mutate_fun = periodReturn
    , period = time_param
    , type = "log"
    , col_rename = str_c(str_to_title(time_param), "Log_Returns", sep = "_")
  )
# I didn't look into this yet...
RemixAutoML::AutoBanditSarima(data = df.ts, TargetVariableName = "Weekly_Log_Returns", DateColumnName = "Date")
@spsanderson Have you tried filling out all the function arguments for AutoBanditSarima()? That would be where I would start.
time_param is a variable that equals "weekly"; I thought I posted that:
time_param <- "weekly"
No, I did not fill out all the params for the function; I will try that.
@spsanderson The issue was getting the function tq_transmute() to run. I think it required me to upgrade to a different R version. Nevertheless, I'm not sure what the output should look like after I run that function. Is there a way to upload that data somewhere for me to download?
The data I uploaded is the final data: weekly_log_returns.xlsx
When I run that data through AutoBanditSarima() the model fails to build; maybe there are not enough data points. When I run the model without any mutation, a model builds. The tq_transmute() call is getting the log return of the index, aggregated by week.
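For reference, a minimal sketch of what that step computes, using only dplyr and lubridate (assuming `data` still has its Date and Close columns; the weekly_log_returns name is just for illustration, and quantmod's periodReturn() endpoint conventions may differ slightly):
library(dplyr)
library(lubridate)
# Rough equivalent of the tq_transmute() call above: last Close per
# week, then log differences between consecutive week-end closes.
weekly_log_returns <- data %>%
  group_by(week = floor_date(Date, unit = "week")) %>%
  summarise(Date = last(Date), Close = last(Close), .groups = "drop") %>%
  mutate(Weekly_Log_Returns = log(Close / lag(Close))) %>%
  filter(!is.na(Weekly_Log_Returns)) %>%
  select(Date, Weekly_Log_Returns)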
@spsanderson I was able to run your data through and I did find a glitch. The best model that was found came directly from a default auto.arima(), which hasn't really happened yet. But it did call attention to some downstream code that thought Fourier terms were being used when they in fact weren't (this only happens when auto.arima() produces a winner, which is not typically the case). I made a fix for that. I am still getting the message that no suitable model was found, so there's some more digging for me to do. I would like to note that financial asset pricing data is probably not the best dataset for testing time series models. I would imagine that you'd want to include other variables in the model, and when that is the case my guess is that machine learning models will do a better job. Ideally, for measuring the efficacy of time series models, you want several data sets where some have trend while others do not, but all have patterns that make time series models suitable. Just my two cents.
@spsanderson FYI - I was able to run your data set without error. Feel free to reinstall and give it another attempt.
Glad you were able to find the error. For this data, ARIMA actually isn't horrible; it's typically not the best performing, but it's not horrible. The data is aggregated at the weekly level and it is the log returns of an index, so we're not really looking for the price but more or less whether the future log returns will be positive or negative, and luckily the density of the log returns is fairly normal.
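As a quick sanity check of that last claim, a minimal sketch (assuming the weekly_log_returns object from the sketch above) that overlays a fitted normal curve on the return distribution:
# Compare the empirical distribution of the weekly log returns with a
# normal curve matched by mean/sd (base graphics only).
r <- weekly_log_returns$Weekly_Log_Returns
hist(r, breaks = 30, freq = FALSE, main = "Weekly log returns", xlab = "log return")
curve(dnorm(x, mean = mean(r), sd = sd(r)), add = TRUE, col = "red")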
From a financial perspective, time series models are essentially technical analysis versus fundamental analysis. To me, the best way to beat the market would be through insider trading (I know, unless you are in Congress, you aren't doing it). But it does lead to interesting questions about the type of information that drives the price changes. Given that, if I really wanted to build a model to do statistical arbitrage, I would want to incorporate other variables into the model. Nonetheless, I have been using data sets from the fpp package and the fpp2 package to analyze model performance and feature upgrades to existing models. I'm pretty sure you know about the book, "Forecasting: Principles and Practice" https://otexts.com/fpp2/, which Rob J Hyndman wrote; he uses data from those packages throughout the book. I also like the Walmart data set for testing forecasting models that can build forecasts by grouping variables, so I can see if there is merit to running a single model versus generating a bunch of models for each grouping level.
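For instance, a minimal sketch of pulling a couple of those book datasets (assuming the fpp2 package is installed; the two series below are picked as examples of trending versus seasonal data):
library(fpp2) # attaches forecast/ggplot2 plus the book's datasets
autoplot(ausair) # annual Australian air passengers: strong trend
autoplot(h02)    # monthly corticosteroid drug sales: strong seasonality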
In terms of the error, the fix was to include a simple tryCatch() around an if statement in one of the sub-functions that gets called. Apparently, NextGrid doesn't always exist when it's being referenced.
# Define lambda ----
if (run != 1L) {
  # Assign through tryCatch()'s return value: an assignment made inside the
  # error handler is local to the handler and would never reach lambda.
  lambda <- tryCatch({
    if (NextGrid$BoxCox[1L] == "skip") NULL else "auto"
  }, error = function(e) NULL)
}
Easy fix. And yes, I know the book; it's basically the Bible.
Data attached: weekly_log_returns.xlsx