business-science / tidyquant

Bringing financial analysis to the tidyverse
https://business-science.github.io/tidyquant/
Other
852 stars 175 forks source link

tq_mutate and tq_transmute functionality when going down in periodicity on OHLC data #92

Open gmahjub opened 6 years ago

gmahjub commented 6 years ago

I noticed that there is possibly a bug when using tq_transmute() or tq_mutate() on data retrieved using tq_get() and the "stock.prices" option. When I retrieve the daily data, and I would like to convert this data to weekly periodicity for example, the returned result basically loses all the information in the data except that information from the endpoints. This is not the way that this functionality works in xts (to.period). In xts, the OHLC is aggregated, meaning the high of the week, the low of the week, the open of the week, and the close of the week are returned as new data, weekly periodicity. It appears that there should be some way to aggregate using cummin(), cummax() for the week, while simply using the week's open and the week's close for the open and close data. Simply returning the OHLC of the endpoints for the new periodicity when OHLC data is at hand seems like a waste of important information. It is possible that I am missing something here and would love to be enlightened.

mdancho84 commented 6 years ago

Can you provide a reproducible example?

DavisVaughan commented 6 years ago

@mdancho84 I see what @gmahjub is saying. We intentionally turn OHLC = FALSE when to.period is used in tq_mutate. There were a few reasons for this, I think one of them was that we wanted to.period to be useful with more than just financial price data, but unfortunately this removes the intended behavior when you actually have financial prices.

@gmahjub, if you are comfortable with experimental repos, I have been working on a tidyfinance package that includes a helper for this exact situation. It probably won't be CRAN ready for awhile though.

https://github.com/DavisVaughan/tidyfinance/blob/master/R/summarise_ohlc.R

randalleclayton commented 6 years ago

I believe this example shows the issue.

tbl <- tq_get("SPY", from = "2018-03-01", to = "2018-03-31") good <- timetk::tk_xts(tbl) %>% to.period() bug <- tbl %>% tq_transmute(mutate_fun = to.period, period = "months")

open should be open price on first date, close should be closing price on last date of period

englianhu commented 6 years ago

I believe this example shows the issue.

tbl <- tq_get("SPY", from = "2018-03-01", to = "2018-03-31") good <- timetk::tk_xts(tbl) %>% to.period() bug <- tbl %>% tq_transmute(mutate_fun = to.period, period = "months")

open should be open price on first date, close should be closing price on last date of period

Thanks @randalleclayton , I tried to refer to https://github.com/business-science/tidyquant/issues/30#issuecomment-292324846 and https://github.com/tidyverse/dplyr/issues/2120#issuecomment-246218848 to tidy the data, your codes is works for me.

> # Load tidyquant
> suppressPackageStartupMessages(library(tidyquant))
> 
> # Grab some non-OHLC data
> oil_prices <- tq_get("DCOILWTICO", get = "economic.data")
> oil_prices %>% head
# A tibble: 6 x 2
  date       price
  <date>     <dbl>
1 2008-01-01  NA  
2 2008-01-02  99.6
3 2008-01-03  99.2
4 2008-01-04  97.9
5 2008-01-07  95.1
6 2008-01-08  96.4
> 
> ## Convert to OHLC data
> oil_prices %>% tk_xts(.) %>% to.period(period = 'weeks') %>% 
+     tk_tbl() %>% dplyr::rename_all(str_replace_all, '\\.', '') %>% 
+     dplyr::rename_all(tolower)
Using column `date` for date_var.
# A tibble: 561 x 5
   index       open  high   low close
   <date>     <dbl> <dbl> <dbl> <dbl>
 1 2008-01-04  99.6  99.6  97.9  97.9
 2 2008-01-11  95.1  96.4  92.7  92.7
 3 2008-01-18  94.2  94.2  90.1  90.6
 4 2008-01-25  89.6  90.4  87.6  90.4
 5 2008-02-01  91.0  92.3  89.0  89.0
 6 2008-02-08  90.1  91.8  87.2  91.8
 7 2008-02-15  93.6  95.6  92.8  95.6
 8 2008-02-22 100.0 101.   98.6  99.0
 9 2008-02-29  99.4 103.   99.4 102. 
10 2008-03-07 102.  106.   99.7 105. 
# ... with 551 more rows
Warning messages:
1: In tk_xts_.data.frame(data = data, select = select, date_var = date_var,  :
  Non-numeric columns being dropped: date
2: In to.period(., period = "weeks") : missing values removed from data