Open gmahjub opened 6 years ago
Can you provide a reproducible example?
@mdancho84 I see what @gmahjub is saying. We intentionally turn OHLC = FALSE
when to.period
is used in tq_mutate
. There were a few reasons for this, I think one of them was that we wanted to.period
to be useful with more than just financial price data, but unfortunately this removes the intended behavior when you actually have financial prices.
@gmahjub, if you are comfortable with experimental repos, I have been working on a tidyfinance
package that includes a helper for this exact situation. It probably won't be CRAN ready for awhile though.
https://github.com/DavisVaughan/tidyfinance/blob/master/R/summarise_ohlc.R
I believe this example shows the issue.
tbl <- tq_get("SPY", from = "2018-03-01", to = "2018-03-31") good <- timetk::tk_xts(tbl) %>% to.period() bug <- tbl %>% tq_transmute(mutate_fun = to.period, period = "months")
open should be open price on first date, close should be closing price on last date of period
I believe this example shows the issue.
tbl <- tq_get("SPY", from = "2018-03-01", to = "2018-03-31") good <- timetk::tk_xts(tbl) %>% to.period() bug <- tbl %>% tq_transmute(mutate_fun = to.period, period = "months")
open should be open price on first date, close should be closing price on last date of period
Thanks @randalleclayton , I tried to refer to https://github.com/business-science/tidyquant/issues/30#issuecomment-292324846 and https://github.com/tidyverse/dplyr/issues/2120#issuecomment-246218848 to tidy the data, your codes is works for me.
> # Load tidyquant
> suppressPackageStartupMessages(library(tidyquant))
>
> # Grab some non-OHLC data
> oil_prices <- tq_get("DCOILWTICO", get = "economic.data")
> oil_prices %>% head
# A tibble: 6 x 2
date price
<date> <dbl>
1 2008-01-01 NA
2 2008-01-02 99.6
3 2008-01-03 99.2
4 2008-01-04 97.9
5 2008-01-07 95.1
6 2008-01-08 96.4
>
> ## Convert to OHLC data
> oil_prices %>% tk_xts(.) %>% to.period(period = 'weeks') %>%
+ tk_tbl() %>% dplyr::rename_all(str_replace_all, '\\.', '') %>%
+ dplyr::rename_all(tolower)
Using column `date` for date_var.
# A tibble: 561 x 5
index open high low close
<date> <dbl> <dbl> <dbl> <dbl>
1 2008-01-04 99.6 99.6 97.9 97.9
2 2008-01-11 95.1 96.4 92.7 92.7
3 2008-01-18 94.2 94.2 90.1 90.6
4 2008-01-25 89.6 90.4 87.6 90.4
5 2008-02-01 91.0 92.3 89.0 89.0
6 2008-02-08 90.1 91.8 87.2 91.8
7 2008-02-15 93.6 95.6 92.8 95.6
8 2008-02-22 100.0 101. 98.6 99.0
9 2008-02-29 99.4 103. 99.4 102.
10 2008-03-07 102. 106. 99.7 105.
# ... with 551 more rows
Warning messages:
1: In tk_xts_.data.frame(data = data, select = select, date_var = date_var, :
Non-numeric columns being dropped: date
2: In to.period(., period = "weeks") : missing values removed from data
I noticed that there is possibly a bug when using tq_transmute() or tq_mutate() on data retrieved using tq_get() and the "stock.prices" option. When I retrieve the daily data, and I would like to convert this data to weekly periodicity for example, the returned result basically loses all the information in the data except that information from the endpoints. This is not the way that this functionality works in xts (to.period). In xts, the OHLC is aggregated, meaning the high of the week, the low of the week, the open of the week, and the close of the week are returned as new data, weekly periodicity. It appears that there should be some way to aggregate using cummin(), cummax() for the week, while simply using the week's open and the week's close for the open and close data. Simply returning the OHLC of the endpoints for the new periodicity when OHLC data is at hand seems like a waste of important information. It is possible that I am missing something here and would love to be enlightened.