business-science / timetk

Time series analysis in the `tidyverse`
https://business-science.github.io/timetk/

Possibility to include timetk::mutate_by_time() #44

Closed luifrancgom closed 4 years ago

luifrancgom commented 4 years ago

Hello Matt Dancho

In your package timetk, the functions filter_by_time() and summarize_by_time() are incredibly useful! Would you consider adding a function timetk::group_by_time() with an argument to group by year, quarter, month, week, day, or any other meaningful unit? I know that we can use dplyr::group_by(), but then the date column first has to be split into its components before grouping, which adds more lines of code.

Best wishes and thank you for building the packages timetk and modeltime
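For context, the dplyr-only workaround described above might look like the sketch below. The `sales` table and its column names are hypothetical toy data, and lubridate::floor_date() is just one way to derive the time component; the point is the extra mutate() step needed before group_by().

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: six weekly observations for each of two series
sales <- tibble(
  id    = rep(c("A", "B"), each = 6),
  date  = rep(seq(as.Date("2020-01-06"), by = "week", length.out = 6), times = 2),
  value = 1:12
)

monthly <- sales %>%
  # Extra step: derive the month from the date column before grouping
  mutate(month = floor_date(date, unit = "month")) %>%
  group_by(id, month) %>%
  summarise(total = sum(value), .groups = "drop")

monthly
```

A group_by_time()-style helper would fold the mutate() and group_by() steps into a single call.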

mdancho84 commented 4 years ago

Hi Luis,

Thanks for your input and I'm really glad you are enjoying the upgraded timetk package. The group_by_time() function is a good idea. Before I begin, I want to make sure that it's actually needed.

Example of Summarize By Time with Groups

summarise_by_time() groups by time internally. If you first call group_by() on any other grouping columns you want to keep, it should do what you need. So group_by_time() seems unnecessary for summarization / aggregation.

library(tidyverse)
library(timetk)

m4_weekly %>%
    group_by(id) %>%
    summarise_by_time(.by = "month", sum = sum(value))
#> .date_var is missing. Using: date
#> # A tibble: 530 x 3
#> # Groups:   id [4]
#>    id    date         sum
#>    <fct> <date>     <dbl>
#>  1 W10   1999-01-01 1819.
#>  2 W10   1999-02-01 1800.
#>  3 W10   1999-03-01 1800.
#>  4 W10   1999-04-01 2250.
#>  5 W10   1999-05-01 1799.
#>  6 W10   1999-06-01 1800.
#>  7 W10   1999-07-01 2250.
#>  8 W10   1999-08-01 1800.
#>  9 W10   1999-09-01 1800.
#> 10 W10   1999-10-01 2350.
#> # ... with 520 more rows

Mutate By Time

I could see a case for a mutate_by_time() if operations (cumulative or rolling calculations) need to be applied by time-based groups.

Group By Time

I'm struggling to see a case for group_by_time() or a nest_by_time() because this will be covered by summarise_by_time() / mutate_by_time(). We can certainly discuss it though. Maybe it would be more intuitive for people wanting to quickly group or nest by time.

luifrancgom commented 4 years ago

Hello Matt Dancho

Yes, you are right! What I wanted was to group_by_time() and then use the normal dplyr::mutate() to create another variable. So I will change my initial request to timetk::mutate_by_time(), if that is possible.

Thank you again!

mdancho84 commented 4 years ago

I've added mutate_by_time() to the development version of timetk. It will go into the next CRAN release of timetk.

You can find a short example in the timetk Function Reference: mutate_by_time().
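As a rough sketch of how this might be used (not the example from the Function Reference), the code below applies mutate_by_time() to the m4_weekly data from earlier in the thread. The new column names month_total and running_total are made up for illustration, and the arrange() call is an assumption to make sure the cumulative sum runs in date order.

```r
library(dplyr)
library(timetk)

running_tbl <- m4_weekly %>%
  arrange(id, date) %>%   # assumption: sort so cumsum() follows date order
  group_by(id) %>%
  mutate_by_time(
    .date_var = date,
    .by       = "month",               # calculations restart every month
    month_total   = sum(value),        # monthly total repeated on each row
    running_total = cumsum(value)      # cumulative sum within each month
  )

running_tbl
```

Unlike summarise_by_time(), this keeps one row per original observation and adds the new columns alongside it.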


luifrancgom commented 4 years ago

Thank you so much!!!