anhoej / qicharts2

R package: Quality improvement charts

Using tibbletime for date manipulation #11

Closed pwildenhain closed 6 years ago

pwildenhain commented 6 years ago

We recently released an internal R package for my quality improvement team that relies heavily on qic() for its SPC chart functionality. In doing so, we added a feature that I wanted to pitch to you.

We work almost exclusively with longitudinal analyses. To save our team from fiddling with dplyr::mutate() and lubridate, we decided to use functions from the tibbletime package to abstract away date manipulation. Here's a vignette (https://business-science.github.io/tibbletime/articles/TT-04-use-with-dplyr.html) that shows how well tibbletime works with dplyr.

The end product looks something like this:

qic(data = data, x = date_column, y = metric_column, n = n, period = "monthly")

Here, the period argument takes care of the date manipulation on x for you.

If you like this idea I'd be happy to dive more into the specifics and figure out if/how this best fits into your existing API.

anhoej commented 6 years ago

Hi Paul

This is a very good idea. I guess it could all be done using base R functions like cut(). Making the period argument fool-proof, so that it does not cause problems when the x variable is not a datetime object, might take some thought and effort.
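A minimal base R sketch of the kind of guard this would need. The helper name check_period_x() is hypothetical and not part of qicharts2; it simply refuses to apply a period unless x is a Date or POSIXct/POSIXlt object.

```r
# Hypothetical guard, not part of qicharts2: allow period-based grouping
# only when x is a Date or POSIXct/POSIXlt object.
check_period_x <- function(x, period = NULL) {
  if (is.null(period)) {
    return(invisible(TRUE))  # no period requested, so x is used as is
  }
  if (!inherits(x, c("Date", "POSIXt"))) {
    stop("'period' requires x to be a Date or POSIXt object, not ",
         class(x)[1], call. = FALSE)
  }
  invisible(TRUE)
}

check_period_x(as.Date("2018-05-08") + 0:9, "week")  # passes silently
try(check_period_x(1:10, "week"))                     # error: x is an integer
```

Failing loudly, rather than silently ignoring period, keeps a user from believing data were grouped by week when they were not.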

Another consideration is how far qicharts2 should go in doing actual data manipulation. I have a feeling that I have already stretched it a bit too far with automatic aggregation of subgroup data and "elegant" handling of missing values. This is clearly helpful for everyday use of qicharts, but it may cause problems when users are not aware of what is going on under the hood. For example, if subgroups contain more than one observation, qic() automatically calculates the mean of each subgroup. But sometimes you want the sum, and if you are not on your toes and forget to specify this with the agg.fun argument, you might not get what you want, and you might not even notice. The purist approach would be to require users to prepare and clean their data before even considering putting them on an SPC chart.
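To make the mean-versus-sum pitfall concrete, here is a small base R illustration with made-up counts (it does not call qic() itself): two subgroups of three observations each, aggregated once by the mean and once by the sum.

```r
# Made-up example data: two subgroups (days) with three observations each
d <- data.frame(
  day   = rep(c("Mon", "Tue"), each = 3),
  count = c(2, 4, 6, 1, 1, 1)
)

# Mean of each subgroup, as qic() does by default for subgroups > 1
agg_mean <- aggregate(count ~ day, data = d, FUN = mean)

# Subgroup totals, which is often what you actually want for counts
agg_sum <- aggregate(count ~ day, data = d, FUN = sum)

agg_mean  # Mon = 4, Tue = 1
agg_sum   # Mon = 12, Tue = 3
```

Both results look like plausible charts, which is exactly why the wrong choice can go unnoticed; in qic() the choice is made with agg.fun (e.g. agg.fun = 'sum').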

But again, I think this is a very good idea and I'll look into it.

Kind regards, Jacob

(Sent by email in reply to pwildenhain's post of 2018-05-08, 3:53 GMT+02:00.)

-- Kind regards, Jacob Anhøj

pwildenhain commented 6 years ago

I'm sure that's a challenging balance to strike: how much to do inside qic() as opposed to in user pre-processing.

I think one thing tibbletime offers over the base R approach (i.e., cut()) is flexibility in defining the time period. For example, with its collapse_by() function you could specify "month", "monthly", "quarterly", "12 weeks", etc. to create the time period for aggregation. I can also understand that you might be hesitant to add a package dependency.
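For comparison, base R's cut() method for Date objects also accepts flexible interval specifications such as "month", "quarter", or a multiple like "12 weeks". A minimal sketch that counts how many intervals a year of daily dates is split into; note that, as far as cut.Date() matches unit names, adjective forms such as "monthly" are not recognised.

```r
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), by = "day")

# Each spec is passed to cut.Date(), which returns a factor whose levels
# are the start dates of the intervals
specs <- c("month", "quarter", "12 weeks")
n_intervals <- sapply(specs, function(p) nlevels(cut(dates, breaks = p)))
n_intervals  # month = 12, quarter = 4

# "monthly" is not a valid unit for cut(), so this raises an error
try(cut(dates, breaks = "monthly"))
```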

Based on the package API, I would propose period = NULL as the default, and then calling a helper function right after the data frame is prepared but before aggregation:

  # Prepare data frame
  d <- data.frame(x, y, n, notes, facets, cl, target)
  d <- droplevels(d)

  # Date manipulation: only when a period is supplied (default period = NULL)
  if (!is.null(period)) {
    d <- date_helper(d, period)
  }

  # Aggregate data and perform analyses
  d <- qic.agg(d, got.n, part, agg.fun, freeze, exclude,
               chart.fun, multiply, dots.only, chart, y.neg)

Here, date_helper() would modify x according to period, similar to how we handled this in our enterprise R package.
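A minimal base R sketch of what such a date_helper() could look like. The name comes from the proposal above, but the body is an assumption for illustration only: it uses cut() instead of tibbletime and converts datetimes to Date, so time of day is dropped. It only relabels x; collapsing rows that share a period is left to the existing aggregation step.

```r
# Hypothetical helper, illustration only (not the qicharts2 implementation):
# replace each value of d$x with the start date of its period.
date_helper <- function(d, period) {
  if (!inherits(d$x, c("Date", "POSIXt"))) {
    stop("x must be a Date or POSIXt object when 'period' is used", call. = FALSE)
  }
  # cut() returns a factor whose levels are period start dates as text,
  # so the labels are converted back to Date before returning
  d$x <- as.Date(as.character(cut(as.Date(d$x), breaks = period)))
  d
}

d <- data.frame(x = as.Date("2018-05-01") + 0:59, y = 1:60)
d <- date_helper(d, "month")
table(d$x)  # 31 rows for 2018-05-01, 29 rows for 2018-06-01
```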

Thanks for hearing me out; looking forward to your decision.

anhoej commented 6 years ago

Thanks again. Will consider.

anhoej commented 6 years ago

Please check the latest dev version.

Example:

library(qicharts2)

d <- data.frame(x = seq(Sys.Date(), length.out = 365, by = 'day'),
                y = rnorm(365))

# One point per day
qic(x, y, data = d)

# Group the daily values into weekly, fortnightly, monthly and quarterly subgroups
qic(x, y, data = d, x.period = 'week')
qic(x, y, data = d, x.period = '2 weeks')
qic(x, y, data = d, x.period = 'month')
qic(x, y, data = d, x.period = 'quarter')

pwildenhain commented 6 years ago

Wow, that was lightning fast! I had no idea that cut() was that flexible; that's awesome.

I installed it and tested it on some of our data, and it worked really well. Thanks for adding this! 🥇

anhoej commented 6 years ago

Thank you for the idea. Yes, cut() is really the Swiss Army knife of datetime manipulation; thanks to it, I have never really needed lubridate and its descendants. The only thing to remember is that cut() returns a factor, so the output must be converted back to a date or datetime.
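A short base R illustration of that conversion step, using a few made-up dates: the factor's integer codes are not dates, so the level labels are converted back explicitly.

```r
x <- as.Date(c("2018-05-08", "2018-05-20", "2018-06-02"))

f <- cut(x, breaks = "month")
class(f)       # "factor"
as.integer(f)  # 1 1 2: level codes, not dates

# Convert the level labels (period start dates) back to Date
x_month <- as.Date(as.character(f))
x_month        # "2018-05-01" "2018-05-01" "2018-06-01"
```

For POSIXct input the same idea applies with as.POSIXct(as.character(f)), taking care to pass the intended time zone.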

Closing this. Keep the good ideas coming.