Solve the grouping variable issue

Clare2D commented 4 years ago

Create functionality for more than 2 grouping variables.

The intention behind the grouping_variables is to allow the code to be run once and produce results for all the necessary groupings.

E.g. for PACTA2020: PensionFund/Insurance, Swiss/Austrian, Investor/Portfolio

This was intended to be achieved through the "grouping variables" in the parameter file. However, this has not yet been tested in the code with anything beyond "investor_name" and "portfolio_name".

jdhoffa commented 4 years ago

One flexible approach would be to pass all grouping variables through the ... argument:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(rlang)

group_and_sum_production <- function(data, ...){

  group_dots <- rlang::enquos(...)

  data %>% 
    dplyr::group_by(!!!group_dots) %>% 
    dplyr::summarize(production = sum(.data$production))

}

example_data <- tibble::tribble(
                  ~company_name, ~technology,         ~region, ~production,
                    "company a",       "ice",        "global",         20L,
                    "company b",  "electric",        "europe",          1L,
                    "company b",       "ice",        "europe",         15L,
                    "company c",    "hybrid", "north_america",         10L
                  )

# total production
example_data %>% 
  group_and_sum_production()
#> # A tibble: 1 x 1
#>   production
#>        <int>
#> 1         46

# technology-wise production
example_data %>% 
  group_and_sum_production(technology)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 2
#>   technology production
#>   <chr>           <int>
#> 1 electric            1
#> 2 hybrid             10
#> 3 ice                35

# technology-wise production, by region
example_data %>% 
  group_and_sum_production(technology, region)
#> `summarise()` regrouping output by 'technology' (override with `.groups` argument)
#> # A tibble: 4 x 3
#> # Groups:   technology [3]
#>   technology region        production
#>   <chr>      <chr>              <int>
#> 1 electric   europe                 1
#> 2 hybrid     north_america         10
#> 3 ice        europe                15
#> 4 ice        global                20

Created on 2020-07-28 by the reprex package (v0.3.0)

You don't need to worry for now about why the rlang::enquos() and !!! are there. You can basically just copy and paste this and fiddle with it for your use case.

Clare2D commented 4 years ago

As discussed this morning, Jacob. Thanks, Jackson. In most cases I've implemented the !!!grouping_variables, together with rlang::syms I believe. The outstanding work required:

The logic to create the groups, and what needs to go into the parameter files to ensure we're getting the right results for all groups. I imagine the saving location for these groups needs to change, or at least needs to be decided upon.
Creating the reports for different groups is as far as I got; the create_interactive_reports code also needs to be cleaned up to allow for different groups.
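
On the open question of where results for each group get saved, one possible layout is one file per combination of grouping variables. The following is only a rough sketch under stated assumptions: the results table, the save_results_by_group() helper, the file naming scheme and the output directory are all hypothetical and are not taken from the PACTA_analysis code.

```r
library(dplyr)

# Hypothetical results table; the column names are illustrative only.
results <- tibble::tribble(
  ~investor_name, ~portfolio_name, ~production,
  "investor a",   "portfolio 1",   20L,
  "investor a",   "portfolio 2",    1L,
  "investor b",   "portfolio 3",   25L
)

# Assumed to come from the parameter file as plain strings.
grouping_variables <- c("investor_name", "portfolio_name")

# Assumed output location; a temporary directory is used for the sketch.
output_dir <- file.path(tempdir(), "results_by_group")

# Hypothetical helper: writes one CSV per group and returns the file names.
save_results_by_group <- function(data, group_names, output_dir) {
  dir.create(output_dir, recursive = TRUE, showWarnings = FALSE)

  data %>%
    dplyr::group_by(!!!rlang::syms(group_names)) %>%
    dplyr::group_walk(function(group_data, group_keys) {
      # group_keys is a one-row tibble holding this group's key values;
      # group_data holds the remaining columns for that group.
      file_stem <- paste(unlist(lapply(group_keys, as.character)), collapse = "_")
      utils::write.csv(
        group_data,
        file.path(output_dir, paste0(file_stem, ".csv")),
        row.names = FALSE
      )
    })

  invisible(list.files(output_dir))
}

saved_files <- save_results_by_group(results, grouping_variables, output_dir)
```

Building the file name from the group keys means that adding a third grouping variable changes the file names but not the code. Whether a flat directory or nested folders per grouping level is preferable is still something to decide.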

jdhoffa commented 4 years ago

Cool beans :-)

cjyetman commented 3 years ago

https://tidyeval.tidyverse.org/sec-up-to-speed.html#strings-instead-of-quotes
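
The linked section is about grouping by column names that are held as strings, which is the situation when grouping variables are read from a parameter file. A minimal sketch of that pattern, assuming the parameter file yields a character vector: the example data, the investor_type column and the group_and_sum_production_chr() name are made up for illustration.

```r
library(dplyr)
library(rlang)

# Hypothetical: grouping variables as they might be read from a parameter file.
grouping_variables <- c("investor_name", "portfolio_name", "investor_type")

# Made-up example data, for illustration only.
example_data <- tibble::tribble(
  ~investor_name, ~portfolio_name, ~investor_type, ~production,
  "investor a",   "portfolio 1",   "pension_fund", 20L,
  "investor a",   "portfolio 2",   "pension_fund",  1L,
  "investor b",   "portfolio 3",   "insurance",    15L,
  "investor b",   "portfolio 3",   "insurance",    10L
)

group_and_sum_production_chr <- function(data, group_names) {
  # Convert the character vector into a list of symbols,
  # then splice them into group_by() with !!!.
  group_syms <- rlang::syms(group_names)

  data %>%
    dplyr::group_by(!!!group_syms) %>%
    dplyr::summarize(production = sum(.data$production), .groups = "drop")
}

result <- group_and_sum_production_chr(example_data, grouping_variables)
```

Because the function takes a character vector rather than bare column names, the same call works for any number of grouping variables without touching the function body.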

cjyetman commented 3 years ago

pretty sure this has been resolved, right @Clare2D?

RMI-PACTA / PACTA_analysis

Solve the grouping variable issue #26