2DegreesInvesting / tiltIndicator

Implement the core business logic of the tilt indicators
https://2degreesinvesting.github.io/tiltIndicator/
GNU General Public License v3.0
1 stars 1 forks source link

`summarize_range()` should handle `NA`s before passing data to `min()/max()` #746

Closed maurolepore closed 5 months ago

maurolepore commented 6 months ago

Relates to #743 Relates to https://github.com/2DegreesInvesting/tiltIndicatorAfter/issues/188

library(tiltIndicator)
library(dplyr, warn.conflicts = FALSE)

# The warning comes when all values are NA 
min(c(NA, NA), na.rm = TRUE)
#> Warning in min(c(NA, NA), na.rm = TRUE): no non-missing arguments to min;
#> returning Inf
#> [1] Inf

# This happens, e.g. when feeding `summarize_range()` with a group full of NA
data <- tibble(x = c(1, NA), group = c(1, 2))
data
#> # A tibble: 2 × 2
#>       x group
#>   <dbl> <dbl>
#> 1     1     1
#> 2    NA     2

data |> summarize_range(x, na.rm = TRUE, .by = group)
#> Warning: There were 2 warnings in `summarize()`.
#> The first warning was:
#> ℹ In argument: `min = min(x, na.rm = na.rm)`.
#> ℹ In group 2: `group = 2`.
#> Caused by warning in `min()`:
#> ! no non-missing arguments to min; returning Inf
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#> # A tibble: 2 × 3
#>   group   min   max
#>   <dbl> <dbl> <dbl>
#> 1     1     1     1
#> 2     2   Inf  -Inf

# Excluding the missing values before solves the problem
var <- "x"
n_na <- data |> filter(is.na(data[[var]])) |> nrow()
rlang::warn(glue::glue("Removing {n_na} rows where `{var}` is `NA`."))
#> Warning: Removing 1 rows where `x` is `NA`.

complete <- data |> filter(!is.na(data[[var]]))
complete |> summarize_range(x, na.rm = TRUE, .by = group)
#> # A tibble: 1 × 3
#>   group   min   max
#>   <dbl> <dbl> <dbl>
#> 1     1     1     1

Created on 2024-03-13 with reprex v2.1.0

AnneSchoenauer commented 6 months ago

Okay nice that means you found the bug? i think we only would use the jitter function on the Ecoinvent data anyway - so I think it is right to delete the NAs as long as for the final output all products are included again also those that don’t have a co2 footprint.

maurolepore commented 6 months ago

Yes, I now understand bug. There are two places with the same type of problem. One is when we summarize the range of co2_footprint and the other one is when we calculate the co2_avg.

Okay, I'll remove the NAs during those two computations and ensure to re-add them later (which seems to be the case already).

maurolepore commented 5 months ago

Closing because #743 already gives the user the ability to remove NAs with na.rm = TRUE. So from the perspective of tiltIndicator, this is solved. Now it's up to downstream packages to use this feature.

See also the relevant items in the changelog: