An R package to facilitate the calculation of air quality metrics according to Canadian Ambient Air Quality Standards
Inconsistent rounding Issue #66

jeromerobles opened 6 months ago

jeromerobles commented 6 months ago

Rounded results from rcaaqs::pm_24h_caaqs() are inconsistent depending on the date range of the data set. In the example below, when the data provided span 2011-2015, the metric_value for 2015 is 28, but when the same data are filtered to 2013-2015, the metric_value for 2015 is 29. Note that the annual values are the same in both cases.

```r
# rounding check
library(dplyr)
library(lubridate)

# create pseudo data
data <- tibble(
  year = c(2011, 2012, 2013, 2014, 2015),
  value = c(27.8, 27.9, 32.6, 28.5, 24.4)
)
start_date <- ymd_hm('2011-01-01 00:00')
end_date <- ymd_hm('2015-12-31 23:00')

dates <- seq(from = start_date, to = end_date, by = 'hour')
df <- tibble(date_time = dates) %>%
  cross_join(tibble(site = 'site1')) %>%
  mutate(year = year(date_time)) %>%
  left_join(data)

df1 <- df
df2 <- df %>% filter(year >= 2013)

test1 <- rcaaqs::pm_24h_caaqs(data = df1)
test2 <- rcaaqs::pm_24h_caaqs(data = df2)
test1$caaqs
test2$caaqs
```


> test1$caaqs
# A tibble: 5 × 10
  caaqs_year min_year max_year n_years metric    metric_value caaqs             flag_daily_incomplete flag_yearly_incomplete flag_two_of_three_years
1       2011     2011     2011       1 pm2.5_24h           NA Insufficient Data NA                    FALSE                  FALSE
2       2012     2011     2012       2 pm2.5_24h           28 Not Achieved      NA                    FALSE                  TRUE
3       2013     2011     2013       3 pm2.5_24h           29 Not Achieved      NA                    FALSE                  FALSE
4       2014     2012     2014       3 pm2.5_24h           30 Not Achieved      NA                    FALSE                  FALSE
5       2015     2013     2015       3 pm2.5_24h           28 Not Achieved      NA                    FALSE                  FALSE

> test2$caaqs
# A tibble: 3 × 10
  caaqs_year min_year max_year n_years metric    metric_value caaqs             flag_daily_incomplete flag_yearly_incomplete flag_two_of_three_years
1       2013     2013     2013       1 pm2.5_24h           NA Insufficient Data NA                    FALSE                  FALSE
2       2014     2013     2014       2 pm2.5_24h           31 Not Achieved      NA                    FALSE                  TRUE
3       2015     2013     2015       3 pm2.5_24h           29 Not Achieved      NA                    FALSE                  FALSE

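
One possible contributing factor, offered only as an unconfirmed sketch: assuming the metric is the mean of the annual values in each 3-year window (which is consistent with the other rows above), the 2013-2015 window gives (32.6 + 28.5 + 24.4) / 3 = 28.5, exactly on a rounding tie. A tie like this can land on 28 or 29 depending on the rounding rule and on tiny floating-point differences in how the average is accumulated. The snippet below uses only base R; `round_half_up()` is a hypothetical helper written for illustration and is not taken from rcaaqs, and nothing here claims to reproduce the package's internal calculation.

```r
# Annual values from the example above for the 2013-2015 window.
annual <- c(32.6, 28.5, 24.4)

# Hypothetical helper (not part of rcaaqs): round half up to an integer.
round_half_up <- function(x) floor(x + 0.5)

# Two algebraically equivalent ways to average the same three numbers.
avg_mean <- mean(annual)
avg_sum  <- sum(annual) / length(annual)

# Print with full precision to see whether either result sits just
# above or just below the 28.5 tie.
print(format(c(avg_mean, avg_sum), digits = 17))

# Base R's round() sends ties to the even digit (IEC 60559), so an exact
# 28.5 rounds to 28, while half-up rounding gives 29.
print(round(28.5))
print(round_half_up(28.5))

# Apply both rules to the computed average.
print(c(base_round = round(avg_mean), half_up = round_half_up(avg_mean)))
```

If the two printed averages differ in their last digits, or the two rounding rules disagree, that would be worth comparing against how the package accumulates and rounds the 3-year average for each input range.
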
stephhazlitt commented 6 months ago

@jeromerobles I cannot seem to replicate the issue.

library(dplyr)
library(lubridate)

data <- tibble(
  year = c(2011, 2012, 2013, 2014, 2015),
  value = c(27.8, 27.9, 32.6, 28.5, 24.4)
)

start_date <- ymd_hm('2011-01-01 00:00')
end_date <- ymd_hm('2015-12-31 23:00')

dates <- seq(from = start_date, to = end_date, by = 'hour')

df <- tibble(date_time = dates) |>
  cross_join(tibble(site = 'site1')) |>
  mutate(year = year(date_time)) |>
  left_join(data)

df1 <- df
df2 <- df |>
  filter(year >= 2013)

test1 <- rcaaqs::pm_24h_caaqs(data = df1)
test2 <- rcaaqs::pm_24h_caaqs(data = df2)

testdf1 <- test1[["caaqs"]]
testdf2 <- test2[["caaqs"]]

estimate1 <-
  testdf1 |> filter(caaqs_year == 2015) |> pull(metric_value)
estimate2 <-
  testdf2 |> filter(caaqs_year == 2015) |> pull(metric_value)

setequal(estimate1, estimate2)

Are we using different versions of rcaaqs, or anything else under the hood?

> session_info()
