bcgov / wqbench

R package to generate download and compile data from EPA ECOTOX database
Apache License 2.0

step 2: classify duration - how to deal with the not specified values #9

Closed: aylapear closed this issue 1 year ago

aylapear commented 1 year ago

The chronic and acute rules have gaps, which cause many values to be coded as "not specified".

Currently these values are being filtered out.

It has been suggested that these values should be classified as acute instead of being removed, but we are waiting on confirmation of this.
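
To make the gap concrete, here is a minimal sketch that applies only the two plant thresholds from the rules posted later in this thread (`<= 48` is acute, `> 168` is chronic). The helper name `classify_plant` and the sample durations are hypothetical and used for illustration only; this is not code from the package.

```r
# Hypothetical helper: applies only the two plant rules quoted in this issue.
classify_plant <- function(duration) {
  ifelse(
    duration <= 48, "acute",
    ifelse(duration > 168, "chronic", "not specified")
  )
}

# Made-up durations chosen to land below, inside, and above the gap.
durations <- c(24, 48, 96, 168, 200)
classes <- classify_plant(durations)

# Durations above 48 and up to 168 match neither rule.
print(data.frame(duration = durations, class = classes))
```

Any plant record whose duration is above 48 and no more than 168 falls through both rules, which is the kind of gap described above.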

aylapear commented 1 year ago

  data_classified <- data |>
    dplyr::mutate(
      duration_class = dplyr::case_when(
        # Fish and Amphibians
        stringr::str_detect(ecological_group, "(?i)^amphibian$|^fish$") & obs_duration_mean_std <= 96 ~ "acute",
        stringr::str_detect(ecological_group, "(?i)^amphibian$|^fish$") & stringr::str_detect(simple_lifestage, "(?i)^juvenile$|(?i)^adult$") & obs_duration_mean_std >= 504 ~ "chronic",
        stringr::str_detect(ecological_group, "(?i)^amphibian$|^fish$") & stringr::str_detect(simple_lifestage, "(?i)^els$") & obs_duration_mean_std >= 168 ~ "chronic",
        # Invertebrates
        stringr::str_detect(ecological_group, "(?i)^invertebrate$") & obs_duration_mean_std <= 96 ~ "acute",
        stringr::str_detect(ecological_group, "(?i)^invertebrate$") & stringr::str_detect(ecological_group_class, "(?i)Planktonic Invertebrate") & obs_duration_mean_std > 96 ~ "chronic",
        stringr::str_detect(ecological_group, "(?i)^invertebrate$") & stringr::str_detect(ecological_group_class, "(?i)Regular") & obs_duration_mean_std >= 168 ~ "chronic",
        # Algae
        stringr::str_detect(ecological_group, "(?i)^algae$") & obs_duration_mean_std <= 24 ~ "acute",
        stringr::str_detect(ecological_group, "(?i)^algae$") & obs_duration_mean_std > 24 ~ "chronic",
        # Plants
        stringr::str_detect(ecological_group, "(?i)^plant$") & obs_duration_mean_std <= 48 ~ "acute",
        stringr::str_detect(ecological_group, "(?i)^plant$") & obs_duration_mean_std > 168 ~ "chronic",
        # Anything that matches none of the rules above
        TRUE ~ "not specified"
      )
    ) |>
    dplyr::filter(duration_class != "not specified")

The fallback at the end of the `case_when()` can be switched so that rows which don't meet any of the other rules are classified as acute; the final filter then no longer drops them.
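
A minimal sketch of that switch, on a reduced rule set (plants and algae only) and a made-up data frame. The column names follow the snippet above; the sample rows are hypothetical, and treating the fallback as `TRUE ~ "acute"` is an assumption about how the change would be written, not a copy of the package code.

```r
library(dplyr)

# Made-up rows for illustration only.
data <- dplyr::tibble(
  ecological_group = c("plant", "plant", "plant", "algae"),
  obs_duration_mean_std = c(24, 96, 200, 12)
)

data_classified <- data |>
  dplyr::mutate(
    duration_class = dplyr::case_when(
      ecological_group == "plant" & obs_duration_mean_std <= 48 ~ "acute",
      ecological_group == "plant" & obs_duration_mean_std > 168 ~ "chronic",
      ecological_group == "algae" & obs_duration_mean_std <= 24 ~ "acute",
      ecological_group == "algae" & obs_duration_mean_std > 24 ~ "chronic",
      # Assumed fallback: rows that match no rule default to acute
      TRUE ~ "acute"
    )
  )

print(data_classified)
```

With the fallback set this way, the plant row with a duration of 96 is kept and labelled acute rather than being filtered out, so no `dplyr::filter()` step is needed afterwards.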

aylapear commented 1 year ago

These values have now been set to acute, and the change has been tested.