afsc-gap-products / akgfmaps

Make AFSC bottom trawl survey maps and retrieve map layers
Other
17 stars 6 forks source link

add set_breaks function to akgfmaps package #42

Closed EmilyMarkowitz-NOAA closed 8 months ago

EmilyMarkowitz-NOAA commented 8 months ago

Suggestion

This is the function I use in the Bering Sea data report to set neat, rounded, set breaks for distribution plots created using the akgfmaps R package. I thought it could be useful to this package and to users of this package, and it would be best to add it this package. The function is below. Modify and improve as you see fit!

Function

#' Set scale breaks for a variable
#' 
#' @description
#' This function will automatically choose neatly binned scale bins for a given vector of variables. 
#'
#' @param dat The data.frame from which `var` is a column that needs to be binned. 
#' @param var A column in the data.frame `dat` that needs to be binned for a scale bar. 
#' @param n The number of bins to return (other than including 0). 
#'
#' @return A vector. 
#' @export
#'
#' @examples
#' set_breaks(dat = akgfmaps::YFS2017, var = "CPUE_KGHA")
#' # compare with unrounded default: 
#' dat = akgfmaps::YFS2017; var = "CPUE_KGHA"
#' a <- classInt::classIntervals(var = as.numeric(unlist(dat[,var]))[as.numeric(unlist(dat[,var])) != 0], 
#'                          n = 5, style = "jenks")$brks
#' a
#' round(a)
set_breaks <- function(dat, var, n = 5) {
  set.breaks0 <- classInt::classIntervals(var = as.numeric(unlist(dat[,var]))[as.numeric(unlist(dat[,var])) != 0], 
                                          n = n, style = "jenks")$brks
  set.breaks <- c()

  for (i in 1:length(set.breaks0)) {

    if (i == length(set.breaks0)) {
      set.breaks<-c(set.breaks, ceiling(x = set.breaks0[i])) #Inf)# #round(set.breaks0[i], digits = 0))
    } else if (i == 1) {
      set.breaks<-c(set.breaks, 0)
    } else {    
      set.breaks <- c(set.breaks, 
                      plyr::round_any(x = set.breaks0[i], 
                                      accuracy = ifelse(max(set.breaks0[i])>300, 100, 
                                                        ifelse(max(set.breaks0[i])>100, 50, 
                                                               ifelse(max(set.breaks0[i])>20, 10, 
                                                                      1))), 
                                      f = ceiling))    
    }
  }
  set.breaks <- unique(set.breaks)

  return(set.breaks)
}

Comparison

As shown in the function example, using this package will produce:

set_breaks(dat = akgfmaps::YFS2017, var = "CPUE_KGHA")
# [1]    0   60  200  400  800 1207

which should be compared with the below, which is fine, but messier:

dat = akgfmaps::YFS2017; var = "CPUE_KGHA"
a <- classInt::classIntervals(var = as.numeric(unlist(dat[,var]))[as.numeric(unlist(dat[,var])) != 0], 
                         n = 5, style = "jenks")$brks
a
# [1]    0.0377   56.0109  153.3248  353.3084  736.8894 1206.3613
round(a)
# [1]    0   56  153  353  737 1206
sean-rohan-NOAA commented 8 months ago

Hi Em,

Something similar to this is already implemented in the package by using the "pretty" method:

library(akgfmaps)

ebs_layers <- akgfmaps::get_base_layers(select.region = "ebs", 
                                        set.crs = "EPSG:3338")

sample_locations <- sf::st_as_sf(akgfmaps::YFS2017,
                                 coords = c("LONGITUDE", "LATITUDE"),
                                 crs = "WGS84") |>
  sf::st_transform(crs = "EPSG:3338")

idw <- make_idw_stack(x = dplyr::bind_rows(akgfmaps::YFS2017,
                                           dplyr::mutate(akgfmaps::YFS2017, YEAR = 999)), 
                      grouping.vars = "YEAR",
                      out.crs = "EPSG:3338",
                      region = "ebs",
                      set.breaks = "pretty",
                      extrapolation.grid.type = "sf")

ggplot() +
  geom_sf(data = idw$extrapolation.stack,
          mapping = aes(fill = var1.pred),
          color = NA) +
  geom_sf(data = sample_locations) +
  scale_fill_viridis_d() +
  facet_wrap(~YEAR)

However, I would generally recommend avoiding automatic break selection and, [instead, set up] breaks based on information about the density distribution of the species because there really isn't a an algorithm for selecting breaks that works well for all species. The eval_plot_breaks() function is supposed to help guide the selection of breaks:

akgfmaps::eval_plot_breaks(akgfmaps::YFS2017$CPUE_KGHA, n.breaks = 5)
EmilyMarkowitz-NOAA commented 8 months ago

Oh good to know, I wasn't aware of this function. Thanks for this! Considering the parallel discussion in https://github.com/afsc-gap-products/akgfmaps/issues/40, I'll close this issue, too.