Improving scallop zone plots

BryceMcManus-NOAA commented 1 year ago

Including zones that have less than 10 or so observations makes the scallop zone plots smaller, cluttered, and more difficult to understand. I think the easiest way to improve them, without changing zone_summary() , is to filter out zones with fewer than 10 (or possibly 5) observations before running zone_summary():

scallop0322MainDataTable %>%
    add_count(Zone_ID) %>%
    filter(n >= 10) %>%
    zone_summary(project = "scallop0322",
                 spat = "scallop0322TenMNSQRSpatTable",
                 zone.dat = "ZoneID",
                 zone.spat = "MN10SQID",
                 output = "tab_plot",
                 breaks = c(10, seq(100, 1e3, 100)),
                 bin_colors = fishset_viridis(11),
                 count = TRUE,
                 na.rm = TRUE)

## $table
## # A tibble: 263 × 2
##    Zone_ID     n
##    <chr>   <int>
##  1 387323   1005
##  2 387313    935
##  3 416956    833
##  4 406962    763
##  5 387332    762
##  6 387341    701
##  7 406923    701
##  8 406961    686
##  9 387322    674
## 10 387463    653
## # … with 253 more rows
## # ℹ Use `print(n = ...)` to see more rows
## 
## $plot

unnamed-chunk-3-1

For the percentage plots, we'll need to add a zone percentage column as well as pass a placeholder function that returns the value it's given. It's a bit hacky but it gets the job done. Also, and this kind of a pain, for some reason the spatial plots can't be edited after they're created, unlike other ggplots. So, for example, you can't add a title or change the legend text, which is why the placeholder function is call %, since zone_summary() uses the name of the function. I haven't found a solution that doesn't involve changing zone_summary().

`%` <- function(x) x

scallop0322MainDataTable %>% 
  count(Zone_ID) %>% 
  mutate(obs = n/sum(n) * 100) %>% 
  filter(n >= 10) %>% 
  zone_summary(project = "scallop0322",
               spat = scallop0322TenMNSQRSpatTable,
               zone.dat = "ZoneID",
               zone.spat = "MN10SQID",
               output = "tab_plot",
               count = FALSE,
               var = "obs",
               fun = "%",
               na.rm = TRUE)

## $table
## # A tibble: 507 × 2
##    Zone_ID     obs
##    <chr>     <dbl>
##  1 357551  0.00655
##  2 367421  0.0191 
##  3 377311  0.0226 
##  4 377332  0.00596
##  5 377333  0.00596
##  6 377422  0.00953
##  7 377423  0.0798 
##  8 377424  0.232  
##  9 377425  0.0602 
## 10 377431  0.0441 
## # … with 497 more rows
## # ℹ Use `print(n = ...)` to see more rows
## 
## $plot

unnamed-chunk-5-1

Here's how the CPUE plot may look after filtering:

scallop0322MainDataTable %>%
    add_count(Zone_ID) %>%
    filter(n >= 10) %>%
    zone_summary(project = "scallop0322",
                 spat = "scallop0322TenMNSQRSpatTable", 
                 zone.dat = "ZoneID",
                 zone.spat = "MN10SQID",
                 count = FALSE,
                 var = "CPUE", 
                 fun = "mean",
                 output = "tab_plot", 
                 breaks = c(seq(1e3, 4e3, 500), seq(6e3, 14e3, 2e3)),
                 na.rm = TRUE)

## $table
## # A tibble: 263 × 2
##    Zone_ID  CPUE
##    <chr>   <dbl>
##  1 377311  1728.
##  2 377422  2060.
##  3 377423  2143.
##  4 377424  1954.
##  5 377425  1932.
##  6 377431  1842.
##  7 377432  1890.
##  8 377433  2195.
##  9 377434  1799.
## 10 377441  1871.
## # … with 253 more rows
## # ℹ Use `print(n = ...)` to see more rows
## 
## $plot

unnamed-chunk-6-1

@mle2718 does this sound good to you?

mle2718 commented 1 year ago

@BryceMcManus-NOAA : Looks good to me.

The mutate line here:

  count(Zone_ID) %>% 
  mutate(obs = n/sum(n) * 100) %>% 
  filter(n >= 10) %>%

This so the percentages are done on all the data in scallop0322MainDataTable, not the filtered data, right? If so, then I like it.

Heads up 1: We may need to adjust the plots further based on reviewer/editor comments. For example, we may be asked to write out "observations" or change "CPUE (mean)" to "Average CPUE." Or not. The other thing I could see is being being asked to move the legend inside the figure -- we have alot of white space in the bottom right of the map.

Heads up 2: I think this is on your list, but fiddling with the colorization might be needed. I think we could do something like:

custom_breaks<-classInt::classify_intervals(scallop0322MainDataTable $CPUE, 5, style="jenks") #or classInt:classIntervals()

And then pass this to the zone_summary(breaks=custom_breaks). This would not work for the maps of obs or percentages, I don't think.

BryceMcManus-NOAA commented 1 year ago

The mutate line here:
  count(Zone_ID) %>% 
  mutate(obs = n/sum(n) * 100) %>% 
  filter(n >= 10) %>% 
This so the percentages are done on all the data in scallop0322MainDataTable, not the filtered data, right? If so, then I like it.

Right, the percentages are calculated before the filter. Then zone_summary() simply uses those pre-calculated values in the plot.

Heads up 1: We may need to adjust the plots further based on reviewer/editor comments. For example, we may be asked to write out "observations" or change "CPUE (mean)" to "Average CPUE." Or not. The other thing I could see is being being asked to move the legend inside the figure -- we have alot of white space in the bottom right of the map.

Those changes will require me to update zone_summary(). I'll have to add a legend title argument, legend position etc. which is pretty straightforward.

Heads up 2: I think this is on your list, but fiddling with the colorization might be needed. I think we could do something like:
custom_breaks<-classInt::classify_intervals(scallop0322MainDataTable $CPUE, 5, style="jenks") #or classInt:classIntervals()
And then pass this to the zone_summary(breaks=custom_breaks). This would not work for the maps of obs or percentages, I don't think.

This is bit tricky since the breaks are being determined before the average CPUE is calculated in your example, so the breaks will be way off. And classify_interval() returns a character interval which will return an error since the breaks argument in zone_summary() requires a numeric vector. To make this work you'd have to do this:

interval_to_numeric <- function(int) {

  int <- gsub("\\[|\\(|\\)|\\]", "", int) # remove brackets and parentheses
  int <- strsplit(int, ",")               # split by comma
  int <- as.numeric(unlist(int))          # convert to numeric vector

  int[!duplicated(int)]                   # remove duplicated breaks
}

custom_breaks <- function(var, brks = 5, style = "fisher", round = 0) {

  int <- classInt::classify_intervals(var, n = brks, style = style)
  int <- levels(int)

  round(interval_to_numeric(int))
}

scallop0322MainDataTable %>%
  add_count(Zone_ID) %>%
  filter(n >= 10) %>%
  group_by(ZoneID) %>% 
  mutate(mean_cpue = mean(CPUE)) %>% 
  ungroup() %>% 
  zone_summary(project = "scallop0322",
               spat = "scallop0322TenMNSQRSpatTable",
               zone.dat = "Zone_ID",
               zone.spat = "MN10SQID",
               output = "tab_plot",
               var = "CPUE",
               fun = "mean",
               breaks = custom_breaks(.$mean_cpue, 5, "fisher"),
               count = FALSE,
               na.rm = TRUE)

## $table
## # A tibble: 263 × 2
##    Zone_ID  CPUE
##    <chr>   <dbl>
##  1 377311  1728.
##  2 377422  2060.
##  3 377423  2143.
##  4 377424  1954.
##  5 377425  1932.
##  6 377431  1842.
##  7 377432  1890.
##  8 377433  2195.
##  9 377434  1799.
## 10 377441  1871.
## # … with 253 more rows
## # ℹ Use `print(n = ...)` to see more rows
## 
## $plot

unnamed-chunk-7-1

mle2718 commented 1 year ago

This is bit tricky since the breaks are being determined before the average CPUE is calculated in your example, so the breaks will be way off. And classify_interval() returns a character interval which will return an error since the breaks argument in zone_summary() requires a numeric vector. To make this work you'd have to do this:

oh ...right.... Thanks for digging into this.

mle2718 commented 1 year ago

@BryceMcManus-NOAA let's shelve the breaks for now. The filter (or mutate %>% filter) solution should work.

This will eventually got into Dev -- but don't code it up yet, Dev is changing rapidly and I'm not confident it my ability to resolve merge conflicts.

mle2718 commented 1 year ago

@BryceMcManus-NOAA -- dev now works and is fairly stable. That happened faster than I expected. If you have improvements ready, please go for it. If not, no big deal.

BryceMcManus-NOAA commented 1 year ago

@mle2718 Great, I can start adjusting the plots tomorrow morning.

mle2718 commented 1 year ago

@BryceMcManus-NOAA : one thing that I was thinking about..just for the "closure areas" is it possible to divide "total catch" by the area (in km^2) of the closure area? If it's easy, that would be a great improvement to our "this is how the call area--> wind area--> lease areas evolve" story.

NEFSC / READ-SSB-CHAJI-Effort-Displacement---Scallop

Improving scallop zone plots #184