USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.
Creative Commons Zero v1.0 Universal

added PDP panel plot function and target for paper #191

Closed: jds485 closed this pull request 1 year ago

jds485 commented 1 year ago

This PR adds a function to make a panel plot of PDPs. The function has several lines that are specific to the attributes that we're plotting in the paper figure. I'm not sure of a good way to generalize it, but I'm also not sure we need it to be general.

Result: PDP_offset_panel_RF_multiclass_high_NoPhysio_6panel

Note that running tar_make() to create this plot produces a figure with different plot dimensions and element sizes: PDP_offset_panel_RF_multiclass_high_NoPhysio

Something that might be good for the paper is to add a rug to these plots to indicate where the data support is located. I do not know how to get geom_point() to draw the points without also showing them in the legend.

Closes #190

cstillwellusgs commented 1 year ago

Try this out: there is a show.legend = FALSE argument that could probably work.

https://ggplot2.tidyverse.org/reference/geom_rug.html
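
As a minimal sketch of that suggestion: the data frames `pdp` and `obs`, their column names, and all values below are made up for illustration and are not taken from the repo. The rug layer gets its own data and mapping, and `show.legend = FALSE` keeps it out of the legend.

```r
library(ggplot2)

# Hypothetical partial dependence curves for two regions (made-up values).
pdp <- data.frame(
  variable = rep(seq(0, 1, length.out = 25), times = 2),
  yhat = c(seq(0.2, 0.6, length.out = 25), seq(0.5, 0.3, length.out = 25)),
  region = rep(c("A", "B"), each = 25)
)

# Hypothetical observed attribute values (the data support).
obs <- data.frame(variable = c(0.05, 0.10, 0.12, 0.40, 0.45, 0.80))

p <- ggplot(pdp, aes(x = variable, y = yhat, colour = region)) +
  geom_line() +
  # Rug along the bottom axis; it uses its own data and stays out of the legend.
  geom_rug(
    data = obs,
    mapping = aes(x = variable),
    sides = "b",
    inherit.aes = FALSE,
    show.legend = FALSE
  )
```

Setting `inherit.aes = FALSE` matters here because `obs` has no `yhat` or `region` column for the rug layer to inherit from the main plot mapping.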

cstillwellusgs commented 1 year ago

To prevent tar_make() from plotting different dimensions, you can probably specify all font sizes, margin widths, etc. in the theme() and then overall dimensions in ggsave(). That probably isn't necessary for now but we can address it once we need a publication-ready proof from the journal.
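
A rough sketch of what that could look like, assuming a single placeholder panel: the data, the specific sizes, and the file name are illustrative guesses, not values from the repo. One plausible cause of the mismatch is that ggsave() falls back to the size of the current graphics device when width and height are not given, and that size can differ between an interactive session and a tar_make() run.

```r
library(ggplot2)

# Hypothetical data standing in for one PDP panel.
df <- data.frame(x = seq(0, 1, length.out = 25))
df$y <- df$x^2

p <- ggplot(df, aes(x, y)) +
  geom_line() +
  theme_bw(base_size = 10) +
  # Pin text sizes and margins explicitly so they do not depend on defaults.
  theme(
    axis.title = element_text(size = 10),
    axis.text = element_text(size = 8),
    legend.text = element_text(size = 8),
    plot.margin = margin(t = 5, r = 5, b = 5, l = 5, unit = "pt")
  )

# Fix the output size and resolution instead of relying on the active device.
out_file <- file.path(tempdir(), "pdp_panel_example.png")
ggsave(out_file, plot = p, width = 6.5, height = 4, units = "in", dpi = 300)
```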

jds485 commented 1 year ago

> Try this out: there is a show.legend = FALSE argument that could probably work.

Thanks! That does work, but for now I'm going to omit this edit: I realized that each region could have a different range of values for each attribute, and that will take some time to work into the plot (maybe we place vertical tick marks along each region's line to mark its range of values?). This is the function call that worked for me. Note that the x variable here is not the data location; it is the 25 grid points at which the partial dependence was computed (evenly spaced along each x-axis).

geom_point(mapping = aes(x = variable, y = -0.3), show.legend = FALSE, inherit.aes = FALSE)
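
One possible way to draw the per-region range marks floated above, as a sketch only: `pdp`, `obs`, `ticks`, and every value are hypothetical, and interpolating the curve with approx() is an assumption about how to place a mark on each region's line. The idea is to compute each region's observed range, interpolate that region's curve at the two endpoints, and draw "|" markers there without a legend entry.

```r
library(ggplot2)

# Hypothetical PDP grid: 25 evenly spaced points per region (made-up values).
grid_x <- seq(0, 1, length.out = 25)
pdp <- data.frame(
  variable = rep(grid_x, times = 2),
  yhat = c(0.2 + 0.4 * grid_x, 0.5 - 0.2 * grid_x),
  region = rep(c("A", "B"), each = 25)
)

# Hypothetical observed attribute values, with a different range per region.
obs <- data.frame(
  variable = c(0.1, 0.3, 0.6, 0.4, 0.7, 0.9),
  region = rep(c("A", "B"), each = 3)
)

# For each region, take the observed range and interpolate the curve there.
ticks <- do.call(rbind, lapply(split(obs, obs$region), function(d) {
  curve <- pdp[pdp$region == d$region[1], ]
  x_rng <- range(d$variable)
  data.frame(
    region = d$region[1],
    variable = x_rng,
    yhat = approx(curve$variable, curve$yhat, xout = x_rng)$y
  )
}))

p <- ggplot(pdp, aes(x = variable, y = yhat, colour = region)) +
  geom_line() +
  # Vertical-bar markers on each line at that region's min and max.
  geom_point(data = ticks, shape = "|", size = 5, show.legend = FALSE)
```

Because the marks inherit the colour mapping, each pair matches its region's line while `show.legend = FALSE` keeps the legend limited to the lines.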