Change how we are extracting the timeseries data by locations

LimnoDataScience / plume_bloom_drivers

Using classified raster images and meteo drivers to try to better understand what is causing sediment plumes and blooms in Lake Superior

Change how we are extracting the timeseries data by locations #5

Closed (lindsayplatt closed 1 year ago)

lindsayplatt commented 1 year ago

Currently, we extract timeseries data at the centroids of the 12 grid cells that were created across the Lake Superior shapefile. Climate drivers for the watersheds that flow into the lake would be more helpful. Hilary shared watershed shapefiles in Slack on 4/18 that we can use. Here is my plan:

1. Load the watershed shapefile as a new target (don't commit the file yet).
2. Create 4 km grid cells per watershed polygon and use those locations to extract timeseries data.
3. Summarize the grid cell timeseries into one value per variable per day per watershed. (EDIT: using HUCs per gage, not the whole watershed)
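
The summarizing step in the plan could be sketched roughly as below. This is only an illustration: the `cell_timeseries` data frame, its column names, and the values are made up, and using the mean as the summary statistic is an assumption (the plan does not say which statistic to use).

```r
library(dplyr)

# Hypothetical extracted data: one row per grid cell, date, and variable.
# Column names and values are invented for illustration.
cell_timeseries <- data.frame(
  watershed = c("A", "A", "A", "A", "B", "B"),
  cell_id   = c(1, 2, 1, 2, 3, 3),
  date      = as.Date(c("2023-04-01", "2023-04-01", "2023-04-02",
                        "2023-04-02", "2023-04-01", "2023-04-02")),
  variable  = "ppt",
  value     = c(1, 3, 0, 2, 5, 7)
)

# Collapse all grid cells in a watershed to one value per variable per day.
# The mean is an assumed choice of summary statistic.
watershed_daily <- cell_timeseries %>%
  group_by(watershed, date, variable) %>%
  summarize(value = mean(value, na.rm = TRUE), .groups = "drop")

print(watershed_daily)
```

Grouping by `variable` as well as `watershed` and `date` keeps the result in long format, so adding more drivers later would not change the code.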

lindsayplatt commented 1 year ago

4 km grid cells across the watershed result in 1462 grid centers at which to extract PRISM data in p2_prism_plots. Given that the current number of locations is 12, and that took over an hour to process and resulted in ~366k rows in p2_prism_data, I don't think a 4 km resolution will be feasible ... it could take over 100x as long (1462 / 12 ≈ 122) and result in over 100x more rows of data.
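
As a back-of-the-envelope check of that estimate, the numbers above can be scaled up directly. This sketch assumes that processing time and row count grow linearly with the number of locations, which is an assumption and not something that was measured; the variable names are illustrative.

```r
# Figures quoted in the comment above
n_current <- 12       # locations currently processed
n_4km     <- 1462     # grid centers at 4 km resolution
rows_now  <- 366000   # approximate rows in p2_prism_data (~366k)
hours_now <- 1        # lower bound, since the run took "over an hour"

# Assumption: cost scales linearly with the number of locations
scale_factor  <- n_4km / n_current       # about 122x
rows_per_site <- rows_now / n_current    # rows produced per location
est_rows_4km  <- rows_per_site * n_4km   # projected rows at 4 km
est_hours_4km <- hours_now * scale_factor

cat("Scale factor:", round(scale_factor, 1), "\n")
cat("Projected rows:", format(est_rows_4km, big.mark = ","), "\n")
cat("Projected hours (at least):", round(est_hours_4km), "\n")
```

Under that assumption the 4 km grid lands at roughly 45 million rows and several days of processing.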

Here's what those cells look like at 4 km resolution:

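library(targets)    # provides tar_read()
library(sf)         # needed so geom_sf() can draw the sf objects
library(tidyverse)  # attaches ggplot2

# Watershed outline with the 4 km grid overlaid (no fill, so the outline shows through)
ggplot() +
  geom_sf(data = tar_read(p2_lake_superior_watershed_dissolved)) +
  geom_sf(data = tar_read(p2_lake_superior_watershed_grid), fill = NA) +
  theme(panel.background = element_rect(fill = 'white')) +
  coord_sf()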

lindsayplatt commented 1 year ago

I'm going to try a 10 km grid size, which drops the number of cells that intersect the watershed to 270.

lindsayplatt commented 1 year ago

I'm only going to use the cells whose centroid intersects the watershed polygon, so that we don't get NAs in our extracted PRISM data.
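
The difference between keeping every cell that intersects the watershed and keeping only cells whose centroid falls inside it can be sketched with `sf`. Everything here is a stand-in: the triangle is a made-up polygon, not the real watershed, and EPSG:5070 is chosen only because it has meter units. The actual pipeline targets are not shown.

```r
library(sf)

# Hypothetical stand-in for the watershed polygon: a right triangle with
# 35 km legs, in a projected CRS with meter units (EPSG:5070 is an
# assumption made for this example only).
watershed <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(35000, 0), c(0, 35000), c(0, 0)))),
  crs = 5070
)

# 10 km square grid covering the polygon's bounding box
grid <- st_make_grid(watershed, cellsize = 10000)

# Option 1: every cell that overlaps the watershed at all
cells_intersecting <- grid[lengths(st_intersects(grid, watershed)) > 0]

# Option 2: only cells whose centroid lies within the watershed, so each
# extraction point is guaranteed to sit inside the polygon
centroids <- st_centroid(grid)
centroid_inside <- lengths(st_intersects(centroids, watershed)) > 0
cells_centroid_in <- grid[centroid_inside]

cat("Cells intersecting the polygon:", length(cells_intersecting), "\n")
cat("Cells with centroid inside:", length(cells_centroid_in), "\n")
```

The centroid filter always keeps a subset of the intersecting cells, so it trades some edge coverage for extraction points that all sit inside the polygon.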