badgley opened this issue 2 years ago
@badgley - thanks for opening this issue. Given that the data come in per-state granules, what do you think the final output dataset(s) should look like? Can we merge the rasters into a single (i.e. CONUS) raster? Or does it make more sense to keep the data in a per-state layout?
definitely like the idea of having conus and ak rasters. i can't really think of any analyses we'd be doing where 'state' would be the natural unit of analysis. as i recall, we need to break conus and ak apart if we want to use Albers?
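For context on the projection question, here's a minimal sketch of what separate Albers grids could look like with rioxarray. The file names are hypothetical placeholders; EPSG:5070 and EPSG:3338 are the standard CONUS and Alaska Albers codes.

```python
# Minimal sketch of separate Albers grids for CONUS and AK (file names are
# hypothetical). EPSG:5070 is CONUS Albers; EPSG:3338 is Alaska Albers.
import rioxarray  # registers the .rio accessor on xarray objects

conus = rioxarray.open_rasterio("BP_conus_mosaic.tif", masked=True)
conus_albers = conus.rio.reproject("EPSG:5070")

ak = rioxarray.open_rasterio("BP_AK.tif", masked=True)
ak_albers = ak.rio.reproject("EPSG:3338")
```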
The more I think this over, I wonder if we want to try writing a Pangeo Forge recipe for this task. It could be interesting and would give us a natural way to test out something different from the prefect flows we've been working on lately. unless i hear objections, i'll start fiddling around with that later this week.
looking over the pangeo-forge examples, i don't immediately see a template for doing the 'stitching together' step, so this could be an interesting use case to flesh out.
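For reference, here is a rough skeleton of the concat-style pangeo-forge-recipes template, assuming one pre-extracted BP GeoTIFF per state at hypothetical URLs. The mosaic/stitching step is exactly the part this template doesn't cover, so treat it as a starting point rather than a working recipe.

```python
# Rough skeleton of the concat-style pangeo-forge-recipes template (URLs and
# chunk sizes are hypothetical). This concatenates along a synthetic "state"
# dimension; the 'stitching together' (mosaicking) step would still need a
# custom processing function, which is the gap mentioned above.
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

state_urls = [
    "https://example.com/usfs-bp/BP_AL.tif",  # hypothetical, one per state
    "https://example.com/usfs-bp/BP_AZ.tif",
]

pattern = pattern_from_file_sequence(state_urls, concat_dim="state", nitems_per_file=1)
recipe = XarrayZarrRecipe(
    pattern,
    xarray_open_kwargs={"engine": "rasterio"},  # read GeoTIFFs via rioxarray's backend
    target_chunks={"x": 4096, "y": 4096},
)
```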
we need to break conus and ak apart if we want to use Albers?
I think that is right.
If we're going to go down the Pangeo-Forge route (which I think sounds great), we may want to break this problem into a few steps.
The USFS has generated US-wide estimates of burn probability, based on vegetation and wildland fuel data from LANDFIRE 2014. Having these data at their raw resolution (30m) and a downsampled resolution (4000m?) would help with ongoing work to evaluate permanence risk to forest carbon. These data would also help in evaluating the accuracy of our MTBS fire risk modeling.
To accomplish this task, we need to i) download all the burn probability (BP) data, ii) stitch it all together into a single file, iii) do some downsampling/regridding, and iv) save the end product somewhere we can then access it.
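As a rough sketch of steps (ii)-(iv) using rioxarray: the file names, coarsening factor, and output path below are hypothetical, and a real run would need chunked/dask reads to handle the 30m mosaic in memory.

```python
# Sketch of steps (ii)-(iv): mosaic per-state BP rasters, downsample, and save.
# File names, the coarsening factor, and the output path are hypothetical.
import rioxarray
from rioxarray.merge import merge_arrays

# (ii) stitch: open each state's BP GeoTIFF and mosaic onto a single grid
state_tifs = ["BP_AL.tif", "BP_AZ.tif"]  # one per CONUS state in practice
arrays = [rioxarray.open_rasterio(path, masked=True) for path in state_tifs]
conus_30m = merge_arrays(arrays)  # memory-heavy at 30m; would want chunked reads

# (iii) downsample: block-average 30m pixels up to ~4000m
factor = 133  # 30m * 133 ≈ 4000m
conus_4km = conus_30m.coarsen(x=factor, y=factor, boundary="trim").mean()

# (iv) save: write the result to zarr (final destination TBD below)
conus_4km.to_dataset(name="burn_probability").to_zarr("BP_conus_4km.zarr", mode="w")
```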
In the past, we've sort of rolled our own data processing. More recently, we've done a bunch of stuff for the CMIP6 projects with prefect. And separately, I'm aware of ongoing efforts for similar data transformations with Pangeo-Forge. It would be helpful to get feedback from @orianac, @jhamman, and @norlandrhagen about the best way to accomplish the task.
Here are some other details and questions that can get the conversation started.
Data
Input
Raw 30m GeoTIFFs are available directly from the USFS Research Data Archive. Data are organized within a zipfile on a per-state basis, with each file containing eight separate data products. We're interested in the Burn Probability data. File sizes range from 100MB to 20GB. I think we probably want to separately download these data and archive them on our cloud storage. Thoughts @jhamman ?
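A minimal sketch of that download-and-archive step with fsspec, assuming hypothetical source URLs and a hypothetical scratch bucket:

```python
# Sketch: mirror the per-state zip archives to cloud storage before processing.
# The source URLs and bucket path are hypothetical placeholders.
import fsspec

states = ["AL", "AZ", "CA"]  # subset for illustration
for state in states:
    src_url = f"https://example.com/usfs-bp/{state}_BP.zip"           # hypothetical
    dest_url = f"gs://our-scratch-bucket/usfs-bp/raw/{state}_BP.zip"  # hypothetical
    with fsspec.open(src_url, "rb") as src, fsspec.open(dest_url, "wb") as dst:
        # fine for the ~100MB files; the ~20GB ones should be streamed in chunks
        dst.write(src.read())
```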
Output
Format
Our target output should be either a zarr store or GeoTIFF(s).
We should store the data in two resolutions: the raw 30m resolution and a downsampled (~4000m) resolution.
We'll need to handle CONUS and AK, which I think requires separate files? @jhamman
Location
Where should the final zarr/tiff(s) live? I think we've historically started with Google Cloud Storage, so I guess we start by pushing the data there.
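If Google Cloud Storage is the starting point, writing the processed zarr store there is short with gcsfs installed; the bucket and path below are hypothetical.

```python
# Sketch: push the processed zarr store to Google Cloud Storage (bucket/path
# are hypothetical; requires gcsfs for the gs:// protocol).
import xarray as xr

ds = xr.open_zarr("BP_conus_4km.zarr")  # local output from the processing step
ds.to_zarr("gs://our-data-bucket/usfs-bp/conus_4km.zarr", mode="w")
```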