badgley opened this issue 2 years ago
@badgley - thanks for opening this issue. Given that the data come in per-state granules, what do you think the final output dataset(s) should look like? Can we merge the rasters into a single (i.e. CONUS) raster? Or does it make more sense to keep the data in a per-state layout?
definitely like the idea of having conus and ak rasters. i can't really think of any analyses we'd be doing where 'state' would be the natural unit of analysis. as i recall, we need to break conus and ak apart if we want to use Albers?
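For context on the projection question, here's a minimal sketch of what separate Albers grids could look like with rioxarray. The file names are hypothetical placeholders; EPSG:5070 and EPSG:3338 are the standard CONUS and Alaska Albers codes.

```python
# Minimal sketch of separate Albers grids for CONUS and AK (file names are
# hypothetical). EPSG:5070 is CONUS Albers; EPSG:3338 is Alaska Albers.
import rioxarray  # registers the .rio accessor on xarray objects

conus = rioxarray.open_rasterio("BP_conus_mosaic.tif", masked=True)
conus_albers = conus.rio.reproject("EPSG:5070")

ak = rioxarray.open_rasterio("BP_AK.tif", masked=True)
ak_albers = ak.rio.reproject("EPSG:3338")
```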
The more I think this over, I wonder if we want to try writing a Pangeo Forge recipe for this task. It could be interesting and would give us a natural way to test out something different from the prefect flows we've been working on lately. unless i hear objections, i'll start fiddling around with that later this week.
looking over the pangeo-forge examples, i don't immediately see a template for doing the 'stitching together' step, so this could be an interesting use case to flesh out.
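For reference, here is a rough skeleton of the concat-style pangeo-forge-recipes template, assuming one pre-extracted BP GeoTIFF per state at hypothetical URLs. The mosaic/stitching step is exactly the part this template doesn't cover, so treat it as a starting point rather than a working recipe.

```python
# Rough skeleton of the concat-style pangeo-forge-recipes template (URLs and
# chunk sizes are hypothetical). This concatenates along a synthetic "state"
# dimension; the 'stitching together' (mosaicking) step would still need a
# custom processing function, which is the gap mentioned above.
from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

state_urls = [
    "https://example.com/usfs-bp/BP_AL.tif",  # hypothetical, one per state
    "https://example.com/usfs-bp/BP_AZ.tif",
]

pattern = pattern_from_file_sequence(state_urls, concat_dim="state", nitems_per_file=1)
recipe = XarrayZarrRecipe(
    pattern,
    xarray_open_kwargs={"engine": "rasterio"},  # read GeoTIFFs via rioxarray's backend
    target_chunks={"x": 4096, "y": 4096},
)
```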
we need to break conus and ak apart if we want to use Albers?
I think that is right.
If we're going to go down the Pangeo-Forge route (which I think sounds great), we may want to break this problem into a few steps.
The USFS has generated US-wide estimates of burn probability, based on vegetation and wildland fuel data from LANDFIRE 2014. Having these data at their raw resolution (30m) and a downsampled resolution (4000m?) would help with ongoing work to evaluate permanence risk to forest carbon. These data would also help in evaluating the accuracy of our MTBS fire risk modeling.
To accomplish this task, we need to i) download all the burn probability (BP) data, ii) stitch it all together into a single file, iii) do some downsampling/regridding, and iv) save the end product somewhere we can then access it.
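As a rough sketch of steps (ii)-(iv) using rioxarray: the file names, coarsening factor, and output path below are hypothetical, and a real run would need chunked/dask reads to handle the 30m mosaic in memory.

```python
# Sketch of steps (ii)-(iv): mosaic per-state BP rasters, downsample, and save.
# File names, the coarsening factor, and the output path are hypothetical.
import rioxarray
from rioxarray.merge import merge_arrays

# (ii) stitch: open each state's BP GeoTIFF and mosaic onto a single grid
state_tifs = ["BP_AL.tif", "BP_AZ.tif"]  # one per CONUS state in practice
arrays = [rioxarray.open_rasterio(path, masked=True) for path in state_tifs]
conus_30m = merge_arrays(arrays)  # memory-heavy at 30m; would want chunked reads

# (iii) downsample: block-average 30m pixels up to ~4000m
factor = 133  # 30m * 133 ≈ 4000m
conus_4km = conus_30m.coarsen(x=factor, y=factor, boundary="trim").mean()

# (iv) save: write the result to zarr (final destination TBD below)
conus_4km.to_dataset(name="burn_probability").to_zarr("BP_conus_4km.zarr", mode="w")
```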
In the past, we've sort of rolled our own data processing. More recently, we've done a bunch of stuff for the CMIP6 projects with prefect. And separately, I'm aware of ongoing efforts for similar data transformations with Pangeo-Forge. It would be helpful to get feedback from @orianac, @jhamman, and @norlandrhagen about the best way to accomplish the task.
Here are some other details and questions that can get the conversation started.
Data
Input
Raw 30m GeoTIFFs are available directly from the USFS Research Data Archive. Data are organized within a zipfile on a per-state basis, with each file containing eight separate data products. We're interested in the Burn Probability data. File sizes range from 100MB to 20GB. I think we probably want to separately download these data and archive them on our cloud storage. Thoughts @jhamman ?
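A minimal sketch of that download-and-archive step with fsspec, assuming hypothetical source URLs and a hypothetical scratch bucket:

```python
# Sketch: mirror the per-state zip archives to cloud storage before processing.
# The source URLs and bucket path are hypothetical placeholders.
import fsspec

states = ["AL", "AZ", "CA"]  # subset for illustration
for state in states:
    src_url = f"https://example.com/usfs-bp/{state}_BP.zip"           # hypothetical
    dest_url = f"gs://our-scratch-bucket/usfs-bp/raw/{state}_BP.zip"  # hypothetical
    with fsspec.open(src_url, "rb") as src, fsspec.open(dest_url, "wb") as dst:
        # fine for the ~100MB files; the ~20GB ones should be streamed in chunks
        dst.write(src.read())
```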
Output
Format
Our target output should be either a zarr store or GeoTIFF(s).
We should store the data in two resolutions: the raw 30m resolution and a downsampled (~4000m) resolution.
We'll need to handle CONUS and AK, which I think requires separate files? @jhamman
Location
Where should the final zarr/tiff(s) live? I think we've historically started with Google Cloud Storage, so I guess we start by pushing the data there.
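If Google Cloud Storage is the starting point, writing the processed zarr store there is short with gcsfs installed; the bucket and path below are hypothetical.

```python
# Sketch: push the processed zarr store to Google Cloud Storage (bucket/path
# are hypothetical; requires gcsfs for the gs:// protocol).
import xarray as xr

ds = xr.open_zarr("BP_conus_4km.zarr")  # local output from the processing step
ds.to_zarr("gs://our-data-bucket/usfs-bp/conus_4km.zarr", mode="w")
```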