Prototype a spatially explicit forecast (e.g. MODIS leaf-area index?)

cboettig commented 1 year ago

Forecasts in the EFI NEON challenge are all explicitly site-based timeseries.

With the widespread availability of remote sensing & imagery data and the importance of understanding spatially explicit processes in so much of ecological modeling, it would be great to have an example of a spatially explicit forecasting challenge. Developing a spatially explicit forecast will probably raise unique issues as well, from data formats (cloud-optimized GeoTIFF?) to questions about spatial resolution, benchmark models, and probabilistic scoring methods, to computational and performance issues. (Depending on interest, these could give rise to multiple workshop teams.)

A few potential starting points:

Define a challenge around prediction of MODIS leaf-area index (LAI). The LAI product is produced every 8 days at 500 m resolution and is available in COG+STAC format from planetarycomputer. This might provide a spatial analog of the existing phenology challenge, and maybe relate to the terrestrial challenge too (perhaps forecasts from one challenge might be used in the other?)
DrivenData SnowCast Forecasting competition provides an example of a recent spatial forecast challenge that involved iterative submissions, and includes complete code from all contesting models.
NOAA's GEFS forecast is of course explicitly spatial already, and could be a good benchmark case for a team looking to explore what it takes to visualize and score an existing probabilistic spatial forecast. Existing tooling used in the EFI gef4cast package may be useful.

...

Lots of other related possibilities. Please comment below if any of these sound interesting and/or propose other examples of issues in spatial forecasting we might dive into with some code! :globe_with_meridians: :artificial_satellite: :earth_africa:

mdietze commented 1 year ago

One thing that I think would make an interesting forecasting challenge would be post-disturbance recovery. We could identify different disturbances that occurred in one year or season (e.g. fire, forestry, wind, pests and pathogens), for example using Landfire, Landtrendr, or GLanCE, and then forecast the time-series of recovery of LAI, NDVI, EVI, etc. either on a 35-day scale or, probably more interesting, out a full year using a S2S forecast (e.g. NMME). I could see something like this being updated every 2-4 weeks instead of daily, both because of the computational cost of such a forecast and because I don't think the predictions would change much day-to-day.

johnwilliamsmithjr commented 1 year ago

I am interested in defining a challenge around spatially explicit forecasts of LAI. I agree that this is bound to raise challenges, both on the data storage side and on the computational modeling side. Some of the challenges I am particularly keen to discuss are the range of computationally feasible models and the development of probabilistic scoring.

On the topic of probabilistic scoring in particular: earlier this year I went to a short-course on point process models from Rick Schoenberg, who mentioned that the Earthquake forecasting community has a unified testing framework that they use for quantifying model performance. Though I am not very familiar with the details, looking into their framework / frameworks from other fields might provide ideas and direction. If anyone is interested in reading up on the Earthquake community validation framework before the meeting, the paper that I found is called "Assessment of Point Process Models for Earthquake Forecasting"!
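As a concrete starting point for the scoring discussion, here is a minimal sketch of one per-pixel probabilistic score, the empirical continuous ranked probability score (CRPS) of an ensemble, averaged over a grid. Choosing CRPS is an assumption made for illustration; it is not the earthquake community's framework and not something this thread has settled on. The function names and the tiny 2x2 example scene are hypothetical.

```python
from itertools import product


def crps_ensemble(members, observed):
    """Empirical CRPS of one ensemble against one observation (lower is better)."""
    m = len(members)
    # Mean absolute error of the members against the observation.
    accuracy = sum(abs(x - observed) for x in members) / m
    # Mean absolute difference over all ordered member pairs (ensemble spread).
    spread = sum(abs(a - b) for a, b in product(members, repeat=2)) / (m * m)
    return accuracy - 0.5 * spread


def mean_crps_over_grid(ensemble_grids, observed_grid):
    """Average per-pixel CRPS; pixels with a missing (None) observation are skipped."""
    scores = []
    for r, row in enumerate(observed_grid):
        for c, obs in enumerate(row):
            if obs is None:  # e.g. a cloud-masked pixel
                continue
            members = [grid[r][c] for grid in ensemble_grids]
            scores.append(crps_ensemble(members, obs))
    return sum(scores) / len(scores)


# Hypothetical 2x2 LAI scene with a three-member ensemble forecast.
ensemble = [
    [[1.0, 2.0], [0.5, 3.0]],
    [[1.2, 2.2], [0.7, 3.4]],
    [[0.8, 1.8], [0.3, 2.6]],
]
observed = [[1.0, 2.5], [None, 3.0]]
print(mean_crps_over_grid(ensemble, observed))
```

Averaging independent per-pixel scores ignores spatial dependence between pixels, so a score like this cannot by itself reward forecasts that get spatial structure right. That gap is one reason frameworks from other fields may be worth a look.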

noamross commented 1 year ago

Are there other point-process spatial forecasts of interest to the NEON/EFI community? A common type for us is spatially explicit forecasts of disease outbreak probability.

mdietze commented 1 year ago

@noamross I think there's substantial interest in making a lot of forecasts spatial. In terms of a forecasting challenge, the relevant question tends to be: what data are available to score such a forecast that is both spatially-extensive and low-latency? Also, if we're going to link to NEON, we'd want to focus on tick- or mosquito-borne disease since those are the systems where they collect field data on the vectors. Could be interesting to try and link the existing tick forecasts to a Lyme forecast -- what's the spatial and temporal resolution of CDC Lyme data? If we're talking weekly and county-level it could be interesting (versus, for example, annual, state-level data, which I don't think would garner much enthusiasm)

noamross commented 1 year ago

@mdietze I meant specifically if there were point processes (discrete events recorded at points in continuous locations and times, rather than gridded counts/measures) collected or used within EFI/NEON. I'd be interested in working on something that has point-process properties like the earthquakes that @johnwilliamsmithjr mentioned, disease or not.

I think most of the things we work on wouldn't be good for this project: we have to deal with data sets that might be relatively quickly reported at exact locations and times, but are quite sparse (e.g., many livestock disease outbreaks occur only during certain seasons, and not every year). So we build models on long-term data sets, but continuous forecast testing has long lags. CDC Lyme data are reported at county and weekly levels, but not both, and not released regularly (you can get weekly by state). Most public human or agricultural disease data sets of this sort won't report single events in any case for privacy purposes.

brettmelbourne commented 1 year ago

Narrowing an LAI challenge to post-fire recovery, restricted to recent megafires (say the last 5 years), could be interesting and relevant. It might allow models to be more focused on a subset of ecological processes while remaining spatial and using historical data from the global set as training. There are probably existing models already.

trashbirdecology commented 1 year ago

As discussed briefly with Quinn et al. during GRC Pred. Ecol. conference, there is potential to adapt this for the NASA Space Apps Challenge. Putting some links here for posterity:

judging guide https://sa-2019.s3.amazonaws.com/media/documents/Space_Apps_2022_Judging_and_Awards_Guide.pdf
space apps landing page https://www.spaceappschallenge.org/
2022 winners list https://www.earthdata.nasa.gov/news/2022-space-apps-challenge-winners
2022 challenges https://2022.spaceappschallenge.org/challenges/

cboettig commented 1 year ago

Based on this thread so far, it sounds like we are converging on a forecast challenge around predicting MODIS LAI in polygons following large disturbance events (probably wildfire events, though some insect- or pathogen-driven defoliation might be a future extension engaging some different ecology).

I think this definition encompasses spatial areas large enough for spatial heterogeneity to be relevant (i.e. the best forecasts must predict some spatial structure, not just win by setting every pixel to the spatial average value in an essentially temporal prediction).

We would want to identify events across geographically diverse regions where we get very different recovery times. I think a reasonable scope for us in the unconf would be to prototype what a challenge 'target' would look like:

what disturbance polygons should be in the challenge? Where and how many?
Can we build a pipeline that populates both the historical space-time datacubes of LAI for these polygons (probably going back before the disturbance), and continues to populate these cubes in the future (i.e. every 16 days as we get new MODIS snapshots?)
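The "keep populating the cube" step can be sketched as a small bookkeeping routine: list the composite dates that should exist by now and subtract the ones already stored. This is a rough sketch under stated assumptions, not the pipeline itself. It assumes composites restart on 1 January each year (as MODIS 8-day products do), leaves the interval as a `step_days` parameter so either 8 or 16 days can be tried, and ignores publication latency. Both function names are hypothetical.

```python
from datetime import date, timedelta


def composite_dates(year, step_days=8):
    """Start dates of fixed-length composites, restarting on 1 January each year."""
    dates = []
    current = date(year, 1, 1)
    while current <= date(year, 12, 31):
        dates.append(current)
        current += timedelta(days=step_days)
    return dates


def missing_composites(cube_dates, today, step_days=8):
    """Composite start dates up to `today` that are not yet in the cube.

    Assumption: a composite counts as expected from its start date onward;
    a real pipeline would also wait out the product's publication latency.
    """
    have = set(cube_dates)
    first = min(have) if have else date(today.year, 1, 1)
    expected = [
        d
        for year in range(first.year, today.year + 1)
        for d in composite_dates(year, step_days)
        if first <= d <= today
    ]
    return [d for d in expected if d not in have]


# Hypothetical cube holding the first two composites of 2023.
cube = [date(2023, 1, 1), date(2023, 1, 9)]
print(missing_composites(cube, today=date(2023, 2, 1)))
```

A scheduled job could run a check like this, fetch only the missing dates for each disturbance polygon, and append them to the cube.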

If we can create these target 'cubes', I think the next step would be defining a simple reference benchmark as a comparison forecast.
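One possible form for such a benchmark is a per-pixel climatology: for a given composite period, take the mean and standard deviation across the same period in past years. The sketch below is an assumption about what "simple" could mean here, not an agreed design. The cube is represented as plain nested lists with `None` for missing pixels, and the function name is hypothetical.

```python
from statistics import mean, stdev


def climatology_forecast(history):
    """Per-pixel mean and standard deviation across past years.

    `history` is a list of 2D grids, one per past year, all for the same
    composite period (e.g. the same day of year). None marks a missing pixel.
    """
    n_rows, n_cols = len(history[0]), len(history[0][0])
    forecast_mean = [[None] * n_cols for _ in range(n_rows)]
    forecast_sd = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            values = [g[r][c] for g in history if g[r][c] is not None]
            if len(values) >= 2:  # a sample sd needs at least two values
                forecast_mean[r][c] = mean(values)
                forecast_sd[r][c] = stdev(values)
    return forecast_mean, forecast_sd


# Hypothetical three-year history for a 2x2 polygon.
history = [
    [[1.0, 2.0], [None, 4.0]],
    [[2.0, 2.0], [None, 5.0]],
    [[3.0, 2.0], [0.5, 6.0]],
]
clim_mean, clim_sd = climatology_forecast(history)
print(clim_mean, clim_sd)
```

For a recently burned polygon, a climatology built from pre-fire years would likely sit well above the observed LAI, so persistence of the latest post-fire observation may be a fairer baseline to compare against.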

I took a quick stab at grabbing some polygons for some recent large CA fires here: https://github.com/cboettig/modis-lai-forecast/blob/main/distrubance.qmd but nearly 3 years later the burn area of the August Complex (CA's first gigafire) I picked still looks pretty black on LAI metrics. I think fires in other areas may show much faster recovery rates, though, so teams don't have to wait years to predict something interesting. Suggestions for sites very welcome!

rqthomas commented 1 year ago

Some additional considerations:

the design of the "targets" file for teams to use to build models and for us to score forecasts. We currently use flat table parquet for the multi-site time series but this seems to require a geospatial-focused file format that is also cloud friendly.
Think about what would go on the theme description page (e.g., https://projects.ecoforecast.org/neon4cast-docs/Terrestrial.html)

cboettig commented 1 year ago

Just connecting dots:

@emmamendelsohn @johnwilliamsmithjr @ddurden and I worked on this theme during the unconference (with input from others as well); current progress can be found in https://github.com/eco4cast/modis-lai-forecast. A minimal product workflow for doing targets, forecasts, and scores can be seen in the rendered quarto notebook

eco4cast / unconf-2023

Prototype a spatially explicit forecast (e.g. MODIS leaf-area index?) #1