Forecast theme synthesis

OlssonF commented 1 year ago

I'd be interested in working on methods to synthesise the challenge submissions from a specific theme, particularly Aquatics, where we have had a big push on submissions in the last 6-9 months.

Thinking about things like:

how to compare forecast skill across models?
how to deal with unequal submissions (not every model has been submitted for the same length of time, and some have missed forecasts)?
what interesting questions can we ask about how model performance varies over temporal and spatial scales and forecast variables (temperature vs chlorophyll for example)?
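One possible way to handle the unequal-submissions problem is to compare each model with a baseline only on the forecasts they both submitted. The sketch below assumes scores are already available as (model, site, date, CRPS) records and that a baseline called `climatology` exists; the record layout, model names, and all values are made up for illustration and are not the challenge's actual score format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical score records: (model_id, site_id, reference_date, crps).
# Lower CRPS is better. All values are invented for illustration.
scores = [
    ("climatology", "BARC", "2023-01-01", 1.0),
    ("climatology", "BARC", "2023-01-02", 2.0),
    ("climatology", "SUGG", "2023-01-01", 1.0),
    ("model_a", "BARC", "2023-01-01", 0.5),
    ("model_a", "BARC", "2023-01-02", 1.0),
    ("model_a", "SUGG", "2023-01-01", 0.5),
    ("model_b", "BARC", "2023-01-02", 3.0),  # model_b missed two forecasts
]


def paired_skill(records, baseline):
    """Skill of each model relative to a baseline on shared forecasts only.

    skill = 1 - mean(model score) / mean(baseline score), computed over the
    (site, date) pairs that both the model and the baseline forecasted.
    Positive skill means the model beat the baseline on those pairs.
    """
    by_model = defaultdict(dict)
    for model, site, date, crps in records:
        by_model[model][(site, date)] = crps

    base = by_model[baseline]
    result = {}
    for model, own in by_model.items():
        if model == baseline:
            continue
        shared = own.keys() & base.keys()
        if not shared:
            continue  # nothing to compare on
        model_mean = mean(own[key] for key in shared)
        base_mean = mean(base[key] for key in shared)
        result[model] = {
            "n_shared": len(shared),
            "skill": 1 - model_mean / base_mean,
        }
    return result


skill = paired_skill(scores, baseline="climatology")
for model, row in sorted(skill.items()):
    print(model, row["n_shared"], round(row["skill"], 3))
```

Reporting `n_shared` alongside the skill keeps it visible when a model's score rests on very few forecasts, which is one way to flag short or patchy submission records rather than dropping them.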

A discussion on how to approach these large theme-level syntheses would be great!

cboettig commented 1 year ago

I think these are great questions, and having answers can help us think about how to answer others. e.g. can we make any generalizations about what is 'easy' or 'hard' to forecast so far, and can we ground those assertions in empirical evidence from the challenge? For example:

Can we say that aquatics forecasts (on the timescale set by the challenge) are better & easier than beetle forecasts? I'm not sure we have the data on the latter to defend the claim.
Can we identify if / when / for which tasks weather covariates are necessary, or when models that use them are most likely to out-perform models without them? Can we tell if / when poor forecast performance in such models can be attributed to poor accuracy in the weather forecast?

Maybe these are too specific, but I think a lot of questions like these emerge. Thinking about these questions is likely to inform how we go about forecast visualization in the dashboards etc. (#2 / #13) and how we think about uncertainty (#9).

OlssonF commented 1 year ago

The first bullet point looks like something the Theory Group are investigating with their model submissions (submitting the same model to all themes: https://github.com/eco4cast/Forecast_submissions). Although at the moment I'm not sure if we could link up with the other themes with more submissions (Phenology and Aquatics).

I'd love to look at your second suggestion though! There is a definite split in the aquatics forecast submissions between those that use the NOAA weather drivers and those that don't, so this catalogue of forecasts might be a good place to start for this question. We'd need some way to look at how aquatic forecast skill correlates with weather forecast skill. A first step here would be to score the weather forecasts? Do we have observations at sites to do this? Or would we look at using the stage_3 data as a proxy for observations? Have you started this work at all, @cboettig?
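As a rough sketch of that first step, an ensemble weather forecast can be scored against an observation with the standard empirical CRPS estimator, and the resulting scores can then be correlated with the aquatic forecast scores for the same dates. The function names and all numbers below are hypothetical, and which observations to use (site measurements or a proxy) is left open, as in the question above.

```python
from itertools import product
from statistics import mean


def crps_ensemble(members, observation):
    """Empirical CRPS of an ensemble forecast against one observation.

    Uses CRPS = E|X - y| - 0.5 * E|X - X'|, with both expectations taken
    over the ensemble members. Lower is better; units match the variable.
    """
    accuracy = mean(abs(x - observation) for x in members)
    spread = mean(abs(a - b) for a, b in product(members, repeat=2))
    return accuracy - 0.5 * spread


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Invented example: air temperature ensembles (deg C) and the observed value
# for three forecast dates at one site.
weather_cases = [
    ([10.0, 12.0, 14.0], 12.0),
    ([15.0, 16.0, 17.0], 19.0),
    ([8.0, 9.0, 10.0], 9.5),
]
weather_crps = [crps_ensemble(members, obs) for members, obs in weather_cases]

# Invented water temperature CRPS for the same dates from one aquatics model.
aquatic_crps = [0.3, 1.1, 0.4]

print([round(value, 3) for value in weather_crps])
print(round(pearson(weather_crps, aquatic_crps), 3))
```

A correlation like this only shows that the two sets of scores move together; attributing poor aquatic performance to the weather driver would still need a comparison against models that do not use it.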

Other questions that I think could be interesting to think about, specifically for the aquatics during the unconference would be:

how does predictability vary by site type (rivers/streams/lakes) and location (might need to focus on a subtype here)?
are model rankings independent of metric? An offline suggestion from @cboettig was also to look at how these differences vary by evaluation metric: when and why is there disagreement between metrics?
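One simple way to start on the metric question is to count, for every pair of models, whether two metrics order them the same way. The sketch below computes Kendall's tau-a from those counts and lists the pairs on which the metrics disagree. The model names and scores are invented, and both metrics are assumed to be lower-is-better.

```python
from itertools import combinations

# Hypothetical mean scores per model under two metrics.
# Both are assumed to be lower-is-better; values are invented.
mean_scores = {
    "model_a": {"crps": 0.40, "logs": 1.9},
    "model_b": {"crps": 0.55, "logs": 1.4},
    "model_c": {"crps": 0.70, "logs": 2.6},
}


def metric_agreement(scores, metric_1, metric_2):
    """Kendall's tau-a between two metrics, plus the discordant model pairs.

    A pair of models is concordant when both metrics order it the same way
    and discordant when they order it in opposite ways. Pairs tied on
    either metric count toward neither.
    """
    concordant = 0
    discordant = []
    for first, second in combinations(sorted(scores), 2):
        diff_1 = scores[first][metric_1] - scores[second][metric_1]
        diff_2 = scores[first][metric_2] - scores[second][metric_2]
        if diff_1 * diff_2 > 0:
            concordant += 1
        elif diff_1 * diff_2 < 0:
            discordant.append((first, second))
    n_pairs = len(scores) * (len(scores) - 1) // 2
    if n_pairs == 0:
        raise ValueError("need at least two models to compare rankings")
    tau = (concordant - len(discordant)) / n_pairs
    return tau, discordant


tau, disagreements = metric_agreement(mean_scores, "crps", "logs")
print(round(tau, 3), disagreements)
```

The list of discordant pairs is the more useful output for the "when and why" part: it points to the specific models whose ranking flips, which can then be inspected by site, horizon, or variable.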

Some broader questions on engagement of submitters too, as we see a disparity in which sites and variables have been forecasted (skew towards temperature and lakes). What can we learn from this to increase participation in other themes, and within specific themes to gain greater breadth?

I think we can make some really good progress on this at the Unconference since we've got plenty of forecasts to work with.

cboettig commented 1 year ago

@OlssonF sounds awesome!

The mechanics for weather scoring are in https://github.com/eco4cast/neon4cast-dashboard/blob/main/noaa.qmd . I think it works but is a bit intensive, so not currently rendering on the dashboard. It uses the neonstore-generated parquet tables to access NEON meteorology by site -- but with NEON now on Google Cloud I'm pretty sure we could re-write this to hit those buckets directly (hey @sokole should we have a mini-breakout theme on that?).

I've actually been wondering if we should simply be storing some pre-computed weather data from NEON like we do for NEON targets. Would potentially be easier to work with (as you know, NEON creates such a summary meteorology product but only for a subset of sites). Is it primarily air temperature we're talking about or are there other met vars (precip?) being used in aquatics forecasts?
