USGS-R / drb-estuary-salinity-ml


Discussion: Ideas on how to use information theory to assess models #130

Open amsnyder opened 1 year ago

amsnyder commented 1 year ago

@galengorski

  1. Using mutual information as a measure of predictive performance and transfer entropy as a measure of functional performance across a range of model decisions (e.g., the addition of process guidance or the calibration of a parameter)

  2. Using temporal information partitioning (TIPNets) to investigate how much of the information flow from dynamic features to the modeled or observed output is unique, redundant, and/or synergistic with another feature. This is another way to assess feature -> target relationships and how a model represents them.

  3. Characterizing critical time scales of influence from feature to target by assessing transfer entropy across a range of time lags, then comparing those time scales across sites to look at how different site characteristics are associated with process coupling

Below are some more specific comments and examples for each method.
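For reference, here is a minimal sketch (not project code) of how the two core quantities behind all three ideas could be estimated from binned series with numpy. The bin count, the single-step target history, and the lack of any bias correction are simplifying assumptions:

```python
import numpy as np

def entropy(counts):
    """Shannon entropy (bits) from an array of histogram counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=11):
    """I(X;Y) in bits, estimated from a fixed-width 2D histogram."""
    cxy, _, _ = np.histogram2d(x, y, bins=bins)
    return entropy(cxy.sum(axis=1)) + entropy(cxy.sum(axis=0)) - entropy(cxy)

def transfer_entropy(source, target, lag=1, bins=11):
    """
    TE_{source -> target}(lag) = I(target_t ; source_{t-lag} | target_{t-1}),
    estimated from a fixed-width 3D histogram.
    """
    n, start = len(target), max(lag, 1)
    yt, ytm1, xlag = target[start:], target[start - 1:n - 1], source[start - lag:n - lag]
    c3, _ = np.histogramdd(np.column_stack([yt, ytm1, xlag]), bins=bins)
    return (entropy(c3.sum(axis=2))         # H(target_t, target_{t-1})
            + entropy(c3.sum(axis=0))       # H(target_{t-1}, source_{t-lag})
            - entropy(c3.sum(axis=(0, 2)))  # H(target_{t-1})
            - entropy(c3))                  # H(target_t, target_{t-1}, source_{t-lag})
```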

amsnyder commented 1 year ago

@galengorski

1) Ruddell et al. 2018 is a good reference; they give the following conceptual diagram:

[Figure: conceptual diagram of functional vs. predictive performance from Ruddell et al. 2018]

The x-axis is the difference between the modeled and observed transfer entropy from source to target at a given time lag (here 30 minutes); this is the functional performance. The y-axis is 1 minus the mutual information between the predicted and observed target variable; this is the predictive performance. It is formulated this way so that the origin represents the "ideal" location, where predictive performance is perfect and the transfer entropy matches that seen in the observed data. Each model choice (e.g., the inclusion of different types of process guidance, or the calibration of a parameter) produces a point along the curve. They give a few conceptual curves: A is what you would ideally see, B represents a tradeoff between predictive and functional performance, C is "getting the right answer for the wrong reasons," and D is getting the "wrong answer for the right reasons."

For our purposes, I think this will be really interesting. We may not have enough model iterations to create curves like this, but I can see, for example, COAWST model output being a point (or a few points, depending on the inclusion of various modules) on this diagram and an ML model being another set of points, and then comparing where they land. One caveat: this looks at a single source -> target relationship at a single time lag; it could obviously be done for multiple pairs and lags, but that adds complexity.
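A sketch of how one point on that diagram could be computed for a single source -> target pair at a single lag, reusing the entropy/MI/TE helpers sketched above. Normalizing MI by the entropy of the observations so the y-axis lands in [0, 1] is my assumption, not necessarily the exact formulation in Ruddell et al.:

```python
import numpy as np
# assumes entropy(), mutual_information(), transfer_entropy() from the earlier sketch

def performance_point(source, obs, pred, lag, bins=11):
    """One point on a Ruddell-style tradeoff diagram (sign convention is an assumption)."""
    # x-axis (functional): modeled minus observed source -> target coupling strength
    functional = (transfer_entropy(source, pred, lag=lag, bins=bins)
                  - transfer_entropy(source, obs, lag=lag, bins=bins))
    # y-axis (predictive): 1 - MI(pred, obs), normalized by H(obs) so it lies in [0, 1]
    h_obs = entropy(np.histogram(obs, bins=bins)[0])
    predictive = 1.0 - mutual_information(pred, obs, bins=bins) / h_obs
    return functional, predictive

# e.g. one point per model variant (COAWST with/without modules, an ML model, ...)
```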

amsnyder commented 1 year ago

@jds485

Just noting that I think changing the loss function (e.g., adding process guidance or using a different performance metric) would amount to a structural change. I could also see generating uncertainty in the points on both axes of this plot with any model that generates distributional outputs (MC dropout, prediction of error distribution parameters).
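For what it's worth, a sketch of how that could look with an ensemble of predictions (e.g., MC dropout draws), assuming a performance_point() helper like the one sketched above and a prediction array of shape (n_members, n_timesteps); both axes are simply recomputed per member to get a cloud of points:

```python
import numpy as np
# assumes performance_point() from the sketch above

def performance_cloud(source, obs, pred_ensemble, lag, bins=11):
    """One (functional, predictive) point per ensemble member, to show spread on both axes."""
    pts = np.array([performance_point(source, obs, pred, lag, bins)
                    for pred in pred_ensemble])
    return pts  # columns: functional, predictive; summarize with mean/std or plot directly
```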

amsnyder commented 1 year ago

@galengorski

2) The Goodwell and Kumar 2017 papers are good references for these methods. Below is a nice figure from the first of them:

[Figure: TIPNets moving-window information flow figure from Goodwell and Kumar 2017]

For this figure, the authors use a moving window to look at how air temperature affects relative humidity in combination with other variables, and whether it does so in a redundant way (darker colors), meaning that other variables give similar information about relative humidity, or in a synergistic way (lighter colors), meaning that the variables jointly contribute information about relative humidity.

This can be confusing to interpret. For example, the biggest transfer of information in this plot is the redundant transfer between air temperature and relative humidity, meaning much of the information in air temperature is already contained within past relative humidity values? But it does reveal some interesting seasonal trends, and it could be used to look at flows of information during different time periods (droughts, precip events, etc.).
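For concreteness, here is a rough sketch of what the decomposition looks like for two sources and one target. It uses the simple minimum-mutual-information redundancy bound rather than the rescaled redundancy measure Goodwell and Kumar actually use, and the bin count and variable names are placeholders:

```python
import numpy as np

def _entropy(counts):
    """Shannon entropy (bits) from histogram counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def pid_two_sources(x1, x2, y, bins=11):
    """
    Split I(X1,X2 ; Y) into unique, redundant, and synergistic components.
    Redundancy here is min(I(X1;Y), I(X2;Y)) (MMI assumption), a simplification
    of the rescaled redundancy in Goodwell and Kumar 2017.
    """
    c3, _ = np.histogramdd(np.column_stack([x1, x2, y]), bins=bins)
    h = _entropy
    i1 = h(c3.sum(axis=(1, 2))) + h(c3.sum(axis=(0, 1))) - h(c3.sum(axis=1))  # I(X1;Y)
    i2 = h(c3.sum(axis=(0, 2))) + h(c3.sum(axis=(0, 1))) - h(c3.sum(axis=0))  # I(X2;Y)
    itot = h(c3.sum(axis=2)) + h(c3.sum(axis=(0, 1))) - h(c3)                 # I(X1,X2;Y)

    redundant = min(i1, i2)
    unique_x1, unique_x2 = i1 - redundant, i2 - redundant
    synergistic = itot - unique_x1 - unique_x2 - redundant
    return {"U1": unique_x1, "U2": unique_x2, "R": redundant, "S": synergistic}
```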

amsnyder commented 1 year ago

@galengorski

3) This could be done in a few different ways, but Goodwell and Kumar 2017 and Tennant et al. 2020 are good references. The figure below shows an example from the first paper:

[Figure: heatmap of dominant time scales of source-target interaction from Goodwell and Kumar 2017]

This is a heatmap of the dominant time scales of interaction between various sources, in panel (a) for the whole record and in panel (b) under different wetness conditions, showing how they compare to the whole-record result. The authors use the maximum mutual information across a series of time lags. I could see something like this being used to look at different model inputs and their relationship to the predicted variable for different sites, models, and/or time periods.
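A minimal sketch of how one row of a heatmap like this could be built for a single feature -> target pair, scanning lagged mutual information and keeping the lag where it peaks; the lag range, bin count, and the variable names in the usage comment are assumptions:

```python
import numpy as np

def lagged_mi(source, target, lag, bins=11):
    """I(target_t ; source_{t-lag}) in bits, from a fixed-width 2D histogram."""
    x, y = source[:len(source) - lag], target[lag:]
    c, _, _ = np.histogram2d(x, y, bins=bins)
    def h(counts):
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log2(p))
    return h(c.sum(axis=1)) + h(c.sum(axis=0)) - h(c)

def dominant_lag(source, target, max_lag=30, bins=11):
    """Lag (in time steps) where lagged MI peaks, plus the full MI-vs-lag profile."""
    lags = np.arange(1, max_lag + 1)
    profile = np.array([lagged_mi(source, target, lag, bins) for lag in lags])
    return lags[np.argmax(profile)], profile

# hypothetical use: one heatmap cell per (input feature, site or wetness condition)
# best_lag, profile = dominant_lag(discharge, salt_front_location, max_lag=30)
```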

amsnyder commented 1 year ago

@jds485

Heatmaps like these could also be used as a model diagnostic. For example, is the model capturing the critical lag dependencies that are in the observed time series (for wet/dry conditions)?

amsnyder commented 1 year ago

@galengorski

Absolutely, I think this would be a really interesting idea. It goes a step beyond the functional/predictive performance tradeoff to look at how well the critical time scales are represented under different conditions. For estuary salinity, I'm thinking about the different conditions as the location of the salt front. But transfer entropy needs a continuous time series, so I'm trying to brainstorm a way to segment the time series while still maintaining long enough "chunks" to use transfer entropy. If that makes any sense...
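One possible way to handle the segmentation, sketched below under the assumption that the condition (e.g., salt front above or below some river mile) can be expressed as a boolean mask on the daily series: keep only contiguous stretches longer than some minimum length, build the lagged tuples within each stretch so no pair straddles a gap, and pool those tuples into a single TE estimate (e.g., the 3D histogram in the first sketch). The minimum length and the variable names in the usage comment are placeholders:

```python
import numpy as np

def contiguous_chunks(mask, min_len):
    """(start, stop) index pairs for runs of True in mask that are at least min_len long."""
    edges = np.flatnonzero(np.diff(np.concatenate(([0], mask.astype(int), [0]))))
    return [(a, b) for a, b in edges.reshape(-1, 2) if b - a >= min_len]

def pooled_lagged_tuples(source, target, mask, lag=1, min_len=60):
    """
    Build (target_t, target_{t-1}, source_{t-lag}) rows only within contiguous
    stretches where the condition holds, so no lagged tuple crosses a gap.
    The pooled rows can then be fed to one 3D histogram / TE estimate.
    """
    rows = []
    for a, b in contiguous_chunks(mask, min_len):
        s, t = source[a:b], target[a:b]
        first = max(lag, 1)
        rows.append(np.column_stack([t[first:], t[first - 1:-1],
                                     s[first - lag:len(s) - lag]]))
    return np.vstack(rows) if rows else np.empty((0, 3))

# hypothetical use, with the condition defined by salt front location:
# mask = salt_front_river_mile > some_threshold
# rows = pooled_lagged_tuples(discharge, salinity, mask, lag=3, min_len=60)
```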

amsnyder commented 1 year ago

@jds485

I think it goes a step beyond the functional/predictive performance tradeoff

I agree. Even using different time lags for functional performance would not capture whether those lags are important. In that same line of thinking, a diagnostic like this for the observations could inform at which lags to evaluate functional performance.

maintaining long enough "chunks" to use transfer entropy

Yes, I understand the problem. Goodwell and Kumar must have had a similar problem applying this to different rain conditions. I'm not remembering if they describe those details in the paper. Did you check?

amsnyder commented 1 year ago

@galengorski

Yeah, I should have finished off the thought more clearly. They are able to handle this because they are using high-frequency data (I think 1-minute resolution), so even if a time period covers only a single night, for example, they have enough data to robustly estimate the pdfs. That certainly isn't true in our case with daily data. I think this is a limitation of using some of these methods at a daily temporal resolution, and it's one of the main reasons we are considering looking at sub-daily data as well (see #50).