MSRDL / Deep4Cast

Probabilistic Multivariate Time Series Forecast using Deep Learning
BSD 3-Clause "New" or "Revised" License

Implemented concrete dropout class and created metrics to evaluate uncertainty #44

Closed shirleyuw closed 6 years ago

shirleyuw commented 6 years ago

A few changes:

1. Implemented a concrete dropout class, so that an optimal dropout rate can be learned automatically, following the 'Concrete Dropout' paper. With concrete dropout, predictions with uncertainty can be produced without the user specifying a dropout rate. I kept the MCDropout class, which takes a user-provided dropout rate.
2. Implemented two metrics to evaluate uncertainty: (a) the Mean Scaled Interval Score (MSIS), used in the M4 time series competition to score prediction intervals, and (b) coverage of prediction intervals, i.e. does the 95% prediction interval really contain the test-set observations 95% of the time?
3. Demoed concrete dropout and the uncertainty metrics in tutorials/incorporating-uncertainty.ipynb.
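For reference, the two uncertainty metrics can be sketched in a few lines of NumPy (function names and signatures here are illustrative, not necessarily the ones used in this PR):

```python
import numpy as np

def msis(y_true, lower, upper, y_insample, m=1, alpha=0.05):
    """Mean Scaled Interval Score, as defined for the M4 competition.

    Charges the interval width plus a 2/alpha penalty for every
    observation falling outside [lower, upper], scaled by the mean
    absolute error of the in-sample seasonal naive forecast
    (season length m). Lower is better.
    """
    y_true, lower, upper, y_insample = map(
        np.asarray, (y_true, lower, upper, y_insample))
    width = upper - lower
    below = (2.0 / alpha) * (lower - y_true) * (y_true < lower)
    above = (2.0 / alpha) * (y_true - upper) * (y_true > upper)
    scale = np.mean(np.abs(y_insample[m:] - y_insample[:-m]))
    return np.mean(width + below + above) / scale

def coverage(y_true, lower, upper):
    """Fraction of observations inside the prediction interval."""
    y_true = np.asarray(y_true)
    return np.mean((y_true >= lower) & (y_true <= upper))
```

A well-calibrated 95% interval should give `coverage` close to 0.95 on the test set; MSIS additionally rewards intervals for being narrow.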

zer0n commented 6 years ago

@satyanshukla , please help review this one as it is relevant to your next project. Nhat has also implemented/tested Concrete Dropout (see this closed PR) so you may want to check his code or pick his brain if you have questions.

zer0n commented 6 years ago

Let's wait until we have some tests first, to make sure that the implementation is correct.

Here are a few tests:

shirleyuw commented 6 years ago

I will run these tests. Ken, can you explain the last test in more detail? Thanks, Shirley

On Jun 22, 2018, at 12:01 PM, Kenneth Tran notifications@github.com wrote:

> Let's wait until we have some tests first, to make sure that the implementation is correct.
>
> Here are a few tests:
>
> - Calibration test (mentioned in Yarin's paper): do we get a better match in the calibration test than before concrete dropout?
> - Does the dropout rate converge to 0?
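The calibration test Ken describes can be sketched as follows (a hypothetical helper, assuming forecasts are available as Monte Carlo samples drawn with dropout active):

```python
import numpy as np

def calibration_curve(mc_samples, y_true, levels=(0.5, 0.8, 0.9, 0.95)):
    """Empirical vs. nominal coverage of central prediction intervals.

    mc_samples: array of shape (n_samples, n_points), forecasts drawn
        with dropout kept on at prediction time (MC dropout).
    y_true: array of shape (n_points,), the observed values.

    Returns a list of (nominal_level, empirical_coverage) pairs; a
    well-calibrated model gives empirical approximately equal to nominal.
    """
    y_true = np.asarray(y_true)
    curve = []
    for level in levels:
        lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
        lower = np.quantile(mc_samples, lo_q, axis=0)
        upper = np.quantile(mc_samples, hi_q, axis=0)
        empirical = float(np.mean((y_true >= lower) & (y_true <= upper)))
        curve.append((level, empirical))
    return curve
```

Plotting empirical against nominal coverage gives the calibration plot: points on the diagonal mean the intervals are trustworthy, points below it mean the model is overconfident.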

shirleyuw commented 6 years ago
  1. Added a calibration plot showing whether the coverage percentage matches the specified confidence level across the entire (0, 1) range.
  2. Observed that variability is reduced as the number of training epochs increases.
  3. We decided to use MC dropout as the default instead of concrete dropout. In the calibration plot the two methods perform similarly, but the current concrete dropout implementation only works for CNNs, not yet for GRU or LSTM layers. We will create another ticket to implement concrete dropout for RNNs.
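For the follow-up RNN ticket: the core trick of concrete dropout is the same regardless of layer type, namely replacing the hard Bernoulli mask with a sigmoid relaxation so the drop probability receives gradients and can be learned. A NumPy sketch of the relaxed mask from the 'Concrete Dropout' paper (illustrative only, not the repo's implementation):

```python
import numpy as np

def concrete_dropout_mask(shape, p, temperature=0.1, eps=1e-7, rng=None):
    """Differentiable relaxation of a Bernoulli(p) dropout mask.

    Draws a continuous "soft drop" indicator z in (0, 1) instead of a
    hard 0/1 sample; as temperature -> 0 the mask approaches a true
    Bernoulli keep-mask with keep probability 1 - p.
    """
    rng = rng or np.random.default_rng()
    u = rng.uniform(eps, 1.0 - eps, size=shape)
    logit = (np.log(p + eps) - np.log(1.0 - p + eps)
             + np.log(u) - np.log(1.0 - u)) / temperature
    z = 1.0 / (1.0 + np.exp(-logit))  # sigmoid: soft drop indicator
    return 1.0 - z  # keep-mask, multiplied into the layer activations
```

For RNNs, the usual additional requirement (per Gal's work on recurrent dropout) is to sample one mask per sequence and reuse it at every time step, rather than resampling per step.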