Summary: Explained variance (EV) measures the fraction of the variance in the returns that the baseline explains. When the sampled returns have little variance (this may happen in later phases of training, for example), the EV code special-cases the result: EV = 1 indicates a constant baseline, whereas EV = 0 indicates a non-constant baseline.
We explain explained variance through odd-looking curves generated by a 1-step MDP.
OneStepNoStateEnv, k=6, no whitening, Explained variance:
EV code (with slight modifications) from rllab/misc/special.py:
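import numpy as np

def explained_variance_1d(ypred, y, epsilon=1e-8):
    # Both inputs must be flat arrays: ypred is the baseline prediction, y the returns.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    # Degenerate case: the returns have (almost) no variance left to explain.
    if np.isclose(vary, 0):
        # TODO(cathywu) why this distinction?
        if np.var(ypred) > 0:
            return 0
        else:
            return 1
    # General case; epsilon guards against division by zero.
    return 1 - np.var(y - ypred) / (vary + epsilon)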
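To get a feel for the general (non-degenerate) branch, here is a minimal sketch that calls the function on made-up data. The return values and the three baselines are illustrative assumptions, not numbers from the experiments.

```python
import numpy as np

def explained_variance_1d(ypred, y, epsilon=1e-8):
    # Same logic as the EV code shown in this post.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    if np.isclose(vary, 0):
        return 0 if np.var(ypred) > 0 else 1
    return 1 - np.var(y - ypred) / (vary + epsilon)

# Hypothetical sampled returns with plenty of variance (np.var == 1.25).
returns = np.array([1.0, 2.0, 3.0, 4.0])

# A baseline that predicts the returns exactly leaves no residual variance.
ev_perfect = explained_variance_1d(returns.copy(), returns)

# A baseline that always predicts the mean explains none of the variance.
ev_mean = explained_variance_1d(np.full(4, returns.mean()), returns)

# A baseline that is anti-correlated with the returns does worse than the mean.
ev_bad = explained_variance_1d(-returns, returns)

print(ev_perfect)  # 1.0
print(ev_mean)     # approximately 0
print(ev_bad)      # approximately -3
```

EV is bounded above by 1 but not below by 0: a baseline that adds variance to the residual yields a negative value.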
Understanding the (odd) curves above:
AD baseline: Why does EV drop to 0?
There's no more variance to explain in the sampled data. That is, np.var(returns) is close to 0 (and, for some reason, np.var(subbaseline) is not 0), so the code returns 0.
Linear baseline: What does the hovering mean?
Mid-training (around itr=166), np.var(baseline-returns) == np.var(returns), so EV should be 0. However, the epsilon in the denominator (presumably there to avoid division-by-zero errors) is large (1e-8) relative to np.var(returns), so EV instead comes out as some arbitrary number like 0.4.
Solution: Set epsilon to 1e-11. Commit e67b524713b00f96d0d8b4296c068b73cd9b6f5e.
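The effect of epsilon can be reproduced with a small sketch. The numbers are assumptions chosen for illustration: returns with variance 1.5e-8 (just above the default absolute tolerance of np.isclose, 1e-8, so the general branch is taken) and a constant baseline, which makes np.var(baseline-returns) equal to np.var(returns). In that situation the formula reduces to epsilon / (np.var(returns) + epsilon).

```python
import numpy as np

def explained_variance_1d(ypred, y, epsilon=1e-8):
    # Same logic as the EV code shown in this post.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    if np.isclose(vary, 0):
        return 0 if np.var(ypred) > 0 else 1
    return 1 - np.var(y - ypred) / (vary + epsilon)

# Hypothetical returns whose variance is about 1.5e-8 (assumed value).
scale = np.sqrt(1.5e-8)
returns = scale * np.array([1.0, -1.0, 1.0, -1.0])

# Hypothetical baseline that explains nothing: the residual variance
# equals the variance of the returns, so the "true" EV is 0.
baseline = np.zeros(4)

ev_old = explained_variance_1d(baseline, returns, epsilon=1e-8)
ev_new = explained_variance_1d(baseline, returns, epsilon=1e-11)

print(ev_old)  # approximately 0.4, since 1e-8 / (1.5e-8 + 1e-8) = 0.4
print(ev_new)  # below 1e-3, much closer to the expected 0
```

A smaller epsilon shrinks the bias but does not remove it; the bias only vanishes when np.var(returns) is much larger than epsilon.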
ZeroBaseline: What does EV=1 mean?
np.var(returns) is close to 0 AND np.var(subbaseline) is 0, so EV = 1.
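The two degenerate outcomes (EV = 0 for the AD baseline, EV = 1 for ZeroBaseline) can be checked with a short sketch. The arrays below are made-up stand-ins for the sampled returns and baseline predictions, not data from the runs.

```python
import numpy as np

def explained_variance_1d(ypred, y, epsilon=1e-8):
    # Same logic as the EV code shown in this post.
    assert y.ndim == 1 and ypred.ndim == 1
    vary = np.var(y)
    if np.isclose(vary, 0):
        return 0 if np.var(ypred) > 0 else 1
    return 1 - np.var(y - ypred) / (vary + epsilon)

# Hypothetical late-training returns: every sample has the same value,
# so np.var(returns) is exactly 0.
returns = np.full(5, 2.0)

# Stand-in for a baseline whose predictions still vary (the AD case).
nonconstant_baseline = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

# Stand-in for ZeroBaseline: predictions are all 0, so their variance is 0.
zero_baseline = np.zeros(5)

ev_nonconstant = explained_variance_1d(nonconstant_baseline, returns)
ev_zero = explained_variance_1d(zero_baseline, returns)

print(ev_nonconstant)  # 0
print(ev_zero)         # 1
```

In this branch the result depends only on whether the baseline's predictions vary, not on how close they are to the returns.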