Before: the code took the squared distance between one [state, action] pair and all other pairs in the batch (for every [state, action] pair); see the `_square_distance(x,y)` function in `models.py`, which is used in the `_gaussian_kernel` function. But the predicted reward is supposed to be the squared distance between the outputs of two different NNs (predictor vs. target network), both given the same input: [state, action]. See the original RED implementation for reference: https://github.com/RuohanW/RED/blob/412aba4e9fc68102b14040e5fa0989cc3ab9aaa3/baselines/rnd_gail/rnd_critic.py#L29
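
To make the difference concrete, here is a minimal NumPy sketch of the two computations. It is illustrative only: `pairwise_square_distance`, `make_network`, and `predictor_target_square_distance` are hypothetical names, the one-layer tanh "networks" are stand-ins, and none of this is the actual code in `models.py` or in RED. How the distance is then turned into a reward is left out.

```python
import numpy as np


def pairwise_square_distance(x, y):
    """Squared Euclidean distance between every row of x and every row of y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)  # shape: (len(x), len(y))


def make_network(in_dim, out_dim, rng):
    """Hypothetical one-layer tanh network standing in for a real NN."""
    w = rng.normal(size=(in_dim, out_dim))
    b = rng.normal(size=out_dim)
    return lambda inputs: np.tanh(inputs @ w + b)


def predictor_target_square_distance(predictor, target, state_action):
    """Per-sample squared distance between the outputs of the two networks."""
    diff = predictor(state_action) - target(state_action)
    return np.sum(diff ** 2, axis=-1)  # shape: (batch_size,)


rng = np.random.default_rng(0)
# Assumed toy batch: 4 [state, action] pairs, each a 6-dimensional vector.
batch = rng.normal(size=(4, 6))
target = make_network(6, 3, rng)
predictor = make_network(6, 3, rng)

# Before: distances between [state, action] pairs inside the batch.
batch_distances = pairwise_square_distance(batch, batch)

# After: distance between the two networks' outputs for the same input.
network_distances = predictor_target_square_distance(predictor, target, batch)
```

The first result has one entry per pair of samples in the batch and does not involve the networks at all. The second has one entry per [state, action] pair and is zero exactly when the predictor matches the target on that input.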