Kaixhin / imitation-learning

Imitation learning algorithms
MIT License
446 stars, 39 forks

Fixed RED predict_reward bug. #7

Closed Harimus closed 2 years ago

Harimus commented 2 years ago

Before: predict_reward took the squared distance between one [state, action] pair and all other pairs in the batch (repeated for every [state, action] pair); see the _square_distance(x,y) function in models.py, which is used by the _gaussian_kernel function. But the predicted reward is supposed to be the squared distance between the outputs of two different networks (predictor vs. target network), both given the same [state, action] input.

For reference, see the original RED implementation: https://github.com/RuohanW/RED/blob/412aba4e9fc68102b14040e5fa0989cc3ab9aaa3/baselines/rnd_gail/rnd_critic.py#L29
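
To make the difference described above concrete, here is a minimal NumPy sketch rather than the code from this repository. The names `make_net`, `pairwise_square_distance`, and `prediction_error` are hypothetical, and the two "networks" are fixed random linear layers standing in for the predictor and target. It contrasts the batch-wise pairwise distance (the old behaviour) with the per-sample distance between the two network outputs (the intended behaviour). Any further mapping from this distance to a reward, such as scaling or exponentiation, is left out.

```python
import numpy as np


def make_net(in_dim, out_dim, rng):
    """Hypothetical stand-in for a network: one fixed random linear layer + tanh."""
    weights = rng.normal(size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ weights)


def pairwise_square_distance(x, y):
    """Squared distance between every row of x and every row of y, shape (N, M).

    This mixes samples across the batch, which is the behaviour the PR removes.
    """
    diff = x[:, None, :] - y[None, :, :]
    return (diff ** 2).sum(axis=-1)


def prediction_error(state, action, predictor, target):
    """Per-sample squared distance between predictor and target outputs, shape (N,).

    Each [state, action] pair is compared only with itself, through the two networks.
    """
    state_action = np.concatenate([state, action], axis=1)
    return ((predictor(state_action) - target(state_action)) ** 2).sum(axis=1)


# Illustrative batch: 4 samples, 3-dim states, 2-dim actions (sizes are assumptions).
rng = np.random.default_rng(0)
state = rng.normal(size=(4, 3))
action = rng.normal(size=(4, 2))
predictor = make_net(5, 8, rng)
target = make_net(5, 8, rng)

state_action = np.concatenate([state, action], axis=1)
old_style = pairwise_square_distance(predictor(state_action), target(state_action))
new_style = prediction_error(state, action, predictor, target)
print(old_style.shape, new_style.shape)
```

In this sketch the per-sample errors equal the diagonal of the pairwise matrix; the off-diagonal entries compare different samples with each other and carry no information about how well the predictor matches the target on a given pair.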