google-research / batch_rl

Offline Reinforcement Learning (aka Batch Reinforcement Learning) on Atari 2600 games
https://offline-rl.github.io/
Apache License 2.0

Transform matrix created only once? #27

Closed Hyperion-shuo closed 2 years ago

Hyperion-shuo commented 2 years ago

The paper says that a categorical distribution is randomly drawn for each mini-batch. But in the code, I can only find transform_matrix being generated once, in the _create_network function, and it does not seem to change during training. Maybe I just missed it: the training op uses q_heads, and q_heads reads transform_matrix from the Network's kwargs, but I can't find where it is updated.

agarwl commented 2 years ago

The released code is written for TensorFlow graph mode, so the transform matrix is a graph op that generates a different random matrix each time it's evaluated in a sess.run call. Let me know if anything is still confusing.
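
To make the op-versus-constant distinction concrete, here is a minimal pure-Python analogy. It is not TensorFlow and not the repository's code: `RandomWeightsOp`, `run`, and `num_heads` are hypothetical names, and the normalized uniform draw is only an assumed stand-in for how the random matrix might be sampled. The sketch only shows that a node built once can still yield a new value every time it is evaluated.

```python
import random


class RandomWeightsOp:
    """Hypothetical stand-in for a graph op: it stores a recipe, not a value."""

    def __init__(self, num_heads, seed=0):
        self.num_heads = num_heads
        self._rng = random.Random(seed)

    def evaluate(self):
        # Draw fresh non-negative weights and normalize them to sum to 1.
        raw = [self._rng.random() for _ in range(self.num_heads)]
        total = sum(raw)
        return [w / total for w in raw]


def run(node):
    """Toy analogue of sess.run: ops are re-evaluated, plain values pass through."""
    return node.evaluate() if hasattr(node, "evaluate") else node


# Built once, as a network constructor would do.
weights_op = RandomWeightsOp(num_heads=4)

# A constant, by contrast, is sampled once at construction time and then frozen.
weights_const = RandomWeightsOp(num_heads=4, seed=1).evaluate()

first = run(weights_op)
second = run(weights_op)

print("op changes between runs:", first != second)
print("constant stays fixed:", run(weights_const) == run(weights_const))
```

In this analogy, creating `weights_op` once corresponds to building the op once in the network constructor, and each `run(weights_op)` plays the role of a separate sess.run call.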

Hyperion-shuo commented 2 years ago

I see, the transform matrix is an op, not a constant. Thank you so much.