Kaixhin / Rainbow

Rainbow: Combining Improvements in Deep Reinforcement Learning
MIT License

Policy and reward function #63

Closed by zyzhang1130 4 years ago

zyzhang1130 commented 4 years ago

Hi, there are certain things I would like to modify in the policy and the reward function. May I ask where the policy is stored after each epoch of training? Is there some way to call, index, or assign it with a flag? Thanks for answering.

Kaixhin commented 4 years ago

The agent has an act method, and there's code for loading pretrained weights in __init__.

zyzhang1130 commented 4 years ago

Would you mind elaborating on what act does, and whether the policy/model/weights are saved after each epoch/iteration? If not, how should I make that happen?

Thanks a lot.

Kaixhin commented 4 years ago

act selects one of k actions for a given state, picking the action with the highest Q-value. The model weights are saved once every checkpoint-interval.
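
To make the two points above concrete, here is a minimal, self-contained sketch. It is not the repository's code: TinyAgent, its Q-table, the train loop, and the JSON checkpoint files are all hypothetical stand-ins, assuming only what is described in this thread (an act method that picks the highest-Q action, weights loaded in __init__, and a save at every checkpoint interval).

```python
import json
import os
import tempfile


class TinyAgent:
    """Hypothetical stand-in for the repository's agent; not its actual code."""

    def __init__(self, q_table, model_path=None):
        # q_table maps a state key to a list of Q-values, one per action.
        self.q_table = q_table
        if model_path is not None and os.path.exists(model_path):
            # Load previously saved "weights" (here just the Q-table).
            with open(model_path) as f:
                self.q_table = json.load(f)

    def act(self, state):
        # Greedy policy: index of the action with the highest Q-value.
        q_values = self.q_table[state]
        return max(range(len(q_values)), key=lambda a: q_values[a])

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.q_table, f)


def train(agent, steps, checkpoint_interval, directory):
    """Run a dummy loop that saves the agent every `checkpoint_interval` steps."""
    saved = []
    for step in range(1, steps + 1):
        # A real loop would act in the environment and update the model here.
        if step % checkpoint_interval == 0:
            path = os.path.join(directory, f"checkpoint_{step}.json")
            agent.save(path)
            saved.append(path)
    return saved


agent = TinyAgent({"s0": [0.1, 0.7, 0.2], "s1": [0.5, 0.4, 0.1]})
action = agent.act("s0")  # greedy action for state "s0"

with tempfile.TemporaryDirectory() as tmp:
    checkpoints = train(agent, steps=10, checkpoint_interval=5, directory=tmp)
    # Restore the policy from the latest checkpoint and query it.
    restored = TinyAgent({}, model_path=checkpoints[-1])
    restored_action = restored.act("s1")
```

In this sketch the policy is never stored as a separate object: it is implied by the saved values, so restoring a checkpoint and calling act recovers it. Changing the policy (for example, adding exploration) would mean editing act, under the same assumptions.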

zyzhang1130 commented 4 years ago

Noted with thanks.