Closed zyzhang1130 closed 4 years ago
Could you elaborate on what `act` does? Also, are the policy/model weights saved after each epoch/iteration? If not, how can I make that happen?
Thanks a lot.
`act` predicts 1-of-k actions given a state, picking the action with the highest Q-value. The model weights are saved every `checkpoint-interval`.
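To illustrate the two ideas above, here is a minimal sketch. It is not this project's actual code: the function names `act` and `maybe_checkpoint`, the pickle file format, and the file naming are assumptions made for illustration, and `checkpoint_interval` simply mirrors the `checkpoint-interval` option mentioned above.

```python
import os
import pickle


def act(q_values):
    """Greedy 1-of-k choice: return the index of the largest Q-value.

    `q_values` is assumed to be a sequence with one Q-value per action.
    Ties resolve to the lowest index.
    """
    return max(range(len(q_values)), key=lambda a: q_values[a])


def maybe_checkpoint(step, weights, directory, checkpoint_interval):
    """Save `weights` when `step` is a multiple of `checkpoint_interval`.

    Returns the path written, or None if nothing was saved at this step.
    """
    if step % checkpoint_interval != 0:
        return None
    path = os.path.join(directory, f"weights_step_{step}.pkl")
    with open(path, "wb") as f:
        pickle.dump(weights, f)
    return path
```

Under these assumptions, setting the interval to the number of steps in one epoch would give one saved file per epoch.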
Noted with thanks.
Hi, there are a few things I would like to modify in the policy and the reward function. Where is the policy stored after each epoch of training? Is there a way to call, index, or assign it with a flag? Thanks for answering.