Open ZixuanLiu4869 opened 2 years ago
Hi, what do you mean by "I use my customized reward to update the agent network"? The reward is calculated and generated by the environment, which you can (and usually should) customize to your problem, but the reward is NOT used directly to update the network.

In Acme, the update happens in run_experiment._LearningActor._maybe_train(), which delegates to a learner, e.g. the SGDLearner if you are using DQN. The learner calculates the TD error (by calling a Q-learning algorithm in rlax, e.g. double_q_learning()), and that TD error is then back-propagated to update the network. I'm not sure whether I answered your question, it's possible I got it wrong; anyway, hope this helps.
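Roughly speaking, the reward from a replayed transition only enters training through that TD error. A minimal sketch of the idea (not the actual SGDLearner code, which also deals with target network parameters and other details; shapes here are unbatched and vmapped over the batch):

```python
import jax
import jax.numpy as jnp
import rlax


def _td_error(q_tm1, a_tm1, r_t, discount_t, q_t_value, q_t_selector):
  # Double Q-learning TD error:
  #   r_t + discount_t * q_t_value[argmax(q_t_selector)] - q_tm1[a_tm1]
  return rlax.double_q_learning(q_tm1, a_tm1, r_t, discount_t,
                                q_t_value, q_t_selector)


def batch_loss(q_tm1, a_tm1, r_t, discount_t, q_t_value, q_t_selector):
  # vmap over the leading batch dimension, then average the squared TD errors;
  # this scalar loss is what gets back-propagated through the network.
  td_errors = jax.vmap(_td_error)(q_tm1, a_tm1, r_t, discount_t,
                                  q_t_value, q_t_selector)
  return jnp.mean(rlax.l2_loss(td_errors))
```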
I think, in this case, you can consider creating a special agent for your problem that only calls learner.step at the end of an episode. Alternatively, there is always the possibility of writing your own custom training loop where you can decide when to update the agent. The default implementation in Acme covers the most common use case; for settings where you deviate from the default behavior, writing a custom training loop or actor seems to be the only option right now.
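As a rough illustration of the custom-training-loop option, here is a minimal sketch (not an official Acme example) that only trains at the end of an episode; it assumes an environment, actor and learner built the usual way and following Acme's standard interfaces:

```python
# Sketch of a custom training loop that only trains at episode boundaries.
# `environment`, `actor` and `learner` follow Acme's standard interfaces
# (dm_env.Environment, acme.core.Actor, acme.core.Learner).
def run_episode_then_train(environment, actor, learner, num_learner_steps=1):
  timestep = environment.reset()
  actor.observe_first(timestep)
  while not timestep.last():
    action = actor.select_action(timestep.observation)
    timestep = environment.step(action)
    actor.observe(action, next_timestep=timestep)
  # Train only now, at the end of the episode.
  for _ in range(num_learner_steps):
    learner.step()
  actor.update()  # pull the latest learner parameters into the acting policy
```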
Hi, I know that the reward is generated by the environment. Let's say I want to record one episode and not update the network during that episode. The episode has rewards generated by the environment. Then I want to replace the rewards in the recorded episode with my own customized rewards and update the network. Is there a way I can get the rewards from the recorded episode and change them?
Do you have any examples of creating a special agent or writing a custom training loop?
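Not an official example, but one way the reward-relabelling idea above could look as a custom loop: buffer the episode yourself, overwrite the rewards, and only then hand the transitions to the actor (and hence its adder/replay). Here relabel_rewards is a hypothetical function you would supply; the rest uses Acme's standard actor/learner interfaces:

```python
# Sketch: record an episode without training, replace the environment rewards
# with custom ones, then write the relabelled transitions and train once.
# `relabel_rewards` is hypothetical: it maps the recorded episode to a list of
# new rewards, one per environment step.
def run_relabelled_episode(environment, actor, learner, relabel_rewards):
  timestep = environment.reset()
  actor.observe_first(timestep)
  recorded = []  # (action, next_timestep) pairs, held back from replay for now
  while not timestep.last():
    action = actor.select_action(timestep.observation)
    timestep = environment.step(action)
    recorded.append((action, timestep))

  new_rewards = relabel_rewards(recorded)
  for (action, ts), reward in zip(recorded, new_rewards):
    # dm_env.TimeStep is a namedtuple, so _replace swaps in the custom reward.
    actor.observe(action, next_timestep=ts._replace(reward=reward))

  learner.step()   # update the network using the relabelled data
  actor.update()
```

Note that for replay-based agents the learner can only sample once enough transitions have been written, so the learner.step() call may need to be guarded in practice.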
Hi, I have some silly questions about updating the agent. I know the general framework of training is the standard environment loop, where actor.update is used to update the agent after every step. But I want to run the whole episode, and then at the end of the episode use my customized reward to update the agent network. What should I do? I am new to this field and I would appreciate it if someone could help me!
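For reference, the default behaviour described above is roughly the following per-step loop (a simplified sketch of what acme.EnvironmentLoop does, not its exact code); the end-of-episode variant sketched earlier in the thread differs only in when learner.step / actor.update are called:

```python
# Simplified sketch of the default per-step training loop: actor.update() is
# called after every environment step, which is what triggers learning.
def run_default_episode(environment, actor):
  timestep = environment.reset()
  actor.observe_first(timestep)
  while not timestep.last():
    action = actor.select_action(timestep.observation)
    timestep = environment.step(action)
    actor.observe(action, next_timestep=timestep)
    actor.update()  # the per-step update the question refers to
```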