Open pengzhenghao opened 1 year ago
Do you have any comments on which method is better? Removing the action from the reward function:
g(s, a) -> g(s)
completely changes its meaning. Is this a reasonable choice?
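
To make the concern concrete, here is a minimal sketch using made-up toy rewards. The functions `g_state_action` and `g_state_only` are hypothetical and are not taken from the repository. It only illustrates that a state-only reward cannot tell apart different actions taken in the same state.

```python
# Hypothetical toy rewards, for illustration only (not the repository's code).

def g_state_action(s, a):
    """Reward that depends on both the state and the action."""
    return float(s) - 0.5 * float(a) ** 2


def g_state_only(s):
    """Reward that depends on the state alone; the action is ignored."""
    return float(s)


state = 1.0
actions = [0.0, 1.0, 2.0]

# With g(s, a), each action in the same state receives its own reward.
sa_rewards = [g_state_action(state, a) for a in actions]

# With g(s), every action in the same state receives the same reward.
s_rewards = [g_state_only(state) for _ in actions]

print("g(s, a):", sa_rewards)
print("g(s):   ", s_rewards)
```

Under g(s), any preference among actions has to come indirectly, through the states those actions lead to, rather than from the reward term itself.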