devloper13 opened this issue 4 years ago
Hi @devloper13,
If you are interested in learning an optimal policy in a fixed environment where the goal signal does not change, then there isn't really much benefit to learning the SR and the reward function separately. The real benefit of SRs is the ability to dissociate the two.
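To make that concrete (this is just the standard tabular factorization, not code from this repo): the SR gives you $Q^\pi(s, a) = \sum_{s'} M^\pi(s, a, s')\, r(s')$, where $M^\pi$ captures the (discounted) expected future state occupancies and $r$ is the per-state reward. If only $r$ changes, $M^\pi$ can be reused and the values are recovered with a single dot product, which is where the transfer benefit comes from.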
Yes, I agree. In the Taxi-v2 environment the goal state changes after every episode; in fact, the pickup and drop-off locations also change as far as I know, but the reward structure remains constant. Of course, my main intention is to learn the transition dynamics. So in this scenario, where I'm not using the SR for transfer learning (the reward doesn't change), how do you think the SR's performance should compare to Q-learning?
Hi @devloper13,
My understanding is that they should be theoretically very similar, assuming you are using the same updating scheme to learn the SR and the reward function as you were using to learn Q.
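For reference, here is a minimal tabular sketch of what "same updating scheme" could look like on Taxi. This is my own illustration, not code from this repo: it assumes the old gym `reset()`/`step()` API that Taxi-v2 used, and it treats the reward as a function of the next state only (an approximation for Taxi, where some penalties depend on the action taken). The SR matrix `M` gets the same TD-style update Q-learning would use, just with a one-hot state indicator in place of the reward, and Q is recovered as `M @ w`.

```python
# Minimal tabular SR sketch (assumptions: old gym API, next-state rewards).
import numpy as np
import gym

env = gym.make("Taxi-v2")            # "Taxi-v3" in newer gym releases
nS, nA = env.observation_space.n, env.action_space.n

M = np.zeros((nS, nA, nS))           # successor representation per (state, action)
w = np.zeros(nS)                     # estimated reward for arriving in each state
alpha, alpha_r, gamma, eps = 0.1, 0.1, 0.99, 0.1

def q_values(s):
    # Q(s, a) = sum_{s'} M(s, a, s') * w(s')
    return M[s] @ w

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # epsilon-greedy over the SR-derived Q-values
        a = env.action_space.sample() if np.random.rand() < eps else np.argmax(q_values(s))
        s2, r, done, _ = env.step(a)
        a2 = np.argmax(q_values(s2))

        # SR update: same TD form as Q-learning, but the "reward" is a one-hot
        # indicator of the state just reached
        onehot = np.zeros(nS)
        onehot[s2] = 1.0
        target = onehot if done else onehot + gamma * M[s2, a2]
        M[s, a] += alpha * (target - M[s, a])

        # Reward update: running estimate of the reward observed on entering s2
        w[s2] += alpha_r * (r - w[s2])
        s = s2
```

Under those assumptions the per-step cost is higher than plain Q-learning (you update an nS-length row of `M` instead of a single Q entry), but the learning dynamics should otherwise look very similar when the reward is fixed.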
What are the drawbacks of the SR? Does it take longer to train than, say, Q-learning? I'm trying to train on Taxi-v2, so far unsuccessfully.