awjuliani / successor_examples

Tutorials on learning and using successor representations.
MIT License

Performance of SR #2

Open devloper13 opened 4 years ago

devloper13 commented 4 years ago

What are the drawbacks of the SR? Does it take more time to train compared to, say, Q-learning? I'm trying to train on Taxi-v2... so far unsuccessfully.

awjuliani commented 4 years ago

Hi @devloper13,

If you are interested in learning an optimal policy in a fixed environment where the goal signal does not change, then there isn't really much benefit to learning the SR and the reward function separately. The real benefit of SRs is the ability to dissociate the two.
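For concreteness, here is a minimal tabular sketch (hypothetical code, not from this repo) of what that dissociation looks like: the SR matrix `M` captures discounted expected state occupancies, the vector `R` captures one-step rewards, and values are recomposed as `V = M @ R`. If the goal moves, only `R` has to be relearned; `M` carries over.

```python
import numpy as np

n_states = 16          # illustrative size; Taxi-v2 would use 500 states
gamma, alpha = 0.95, 0.1

M = np.zeros((n_states, n_states))  # SR: discounted expected state occupancies
R = np.zeros(n_states)              # one-step reward estimate per state

def sr_update(s, s_next, r):
    """TD update for the SR plus a running estimate of the reward function."""
    onehot = np.eye(n_states)[s]
    # SR-TD: M(s,.) <- M(s,.) + alpha * (1[s] + gamma * M(s',.) - M(s,.))
    M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
    # The reward is learned separately; this is the only part that must be
    # relearned when the goal moves, while M transfers unchanged.
    R[s_next] += alpha * (r - R[s_next])

def value(s):
    # Values factor into dynamics and reward: V(s) = sum_s' M(s, s') R(s')
    return M[s] @ R
```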

devloper13 commented 4 years ago

Yes, I agree. In the Taxi-v2 environment the goal state changes after every episode; in fact, the pickup location and fault states also change as far as I know, but the reward function remains constant. Of course, my main intention is to learn the transition dynamics. So in this scenario, where I'm not using the SR for transfer learning (the reward doesn't change), what do you think the SR's performance should be like compared to Q-learning?

awjuliani commented 4 years ago

Hi @devloper13,

My understanding is that the two should be theoretically very similar, assuming you use the same update scheme to learn the SR and the reward function as you would to learn the Q-values.
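To make that comparison concrete, here is a rough side-by-side sketch (hypothetical code, not from this repo) of the two update schemes. Both are ordinary TD updates with the same learning rate and discount; the SR version simply splits the Q-table into a dynamics part `M` and a reward part `R`:

```python
import numpy as np

n_states, n_actions = 16, 4
gamma, alpha = 0.95, 0.1

# Q-learning: a single coupled table.
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# SR version: the same TD machinery, with the Q-table factored into
# an action-conditioned SR (M) and a reward estimate (R).
M = np.zeros((n_states, n_actions, n_states))
R = np.zeros(n_states)

def sr_q(s):
    return M[s] @ R  # Q(s, .) recomposed on demand from M and R

def sr_update(s, a, r, s_next):
    onehot = np.eye(n_states)[s]
    a_star = np.argmax(sr_q(s_next))  # greedy bootstrap, mirroring Q-learning
    M[s, a] += alpha * (onehot + gamma * M[s_next, a_star] - M[s, a])
    R[s_next] += alpha * (r - R[s_next])
```

With matched hyperparameters, both walk the same TD error down, so per-step learning speed should be comparable when the reward function is fixed.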