SaifAlDilaimi / OpenAIGym-DSR

Implementation of the paper "Deep Successor Reinforcement Learning"

is torch_count_dsr working? #2

Open nikhilrayaprolu opened 2 years ago

nikhilrayaprolu commented 2 years ago

I tried executing your code and found that the algorithm is not converging. After a while, the scores drop to zero.

SaifAlDilaimi commented 2 years ago

Hey @nikhilrayaprolu, I'm currently working on it for my master's thesis. However, the algorithm doesn't seem to work perfectly... at least I haven't had luck with it yet.

nikhilrayaprolu commented 2 years ago

Sure @SaifAlDilaimi. It looks like the DQN part (keeping only the TD loss) is not converging. Once that is debugged, the rest might fall into place.
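A useful first debugging step for the point above is to check that the TD update alone converges on a toy problem before touching the SR machinery. This is a minimal sketch, not the repo's code; the chain MDP, constants, and all names here are illustrative assumptions:

```python
# Illustrative sketch (not the repo's code): isolating the DQN TD update on a
# tiny tabular problem to verify that the TD loss alone converges.
GAMMA = 0.99
ALPHA = 0.1

# Deterministic chain MDP: states 0..3, moving right; state 3 is terminal.
N_STATES, N_ACTIONS = 4, 1
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(s, a):
    """Move one state to the right; reward 1 on reaching the terminal state."""
    s_next = min(s + 1, 3)
    reward = 1.0 if s_next == 3 else 0.0
    done = s_next == 3
    return s_next, reward, done

for episode in range(500):
    s, done = 0, False
    while not done:
        a = 0
        s_next, r, done = step(s, a)
        # TD target: r + gamma * max_a' Q(s', a'), with a zero bootstrap at terminal.
        target = r if done else r + GAMMA * max(Q[s_next])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s_next

# For this chain, Q(0, 0) should approach gamma^2 = 0.9801.
print(round(Q[0][0], 4))
```

If even this tabular version drifts or diverges in your setup, the bug is in the target/loss wiring (e.g. bootstrapping through terminal states) rather than in the successor-representation head.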

nikhilrayaprolu commented 2 years ago

Also, the base code is taken from here https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb

So reimplementing the successor representations (SRs) from scratch might also help!

SaifAlDilaimi commented 2 years ago

Here's a TensorFlow version, my own implementation based on the paper "Deep Successor Reinforcement Learning":

https://github.com/SaifAlDilaimi/OpenAIGym-DSR/blob/main/dsr_example.py

nikhilrayaprolu commented 2 years ago

Sure, @SaifAlDilaimi, I will look into it. Do you know the right way to evaluate the successor features? In Deep Q-Networks we can use the average return per episode to check whether the model is learning, but how can we test the successor model?
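One answer to the evaluation question above: since the successor feature ψ(s) is defined as the expected discounted sum of future state features φ(s_t), the model's prediction can be checked against a Monte Carlo rollout estimate of that sum, and the gap tracked over training. This is a hedged sketch on a toy chain, not the repo's code; all names are illustrative:

```python
# Illustrative sketch (not the repo's code): evaluating learned successor
# features by comparing them to an empirical discounted feature sum.
GAMMA = 0.9
N_STATES = 3  # toy deterministic chain with an absorbing last state

def phi(s):
    """One-hot state features."""
    f = [0.0] * N_STATES
    f[s] = 1.0
    return f

def next_state(s):
    return min(s + 1, N_STATES - 1)

def monte_carlo_sf(s, horizon=200):
    """Empirical discounted feature sum sum_t gamma^t * phi(s_t) from s."""
    sf = [0.0] * N_STATES
    discount = 1.0
    for _ in range(horizon):
        f = phi(s)
        for i in range(N_STATES):
            sf[i] += discount * f[i]
        discount *= GAMMA
        s = next_state(s)
    return sf

def sf_eval_error(predicted_sf, s):
    """L2 gap between a model's SF prediction and the rollout estimate.
    Plotting this over training plays the role of the return curve in DQN."""
    target = monte_carlo_sf(s)
    return sum((p - t) ** 2 for p, t in zip(predicted_sf, target)) ** 0.5

# Exact SF for state 0 of this chain, as a sanity check:
exact = [1.0, GAMMA, GAMMA ** 2 / (1 - GAMMA)]
error = sf_eval_error(exact, 0)
```

A second common sanity check is indirect: multiply the predicted SF by the learned reward weights and compare the result with the ordinary Q-values or episode returns, since in the DSR decomposition Q(s, a) = ψ(s, a) · w.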

nikhilrayaprolu commented 2 years ago

I realized that on reset, the grid layout changes along with the agent position in MiniGrid. Did you look at that? How can I fix the grid layout but randomize the agent's start position on reset?

SaifAlDilaimi commented 2 years ago

Hey, the environment was just an example for this repo; I'm not familiar with that. This repo is focused only on the DSR network ...