nikhilrayaprolu opened 2 years ago
Hey @nikhilrayaprolu, I'm currently working on it for my master's thesis. However, this algorithm doesn't seem to work perfectly... at least I haven't had luck yet.
Sure, @SaifAlDilaimi. It looks like the DQN part (keeping only the TD loss; see the sketch at the end of this comment) is not converging. Once that is debugged, things might get resolved.
Also, the base code is taken from here: https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb
So reimplementing the SRs from scratch might also help!
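For concreteness, here is a minimal sketch of that TD-only loss, assuming PyTorch (as in the Udacity solution) and a `q_net`/`target_net` pair; the names and shapes are illustrative, not taken from either repo:

```python
import torch
import torch.nn.functional as F

def dqn_td_loss(q_net, target_net, batch, gamma=0.99):
    # batch tensors: actions are int64 indices, dones are 0/1 floats
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Bootstrapped target r + gamma * max_a' Q_target(s', a'), zeroed at terminals
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * q_next * (1.0 - dones)
    return F.mse_loss(q_sa, target)
```

If even this piece diverges in isolation, the SF branches have no chance, so it is worth verifying on its own first.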
Here's a TF version, my own implementation based on the paper "Deep Successor Reinforcement Learning":
https://github.com/SaifAlDilaimi/OpenAIGym-DSR/blob/main/dsr_example.py
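To make the structure easier to follow, here is a rough sketch of the DSR decomposition from that paper (an encoder phi, per-action successor-feature heads psi, and a reward vector w with Q(s, a) = psi(s, a) . w). It uses PyTorch for consistency with the sketch above; all names are my own assumptions, not taken from the linked TF script:

```python
import torch
import torch.nn as nn

class DSR(nn.Module):
    def __init__(self, obs_dim, feat_dim, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, feat_dim))            # phi(s)
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                     nn.Linear(64, obs_dim))             # reconstruction
        # one successor-feature head psi(s, a) per action
        self.sf_heads = nn.ModuleList(
            [nn.Linear(feat_dim, feat_dim) for _ in range(n_actions)])
        self.w = nn.Parameter(torch.zeros(feat_dim))                     # reward weights

    def forward(self, obs):
        phi = self.encoder(obs)
        psi = torch.stack([h(phi) for h in self.sf_heads], dim=1)  # (B, A, feat)
        q = psi @ self.w                                           # Q(s, a) = psi . w
        return phi, psi, q, self.decoder(phi)
```

The paper trains this with three losses: a reconstruction loss on the decoder, a reward-regression loss pushing phi(s) . w toward the observed reward, and a TD loss on psi with target phi(s) + gamma * psi(s', a*), where a* is the greedy next action.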
Sure @SaifAlDilaimi, I will look into it. Do you know the right way to evaluate the successor features? In deep Q-networks we can use the average return per episode to check whether the model is learning, but how can we test the successor model?
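One possible approach, besides the average return: monitor how well the learned pieces satisfy their own definitions. A sketch, assuming a model shaped like the DSR sketch above (the `model` interface is hypothetical):

```python
import torch

@torch.no_grad()
def sf_diagnostics(model, batch, gamma=0.99):
    states, actions, rewards, next_states, dones = batch
    phi, psi, q, _ = model(states)
    phi_n, psi_n, q_n, _ = model(next_states)
    # (1) how well do the learned features explain the observed rewards?
    reward_err = (rewards - phi @ model.w).abs().mean()
    # (2) SF Bellman residual under the greedy next action a*
    a_star = q_n.argmax(dim=1)
    psi_sa = psi[torch.arange(len(actions)), actions]
    target = phi + gamma * psi_n[torch.arange(len(a_star)), a_star] \
             * (1.0 - dones).unsqueeze(1)
    bellman_err = (psi_sa - target).pow(2).mean()
    return reward_err.item(), bellman_err.item()
```

If the reward error stays high, the features cannot explain the reward and the Q-values derived from them won't be useful; if the Bellman residual stays high, the successor branch itself isn't converging.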
I realized that on reset, the grid layout changes along with the agent position in minigrid. Did you look into that? How can I keep the grid layout fixed but move the agent to a random start location on reset?
Hey, the environment was just an example for this repo; I'm not familiar with that part. This repo is focused only on the DSR network ...
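If I had to guess, though, one plausible way is to hard-code the layout in `_gen_grid` and let only the agent placement draw from the env RNG. A sketch, assuming the current `minigrid` package API (formerly `gym-minigrid`), with an illustrative class name:

```python
from minigrid.core.grid import Grid
from minigrid.core.mission import MissionSpace
from minigrid.core.world_object import Goal
from minigrid.minigrid_env import MiniGridEnv

class FixedLayoutRandomStart(MiniGridEnv):
    def __init__(self, size=8, **kwargs):
        super().__init__(
            mission_space=MissionSpace(mission_func=lambda: "reach the goal"),
            grid_size=size, max_steps=4 * size * size, **kwargs)

    def _gen_grid(self, width, height):
        # deterministic layout: walls and goal are hard-coded, not sampled
        self.grid = Grid(width, height)
        self.grid.wall_rect(0, 0, width, height)
        self.put_obj(Goal(), width - 2, height - 2)
        # only the agent start is drawn from the env RNG
        self.place_agent()
        self.mission = "reach the goal"
```

Calling `env.reset()` without a seed should then keep the walls and goal identical while the start position varies; resetting with a fixed seed would freeze the start position too.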
I tried running your code and found that the algorithm is not converging. After a while the scores drop to zero.