hlsafin closed this 7 months ago
I modified the code from agent57 and my old R2D2 code, so I just modified the existing replay buffer. I don't have any plans to continue after the MEME agent, although if there are enough improvements on the checklist I might start a new repo for a new agent that outperforms MEME by a significant margin. I didn't test it on many environments due to lack of compute.
I have access to higher compute power and can post results here; I just wanted things to be completed first, namely the replay ratio and related pieces. I would be more than happy to see your approach as well. Thank you.
I can't guarantee that the code is bug-free. I'm currently trying to fix an issue where the PER buffer contains NaNs after around 20k updates, and the stored sums become inconsistent (e.g. 8.00004 = 6 + 4), which causes problems when calling retrieve. I suspect the bug is in per.py, because it did not exist without PER. I would be happy to add you to the repo if you want to help out.
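For context, the kind of inconsistency described above is easy to reproduce with a toy sum tree (a hypothetical minimal sketch, not the actual per.py): when internal nodes hold float32 sums that are updated incrementally by propagating deltas, the root drifts away from the true total of the leaves, so prefix-sum retrieval can land on the wrong leaf or walk out of range.

```python
import numpy as np

class FloatSumTree:
    """Toy sum tree with float32 nodes updated incrementally (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # nodes[1] is the root; leaves live at [capacity, 2*capacity).
        self.nodes = np.zeros(2 * capacity, dtype=np.float32)

    def update(self, idx, priority):
        # Propagate the delta up instead of recomputing each parent's sum:
        # this is where float32 rounding error accumulates over many updates.
        i = idx + self.capacity
        delta = np.float32(priority) - self.nodes[i]
        while i >= 1:
            self.nodes[i] += delta
            i //= 2

    def total(self):
        return float(self.nodes[1])

tree = FloatSumTree(1024)
rng = np.random.default_rng(0)
for _ in range(200_000):
    tree.update(int(rng.integers(1024)), float(rng.random()))

# Recompute the exact leaf total in float64 and compare with the root.
exact = float(self_sum := np.float64(tree.nodes[1024:]).sum())
drift = abs(tree.total() - exact)
print(drift)  # nonzero: the incrementally maintained root has drifted
```

The drift grows with the number of updates, which would fit the bug only showing up after ~20k of them.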
Sure, I would love to help out, although I'm not 100 percent sure what is going wrong at 20k updates or why. Should we jump on a call and discuss things in detail? Here is a Discord server I created for this project: https://discord.gg/xNGWawAr
I fixed the problem. It was caused by the numerical precision of representing priorities in the buffer (the 8.00004 = 6 + 4 inconsistency). Representing the priorities as int64 solved it.
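A fixed-point variant along those lines (a hypothetical sketch of the int64 idea, not the repo's actual code, with an assumed `SCALE` resolution): quantize each priority to an integer number of ticks, so parent sums are exact integer arithmetic and the root always equals the sum of the leaves, making retrieve deterministic.

```python
import numpy as np

SCALE = 1_000_000  # ticks per unit of priority (assumed resolution)

class IntSumTree:
    """Sum tree storing priorities as int64 fixed-point; all sums are exact."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = np.zeros(2 * capacity, dtype=np.int64)

    def update(self, idx, priority):
        # Quantize to integer ticks; delta propagation is now exact.
        i = idx + self.capacity
        delta = int(round(priority * SCALE)) - int(self.nodes[i])
        while i >= 1:
            self.nodes[i] += delta
            i //= 2

    def retrieve(self, prefix_sum):
        # Walk down: go left if the left child covers the target prefix sum,
        # otherwise subtract it and go right. No rounding error anywhere.
        target = int(round(prefix_sum * SCALE))
        i = 1
        while i < self.capacity:
            left = 2 * i
            if self.nodes[left] > target:
                i = left
            else:
                target -= self.nodes[left]
                i = left + 1
        return i - self.capacity

tree = IntSumTree(8)
for idx, p in enumerate([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]):
    tree.update(idx, p)

# Root is exactly the leaf total, so retrieve can never walk out of range.
assert tree.nodes[1] == tree.nodes[8:].sum()
print(tree.retrieve(0.05), tree.retrieve(3.55))  # → 0 7
```

The trade-off is that priorities below one tick (here 1e-6) quantize to zero, so the scale has to be chosen against the smallest priority the agent will ever write.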
How do you plan to incorporate SPI into your method? Have you considered using DeepMind's Reverb for the replay buffer? Did you see the efficiency gains in the results compared to Agent57?