Closed joseph-chan closed 4 years ago
I am focusing on making the library usable and writing the docs right now. Once the package is in its "mature" state, I will be adding new algorithms. You know, changing one function and rewriting it in ~5 files has never been cool. I'd assume it would take a couple of weeks.
Reinforce TopK works on top of existing working recommenders (DDPG, TD3, SAC, BCQ) and I am still working on these. Gonna keep your issue open till it's released.
It's definitely a great job, and many people will benefit from it. Thanks for your great effort; I will keep following your cool work.
Attention...
look forward to it
Yes, I have already started writing an article, and it is scheduled for this week. Dunno about the implementation, but it is coming soon. I have already started working on it and have a rough sketch here: https://gist.github.com/awarebayes/9c8a0f28e83a723549ce4ed0c4731997. The article already covers most of the paper. Make sure to follow so you don't miss it: https://medium.com/@awarebayes. The code will be published here.
Just implemented OffPolicy correction, without Top K yet. But TopK is coming.... Link to the notebook
Great job! I have been following your recent work all along, and your great work has really surprised us.
TL;DR: will keep it open until the article is published. Just released Reinforce TopK! Here is the private article draft (not released yet): link. I was gonna send it to the Google folks for a quick look, so it is somewhat official. Also, I set up all of the notebooks online.
Anyways, here is a quick look at policy loss with TopK Correction
Without any correction:
Notebook in the examples
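For readers who want to see what the two corrections compared above amount to, here is a minimal sketch in plain Python. It is not the library's code: the function names and the `cap` parameter are hypothetical, chosen only for illustration. It assumes the formulation from the paper being discussed, where each logged action is weighted by the importance ratio pi/beta (the off-policy correction) and by the top-K multiplier K * (1 - pi)^(K - 1).

```python
import math

def off_policy_weight(pi_prob, beta_prob, cap=None):
    """Importance weight pi / beta; `cap` (hypothetical name) optionally bounds it."""
    weight = pi_prob / beta_prob
    return min(weight, cap) if cap is not None else weight

def topk_multiplier(pi_prob, k):
    """Top-K correction factor: K * (1 - pi)^(K - 1). Equals 1 when k == 1."""
    return k * (1.0 - pi_prob) ** (k - 1)

def corrected_policy_loss(pi_probs, beta_probs, rewards, k=1, cap=None):
    """Mean surrogate loss -w * lambda_K * R * log(pi) over logged samples.

    pi_probs:   probability the learned policy assigns to each logged action
    beta_probs: probability the logging (behavior) policy assigned to it
    rewards:    observed return for each logged action
    In an autograd framework, w and lambda_K would be treated as constants.
    """
    total = 0.0
    for pi, beta, reward in zip(pi_probs, beta_probs, rewards):
        w = off_policy_weight(pi, beta, cap)
        lam = topk_multiplier(pi, k)
        total += -w * lam * reward * math.log(pi)
    return total / len(pi_probs)

# Same logged batch, with and without the top-K factor (k=1 disables it).
pi_probs, beta_probs, rewards = [0.1, 0.6], [0.2, 0.3], [1.0, 1.0]
loss_plain = corrected_policy_loss(pi_probs, beta_probs, rewards, k=1)
loss_topk = corrected_policy_loss(pi_probs, beta_probs, rewards, k=10)
```

With `k=1` the multiplier is 1 and only the off-policy weight remains. For larger `k`, actions the policy already rates highly get a smaller multiplier, so the update pushes less on items that would likely appear in the top-K list anyway.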
The code and detailed documentation were already given in awarebayes's previous reply, so I closed the issue.
When will top-k off-policy correction be implemented? I was reading this paper and am looking forward to the release of its implementation, haha.