DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Feature request] Implement SAC-Discrete #157

Closed toshikwa closed 3 years ago

toshikwa commented 4 years ago

Hi, thank you for your great work!! I'm interested in contributing to Stable-Baselines3.

I want to implement SAC-Discrete (paper, my implementation). Can we discuss it before implementing?
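For reference, here is a rough sketch of the discrete-action SAC losses from the paper; names like `actor`, `q1`, `alpha` are just placeholders for illustration, not SB3 API:

```python
import torch
import torch.nn.functional as F

def sac_discrete_losses(actor, q1, q2, q1_target, q2_target,
                        obs, actions, rewards, next_obs, dones,
                        alpha, gamma=0.99):
    """Critic and actor losses for discrete-action SAC (illustrative sketch).

    actor(obs) returns logits over actions; q1/q2 return Q-values for all actions;
    actions is a LongTensor of action indices with shape (batch,).
    """
    with torch.no_grad():
        # Target value: exact expectation over next actions (no sampling needed
        # in the discrete case), with entropy bonus and twin-critic minimum.
        next_log_probs = F.log_softmax(actor(next_obs), dim=-1)
        next_probs = next_log_probs.exp()
        next_q = torch.min(q1_target(next_obs), q2_target(next_obs))
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=-1)
        target_q = rewards + gamma * (1.0 - dones) * next_v

    # Critic loss: regress only the Q-value of the action actually taken.
    q1_taken = q1(obs).gather(1, actions.view(-1, 1)).squeeze(-1)
    q2_taken = q2(obs).gather(1, actions.view(-1, 1)).squeeze(-1)
    critic_loss = F.mse_loss(q1_taken, target_q) + F.mse_loss(q2_taken, target_q)

    # Actor loss: again an exact expectation over the action distribution,
    # so no reparameterization trick is required.
    log_probs = F.log_softmax(actor(obs), dim=-1)
    probs = log_probs.exp()
    with torch.no_grad():
        q_min = torch.min(q1(obs), q2(obs))
    actor_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()

    return critic_loss, actor_loss
```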

Miffyli commented 4 years ago

Cheers for the nice comments :).

We are (still) working on getting v1.0 out, i.e. mainly bug testing and reviewing the code. After the release we can discuss adding new algorithms or improvements to existing ones. At a quick glance this seems simple enough that it could be added without much extra code.

araffin commented 4 years ago

Hello,

Thanks for the suggestion =)

In principle I would be in favor of that addition. We mostly need to discuss its advantage vs DQN and variants (QR-DQN, ...) in terms of performance and runtime, and see how much effort it requires and how much complexity it adds.

@Miffyli maybe a good candidate for stable-baselines3 "contrib" (same as #83 )

toshikwa commented 4 years ago

Thank you for the response.

According to the paper, SAC-Discrete is evaluated with 100k environment steps because they are most interested in sample efficiency, not final performance.

Its results at 100k steps were not bad, but it failed to solve some simple tasks like Pong. DQN (and its extensions) can achieve much better results, although it needs more samples. I would say there is a trade-off. (What do you think?)

Once v1.0 is released, I can contribute to implementing QR-DQN and IQN, in addition to SAC-Discrete.

Thanks :)

araffin commented 4 years ago

The contrib repo is here ;) https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Make sure to read the contributing guide carefully first ;). In terms of priority, I would prefer QR-DQN and IQN first. For QR-DQN, you can re-use the huber quantile loss defined in TQC.

(We don't advertise it yet as we want to check the process and not get too many requests for now.)
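For reference, a rough sketch of the quantile Huber loss mentioned above (shapes and names are illustrative only, not the exact TQC code):

```python
import torch

def quantile_huber_loss(current_quantiles, target_quantiles, kappa=1.0):
    """current_quantiles: (batch, n_quantiles); target_quantiles: (batch, n_target_quantiles)."""
    n_quantiles = current_quantiles.shape[1]
    # Quantile fraction midpoints tau_i = (2i + 1) / (2N).
    taus = (torch.arange(n_quantiles, device=current_quantiles.device,
                         dtype=current_quantiles.dtype) + 0.5) / n_quantiles
    # Pairwise TD errors, shape (batch, n_target_quantiles, n_quantiles).
    td_errors = target_quantiles.unsqueeze(-1) - current_quantiles.unsqueeze(1)
    abs_td = td_errors.abs()
    # Element-wise Huber loss with threshold kappa.
    huber = torch.where(abs_td <= kappa,
                        0.5 * td_errors.pow(2),
                        kappa * (abs_td - 0.5 * kappa))
    # Asymmetric quantile weighting |tau - 1{td_error < 0}|.
    loss = torch.abs(taus - (td_errors.detach() < 0).float()) * huber
    # Sum over current quantiles, average over target quantiles and batch.
    return loss.sum(dim=-1).mean()
```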

cosmir17 commented 3 years ago

I was asked to post it here, @PartiallyTyped, regarding the following comment. https://github.com/DLR-RM/stable-baselines3/issues/1#issuecomment-625938738

PartiallyTyped posted a link to an academic paper on a SAC algorithm that works with discrete actions.

I think PartiallyTyped is already aware, since the main GitHub link was mentioned on the paper page, that there is a source code example for it. The author has published his code: https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch/blob/master/agents/actor_critic_agents/SAC_Discrete.py

Hope this helps, Sean

araffin commented 3 years ago

I will now close this one as it rather belongs in the contrib repo.

> PartiallyTyped posted a link to an academic paper on a SAC algorithm that works with discrete actions.

Academic, yes, but not peer-reviewed...

cosmir17 commented 3 years ago

@araffin How about the following paper? https://arxiv.org/abs/1912.11077v1