Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

SACD Discrete Soft Actor Critic #203

Open · splatter96 opened 11 months ago

splatter96 commented 11 months ago

This PR introduces the Soft Actor-Critic for discrete actions (SACD) algorithm.

Description

This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper author's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature for Stable-Baselines3 (https://github.com/toshikwa/sac-discrete.pytorch).
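For context, the key change relative to continuous SAC is that the actor outputs a categorical distribution and the critics output Q-values for every action, so the soft value, policy, and entropy terms can be computed exactly in expectation instead of from sampled actions. The following is a minimal PyTorch sketch of the three losses based on the paper, not the actual code in this PR; all names (sacd_losses, actor, q1, q2, etc.) are placeholders:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: q1/q2 and their targets map states -> Q-values (batch, n_actions);
# the actor maps states -> action logits (batch, n_actions).
def sacd_losses(batch, actor, q1, q2, q1_target, q2_target, log_alpha, gamma, target_entropy):
    obs, actions, rewards, next_obs, dones = batch  # actions: (batch,) int64
    alpha = log_alpha.exp().detach()

    # Critic target: exact expectation over next actions, no sampling needed.
    with torch.no_grad():
        next_probs = F.softmax(actor(next_obs), dim=-1)
        next_log_probs = torch.log(next_probs + 1e-8)
        next_q = torch.min(q1_target(next_obs), q2_target(next_obs))
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=-1)
        target_q = rewards + gamma * (1.0 - dones) * next_v

    # Critic loss: Q-values of the taken actions vs. the soft target.
    q1_taken = q1(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    q2_taken = q2(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    critic_loss = F.mse_loss(q1_taken, target_q) + F.mse_loss(q2_taken, target_q)

    # Actor loss: exact expectation over the categorical policy.
    probs = F.softmax(actor(obs), dim=-1)
    log_probs = torch.log(probs + 1e-8)
    q_min = torch.min(q1(obs), q2(obs)).detach()
    actor_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()

    # Temperature loss: drive the policy entropy toward target_entropy
    # (commonly 0.98 * log(n_actions) in SAC-Discrete implementations).
    entropy = -(probs * log_probs).sum(dim=-1).detach()
    alpha_loss = (log_alpha * (entropy - target_entropy)).mean()

    return critic_loss, actor_loss, alpha_loss
```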


araffin commented 10 months ago

Hello, thanks for the PR =)

"The functionality/performance matches that of the source (required for new training algorithms or training-related features)."

Please don't forget that part (see the contributing guide). I think there is also discussion about the results here: https://github.com/vwxyzjn/cleanrl/pull/270

splatter96 commented 10 months ago

Hello, thanks for the feedback :) Sorry for the late reply! Should I add the performance comparison to the source in the same way it is done on the official Stable-Baselines3 algorithm pages? As in, create an rl-baselines3-zoo config for it and add the resulting plots to this PR? A sketch of what a benchmark run could look like is shown below.
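Such a comparison boils down to training the new algorithm on the environments used in the paper and plotting the learning curves against the reference implementation's. A minimal sketch, assuming the PR exposes the algorithm as sb3_contrib.SACD (the actual class name and hyperparameters may differ):

```python
from sb3_contrib import SACD  # hypothetical import; the PR may name this differently

# Train on a discrete-action task and log to TensorBoard so the learning
# curve can be compared against the reference implementation's runs.
model = SACD("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="./sacd_cartpole/")
model.learn(total_timesteps=100_000)
model.save("sacd_cartpole")
```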

araffin commented 10 months ago

yes please =)