Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code
https://sb3-contrib.readthedocs.io
MIT License

SACD Discrete Soft Actor Critic #203

Open · splatter96 opened 11 months ago

splatter96 commented 11 months ago

This PR introduces the Soft Actor-Critic for discrete actions (SACD) algorithm.

Description

This PR implements the SAC-Discrete algorithm as described in the paper https://arxiv.org/abs/1910.07207. The implementation borrows code from the paper author's original implementation (https://github.com/p-christ/Deep-Reinforcement-Learning-Algorithms-with-PyTorch) as well as from the implementation provided by the author of the issue that requested this feature for Stable-Baselines3 (https://github.com/toshikwa/sac-discrete.pytorch).
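For context, the key change relative to continuous SAC is that the actor outputs a categorical distribution and the critics output Q-values for every action, so the soft value, policy, and entropy terms can be computed exactly in expectation instead of from sampled actions. The following is a minimal PyTorch sketch of the three losses based on the paper, not the actual code in this PR; all names (sacd_losses, actor, q1, q2, etc.) are placeholders:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: q1/q2 and their targets map states -> Q-values (batch, n_actions);
# the actor maps states -> action logits (batch, n_actions).
def sacd_losses(batch, actor, q1, q2, q1_target, q2_target, log_alpha, gamma, target_entropy):
    obs, actions, rewards, next_obs, dones = batch  # actions: (batch,) int64
    alpha = log_alpha.exp().detach()

    # Critic target: exact expectation over next actions, no sampling needed.
    with torch.no_grad():
        next_probs = F.softmax(actor(next_obs), dim=-1)
        next_log_probs = torch.log(next_probs + 1e-8)
        next_q = torch.min(q1_target(next_obs), q2_target(next_obs))
        next_v = (next_probs * (next_q - alpha * next_log_probs)).sum(dim=-1)
        target_q = rewards + gamma * (1.0 - dones) * next_v

    # Critic loss: Q-values of the taken actions vs. the soft target.
    q1_taken = q1(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    q2_taken = q2(obs).gather(1, actions.unsqueeze(-1)).squeeze(-1)
    critic_loss = F.mse_loss(q1_taken, target_q) + F.mse_loss(q2_taken, target_q)

    # Actor loss: exact expectation over the categorical policy.
    probs = F.softmax(actor(obs), dim=-1)
    log_probs = torch.log(probs + 1e-8)
    q_min = torch.min(q1(obs), q2(obs)).detach()
    actor_loss = (probs * (alpha * log_probs - q_min)).sum(dim=-1).mean()

    # Temperature loss: drive the policy entropy toward target_entropy
    # (commonly 0.98 * log(n_actions) in SAC-Discrete implementations).
    entropy = -(probs * log_probs).sum(dim=-1).detach()
    alpha_loss = (log_alpha * (entropy - target_entropy)).mean()

    return critic_loss, actor_loss, alpha_loss
```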


araffin commented 10 months ago

Hello, thanks for the PR =)

"The functionality/performance matches that of the source (required for new training algorithms or training-related features)."

Please don't forget that part (see the contributing guide). I think there is also discussion about the results here: https://github.com/vwxyzjn/cleanrl/pull/270

splatter96 commented 10 months ago

Hello, thanks for the feedback :) Sorry for the late reply! Should I add the performance comparison to the source in the same way it is done on the official Stable-Baselines3 algorithm pages? As in, create an rl-baselines3-zoo config for it and add the resulting plots to this PR? A sketch of what a benchmark run could look like is shown below.
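Such a comparison boils down to training the new algorithm on the environments used in the paper and plotting the learning curves against the reference implementation's. A minimal sketch, assuming the PR exposes the algorithm as sb3_contrib.SACD (the actual class name and hyperparameters may differ):

```python
from sb3_contrib import SACD  # hypothetical import; the PR may name this differently

# Train on a discrete-action task and log to TensorBoard so the learning
# curve can be compared against the reference implementation's runs.
model = SACD("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log="./sacd_cartpole/")
model.learn(total_timesteps=100_000)
model.save("sacd_cartpole")
```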

araffin commented 10 months ago

yes please =)