Stable-Baselines-Team stable-baselines3-contrib issues

Stable-Baselines-Team / stable-baselines3-contrib

Contrib package for Stable-Baselines3 - Experimental reinforcement learning (RL) code

https://sb3-contrib.readthedocs.io

MIT License

465 stars 173 forks

issues

Fix QRDQN loading `target_update_interval`

#259 jak3122 closed 15 hours ago
0
[Bug]: loading QRDQN changes target_update_interval

#258 jak3122 closed 15 hours ago
0
[Question] Why can't I directly use PPO(RecurrentActorCriticPolicy, "CartPole-v1", verbose=1)?

#257 dajianer opened 1 month ago
1
[Bug]: Is sb3_contrib/common/maskable/utils.py the cause of "WARN: env.action_masks to get variables from other wrappers is deprecated and will be removed in v1.0"?

#256 mkbg8 opened 1 month ago
1
Fix warning when loading a `RecurrentPPO` model

#255 araffin closed 1 month ago
0
[Bug]: FutureWarning: You are using `torch.load` with `weights_only=False`

#254 drulye closed 1 month ago
3
[Feature Request] same random seed for every env in AsyncEval

#253 1-Bart-1 opened 2 months ago
1
Update QR-DQN optimizer to only use q_net parameters

#252 corentinlger closed 2 months ago
1
Update SB3 and remove gSDE resampling

#251 araffin closed 3 months ago
0
[Question] Masked actions PPO in multiagent setting using PettingZoo

#250 MarcoPicione opened 3 months ago
0
[Question] Apply Masking using ActionMasker on composite actions

#249 mwalidcharrwi closed 3 months ago
4
[Question] How to do pre-training on the RecurrentPPO MlpLstmPolicy

#248 iwishiwasaneagle opened 4 months ago
0
MaskablePPO Masking Doesn't Work with Big Action Space

#247 orkunkn closed 4 months ago
1
RecurrentActorCriticPolicy Behaviour Not Clear

#246 pasinit opened 4 months ago
1
TQC: ep_len_mean and ep_rew_mean do not match real values

#245 btabia opened 4 months ago
0
ep_len_mean discrepancy

#244 btabia closed 4 months ago
0
Implemented CrossQ

#243 danielpalen opened 5 months ago
10
Dependent Actions in MultiDiscrete Action Space

#242 bbarisbaturay opened 5 months ago
5
[Question] Recurrent Maskable PPO ?!? Rudder ?!?

#241 tty666 closed 5 months ago
1
[Question] What is the difference between old_distribution and distribution in train function of TRPO

#240 0Addicted0 closed 5 months ago
2
[Question] RecurrentPPO: Reset LSTM states early?

#239 phisad opened 6 months ago
3
[Feature Request] Implement CrossQ

#238 danielpalen opened 6 months ago
0
Fix typo in changelog

#237 araffin closed 6 months ago
0
Release v2.3.0

#236 araffin closed 6 months ago
0
Log success rate for PPO variants

#235 araffin closed 6 months ago
0
[Question] Why does MaskablePPO not mask with some logic with last observation?

#234 EloyAnguiano opened 6 months ago
4
Fix PPO maskable type annotations

#233 araffin closed 6 months ago
0
Update ruff and SB3 dependencies

#232 araffin closed 6 months ago
0
[Question] Simple way to implement data augmentation when training agent

#231 thomashirtz closed 7 months ago
2
[Question] LSTM observations

#230 suargi closed 8 months ago
3
Fix `train_freq` type annotation for TQC and QR-DQN

#229 Armandpl closed 8 months ago
0
Episodic training with TQC?

#228 Armandpl closed 8 months ago
2
Add note about MaskableEvalCallback

#227 icheered closed 8 months ago
0
EvalCallback crashes Maskable PPO without error

#226 icheered closed 8 months ago
3
Update QRDQN defaults

#225 araffin closed 8 months ago
0
Implementing "Sibling Rivalry" Method from "Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards" Paper

#224 vladyskai opened 8 months ago
1
[Feature Request] STAC algorithm

#223 EloyAnguiano opened 9 months ago
4
[Question] how to use "lstm_states" from rollout_buffer to reconstruct LSTM states during training

#222 DeepRowLie closed 8 months ago
2
[Bug]: producing NAN values during training in MaskablePPO

#221 vahidqo opened 9 months ago
5
[Feature Request] Expand RNN Options and Algorithm Flexibility

#220 mtnusf97 opened 9 months ago
2
Update `_process_sequence()` docstring

#219 rogerioagjr closed 10 months ago
0
[Question] Recurrent PPO evaluation

#218 CAI23sbP closed 10 months ago
2
Release v2.2.1: hotfix file closing

#217 araffin closed 10 months ago
0
Release v2.2.0

#216 araffin closed 10 months ago
0
Remove PyType and upgrade to latest SB3 version

#215 araffin closed 10 months ago
0
Add rollout_buffer_class to TRPO

#214 ernestum closed 11 months ago
2
Sync SB3 Contrib with SB3

#213 araffin closed 11 months ago
0
Predicting actions after using MaskablePPO model outputs invalid action

#212 vivek-kumar9696 closed 11 months ago
2
Recurrent PPO Not Training Well on a Very Simple Environment

#211 sreejank opened 11 months ago
0
Worse training with Vectorized Environment

#210 pklochowicz closed 8 months ago
0