DLR-RM / stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Docs: https://stable-baselines3.readthedocs.io
MIT License · 8.35k stars · 1.6k forks
Issues
#1913 Hotfix: revert loading with `weights_only=True` (araffin, closed 2 months ago, 0 comments)
#1912 [Bug]: evaluate_policy called multiple times for vectorized environments (LukasFehring, opened 2 months ago, 5 comments)
#1911 [Bug]: Load Trained Policy (zlw21gxy, closed 2 months ago, 8 comments)
#1910 Fix tensorboard video slow numpy->torch conversion (NickLucche, closed 2 months ago, 0 comments)
#1909 Discrepancy between Observations Sampled from Gym Env and Replay Buffer (AOAA96, closed 2 months ago, 3 comments)
#1908 Fix memory leak in base_class.py (peteole, closed 1 month ago, 7 comments)
#1907 Scaling Environment (Hamza-101, closed 1 month ago, 6 comments)
#1906 [Bug]: Scaling Environment (Hamza-101, closed 2 months ago, 9 comments)
#1905 Scalability (Hamza-101, closed 2 months ago, 2 comments)
#1904 Adding ER-MRL to community projects (corentinlger, closed 2 months ago, 1 comment)
#1903 [Question] How to avoid SAC getting stuck in local minima (JaimeParker, closed 2 months ago, 1 comment)
#1902 Weights only param (markscsmith, opened 2 months ago, 1 comment)
#1901 Cast learning_rate to float lambda for pickle safety when doing model.load (markscsmith, closed 2 months ago, 1 comment)
#1900 [Bug]: if learning_rate function uses special types, they can cause torch.load to fail when weights_only=True (markscsmith, closed 2 months ago, 4 comments)
#1899 Parameterize weights_only during load to allow loading of unusual models (markscsmith, closed 2 months ago, 2 comments)
#1898 [Question] Discontinuous reward training curve (JaimeParker, closed 2 months ago, 4 comments)
#1897 [Question] policy gradient loss and explained variance very small (almost zero) from the training start? (Ahmed-Radwan094, closed 2 months ago, 2 comments)
#1896 [Feature Request] Enable predict to take tensor as input (llewynS, closed 2 months ago, 3 comments)
#1895 Off-policy algorithm policy_kwargs (suargi, closed 2 months ago, 2 comments)
#1894 [Bug]: Potential Bug in PPO? Clarification requested (azrael417, closed 2 months ago, 2 comments)
#1893 [Question] CheckpointCallback keep last K (NickLucche, closed 2 months ago, 2 comments)
#1892 Issue (HER within SAC algorithm) (wadeKeith, closed 2 months ago, 2 comments)
#1891 [Question] Saving PPO rollout buffer on GPU (Ahmed-Radwan094, closed 2 months ago, 2 comments)
#1890 [Bug]: EOFError after running for some steps (GeorgeWuzy, closed 2 months ago, 1 comment)
#1889 [Question] How to pass a varying gamma to DQN or PPO during training? (rariss, opened 2 months ago, 6 comments)
#1888 Why does the Logger only return the train/ metrics, and not eval/, time/, and rollout/? (liamquantrill, closed 2 months ago, 1 comment)
#1887 [Question] Discretize continuous actions/observations? (nrigol, closed 2 months ago, 1 comment)
#1886 Training of PPO freezes after a number of iterations (Ahmed-Radwan094, closed 2 months ago, 8 comments)
#1885 [Question] Influence of buffer size when using vecenv and saving a customized replay buffer (JaimeParker, closed 2 months ago, 2 comments)
#1884 Fixed broken link in ppo.rst (chaitanyabisht, closed 2 months ago, 0 comments)
#1883 Why does VecFrameStack clear the prior frames in the stack for the step when "terminated=True"? (wkwan, closed 2 months ago, 2 comments)
#1882 Fix typo in changelog (araffin, closed 3 months ago, 0 comments)
#1881 How to elegantly modify an algorithm by adding new architectures trained with custom losses? (jamesheald, closed 3 months ago, 2 comments)
#1880 [Question] [Multiprocessing] RolloutBuffer groups environment transitions on a per-environment basis (N00bcak, closed 3 months ago, 1 comment)
#1879 Release v2.3.0 (araffin, closed 3 months ago, 0 comments)
#1878 How does stable-baselines work with a multi-agent PettingZoo environment? (AnastasiaPsarou, closed 3 months ago, 1 comment)
#1877 [Feature Request] Resume trained model with set_parameters without reset_num_timesteps (tanielsfranklin, closed 3 months ago, 4 comments)
#1876 [Question] Action masking for a DQN Agent (Tim1605, closed 3 months ago, 1 comment)
#1875 [Question] Changes in observations (d505, closed 3 months ago, 1 comment)
#1874 [Question] Training PPO model with single-step episodes (oshadajay, closed 3 months ago, 7 comments)
#1873 Exporting MultiInputActorCriticPolicy as ONNX (MaximCamilleri, opened 3 months ago, 5 comments)
#1872 [Question] Control PPO training (mwalidcharrwi, closed 3 months ago, 0 comments)
#1871 [Question] How can I wrap a model trained on non-image observations via an image observation wrapper? (zichunxx, closed 3 months ago, 5 comments)
#1870 Log success rate for on-policy algorithms (corentinlger, closed 3 months ago, 4 comments)
#1869 [Question] SubprocVecEnv doesn't work with registered custom environments (marcusfechner, closed 3 months ago, 3 comments)
#1868 [Feature Request] Allow Gymnasium Composite Spaces (flowerthrower, closed 3 months ago, 2 comments)
#1867 [Bug]: `rollout/success_rate` does not show for Monitor + OnPolicyAlgorithm (N00bcak, closed 3 months ago, 1 comment)
#1866 Update ruff and documentation for hf sb3 (araffin, closed 3 months ago, 0 comments)
#1865 [Bug]: unsupported operand for +: 'float' and 'NoneType' during PPO Training with Custom DSSAT Gym Wrapper (louisreberga, closed 3 months ago, 1 comment)
#1864 [Bug]: PPO handles TimeLimit.truncated incorrectly (yinzikang, closed 4 months ago, 2 comments)