DLR-RM / rl-baselines3-zoo

A training framework for Stable Baselines3 reinforcement learning agents, with hyperparameter optimization and pre-trained agents included.
[Question] RuntimeError: Unable to sample before the end of the first episode. We recommend choosing a value for learning_starts that is greater than the maximum number of timesteps in the environment. #433

moneypi commented 5 months ago

❓ Question

I can't train "parking-v0". Every time I run the command python train.py --algo tqc --env parking-v0, I get this error:

(base) zhuqinghua@lcwt-DSS8440:~/workspace/rl-baselines3-zoo-master$ ./train_parking.sh
========== parking-v0 ==========
Seed: 1667912339
Loading hyperparameters from: /home/zhuqinghua/workspace/rl-baselines3-zoo-master/hyperparams/tqc.yml
Default hyperparameters for environment (ones being tuned will be overridden):
OrderedDict([('batch_size', 256),
             ('buffer_size', 300000),
             ('gamma', 0.98),
             ('learning_rate', 0.0015),
             ('n_timesteps', 100000.0),
             ('policy', 'MultiInputPolicy'),
             ('policy_kwargs', 'dict(net_arch=[512, 512, 512], n_critics=2)'),
             ('replay_buffer_class', 'HerReplayBuffer'),
             ('replay_buffer_kwargs',
              "dict( goal_selection_strategy='episode', n_sampled_goal=4, )"),
             ('tau', 0.005)])
Using 1 environments
Creating test environment
Wrapping the env in a VecTransposeImage.
Wrapping the env in a VecTransposeImage.
Using cuda device
Log path: logs/tqc/parking-v0_14
Traceback (most recent call last):
  File "train.py", line 4, in <module>
  File "/home/zhuqinghua/workspace/rl-baselines3-zoo-master/rl_zoo3/train.py", line 272, in train
  File "/home/zhuqinghua/workspace/rl-baselines3-zoo-master/rl_zoo3/exp_manager.py", line 241, in learn
    model.learn(self.n_timesteps, **kwargs)
  File "/home/zhuqinghua/miniconda3/lib/python3.8/site-packages/sb3_contrib/tqc/tqc.py", line 304, in learn
    return super().learn(
  File "/home/zhuqinghua/miniconda3/lib/python3.8/site-packages/stable_baselines3/common/off_policy_algorithm.py", line 347, in learn
    self.train(batch_size=self.batch_size, gradient_steps=gradient_steps)
  File "/home/zhuqinghua/miniconda3/lib/python3.8/site-packages/sb3_contrib/tqc/tqc.py", line 211, in train
    replay_data = self.replay_buffer.sample(batch_size, env=self._vec_normalize_env)  # type: ignore[union-attr]
  File "/home/zhuqinghua/miniconda3/lib/python3.8/site-packages/stable_baselines3/her/her_replay_buffer.py", line 198, in sample
    raise RuntimeError(
RuntimeError: Unable to sample before the end of the first episode. We recommend choosing a value for learning_starts that is greater than the maximum number of timesteps in the environment.
(base) zhuqinghua@lcwt-DSS8440:~/workspace/rl-baselines3-zoo-master$


araffin commented 5 months ago

Hello, please share the different versions of packages you are using.

qgallouedec commented 5 months ago

Can you share the versions you use? (See the bug report issue template; next time, please use that template instead of the question template.)

moneypi commented 5 months ago

(base) zhuqinghua@lcwt-DSS8440:~/workspace/highway-env_minimum$ python -c 'import stable_baselines3 as sb3; sb3.get_system_info()'
- OS: Linux-6.5.0-14-generic-x86_64-with-glibc2.17 # 14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2
- Python: 3.8.18
- Stable-Baselines3: 2.2.1
- PyTorch: 2.0.0.post200
- GPU Enabled: True
- Numpy: 1.24.3
- Cloudpickle: 2.2.1
- Gymnasium: 0.28.1
- OpenAI Gym: 0.26.2

araffin commented 5 months ago

Could you try upgrading to gymnasium 0.29.1? You should probably use -W ignore in that case: python -W ignore train.py --algo tqc --env parking-v0 --seed 1667912339.

I couldn't reproduce your issue so far. If it happens again, you can pass -params learning_starts:1000 (or a larger value) to avoid it.
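
To illustrate why a larger learning_starts helps, here is a minimal toy sketch in plain Python. It is not the Stable-Baselines3 code: EpisodeBuffer and run are hypothetical names, and the episode length of 250 steps is an assumption chosen for illustration, not the actual length of parking-v0. The toy buffer, like an episode-based HER buffer, can only sample once at least one episode has finished, so the first sampling attempt fails if it happens before the first episode ends.

```python
import random


class EpisodeBuffer:
    """Toy stand-in for an episode-based (HER-style) replay buffer (hypothetical)."""

    def __init__(self):
        self.finished = []  # transitions from completed episodes
        self.current = []   # transitions of the episode still in progress

    def add(self, transition, done):
        self.current.append(transition)
        if done:
            # Only completed episodes become available for sampling.
            self.finished.extend(self.current)
            self.current = []

    def sample(self, batch_size):
        if not self.finished:
            raise RuntimeError("Unable to sample before the end of the first episode.")
        return random.choices(self.finished, k=batch_size)


def run(learning_starts, episode_length, total_steps=500, batch_size=4):
    """Collect steps and start sampling once the step count exceeds learning_starts."""
    buffer = EpisodeBuffer()
    for step in range(1, total_steps + 1):
        done = step % episode_length == 0  # assumed fixed-length episodes
        buffer.add(step, done)
        if step > learning_starts:
            buffer.sample(batch_size)
    return "ok"


if __name__ == "__main__":
    try:
        run(learning_starts=100, episode_length=250)
    except RuntimeError as err:
        print("learning_starts=100:", err)
    print("learning_starts=1000:", run(learning_starts=1000, episode_length=250, total_steps=1500))
```

In this sketch, sampling with learning_starts=100 is attempted at step 101, before the assumed 250-step episode has ended, so it raises. With learning_starts=1000, several episodes are already complete when sampling begins.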

moneypi commented 5 months ago

Thanks, learning_starts:1000 works fine for me. Thank you so much.
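
For readers unfamiliar with the key:value override form used above, the following is a rough sketch of how such an override could be merged into the default hyperparameters printed in the log. It is an assumption for illustration only: parse_overrides is a hypothetical helper, not the zoo's actual argument parser, and the defaults dict copies just a few values from the log.

```python
import ast


def parse_overrides(pairs):
    """Turn ["key:value", ...] strings into a dict, parsing values as Python literals."""
    overrides = {}
    for pair in pairs:
        key, _, raw = pair.partition(":")
        try:
            overrides[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            overrides[key] = raw  # keep as a plain string if it is not a literal
    return overrides


# A few of the defaults shown in the log above.
defaults = {"batch_size": 256, "buffer_size": 300000, "gamma": 0.98}

# Overrides take precedence over the defaults; untouched keys are kept.
merged = {**defaults, **parse_overrides(["learning_starts:1000"])}
print(merged)
```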