Hi @alex404
This message is normal; it only means that you collect experience faster than you can learn from it. Several factors can affect learner speed, most notably batch size and the size of your model.
I am not sure if learner performance and memory usage are related; as long as you don't hit OOM you should be good. If you do want to reduce GPU memory usage, a smaller batch size or a smaller model are the usual levers.
If these things are already optimized and you're still seeing this message, you could be at the limit of your GPU throughput.
With some environments it basically does not matter what you do; you will always see this message, e.g. when the environment is really fast and the model is comparatively slow (as in Megaverse).
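To make the imbalance behind this message concrete, here is a rough back-of-the-envelope sketch in Python. The collection rate matches the ~25k FPS mentioned in this thread and the config values match the cfg.json posted below, but the learner speed is a made-up, illustrative number:

```python
# Actors produce experience at some rate; the learner consumes batch_size samples
# per SGD step. When production outpaces consumption, sample-factory throttles
# collection and prints the "accumulated too much experience" message.

batch_size = 1024
env_frameskip = 4

# Assumed, illustrative measurements -- substitute numbers from your own logs.
collection_fps = 25_000          # env frames/s, assuming the reported FPS counts skipped frames
learner_sgd_steps_per_sec = 5.0  # hypothetical speed of the learner's optimizer loop

samples_collected_per_sec = collection_fps / env_frameskip   # one sample spans env_frameskip frames
samples_consumed_per_sec = batch_size * learner_sgd_steps_per_sec

print(f"collected: {samples_collected_per_sec:.0f}/s, consumed: {samples_consumed_per_sec:.0f}/s")
# collected: 6250/s, consumed: 5120/s -> the learner lags behind and the warning appears.
```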
Is this happening in VizDoom experiments? Posting your entire cfg.json here from the experiment folder could help me look at your settings! :)
For my project my models are going to be more complex, and I've noticed that as I've switched from your examples to these more complex models, performance has gone from being CPU-bottlenecked to GPU-bottlenecked (at least based on CPU/GPU utilization). So I guess I should be expecting this message! I've included the cfg.json for one of my experiments at the end, so if you see anything else worth pointing out I'm happy to hear it.
Thanks for the tips, regardless. I've been fine-tuning these arguments and they've helped me optimize my performance. It's not horribly out of balance: I'm still getting around 25k FPS, compared to 75-100k on your demos. For now I'll probably try to limit the complexity of my neural networks until I need to make them larger.
I am indeed using VizDoom, and nice work on Megaverse, by the way. That paper helped convince me that I should use VizDoom: we're studying low-level vision rather than robotics, and e.g. DMLab or MuJoCo would likely introduce unnecessary performance bottlenecks. At the same time it's good to know that even with "unlimited" sample generation, some simple RL problems still remain unsolved, so the algorithm and theory side requires a lot of progress too!
{ "algo": "APPO", "env": "doom_apples_gathering_supreme", "experiment": "apples_gathering_supreme2", "experiments_root": null, "help": false, "train_dir": "/home/alex404/code/retina-rl/train_dir", "device": "gpu", "seed": null, "save_every_sec": 120, "keep_checkpoints": 3, "save_milestones_sec": -1, "stats_avg": 100, "learning_rate": 0.0001, "train_for_env_steps": 10000000000, "train_for_seconds": 10000000000, "obs_subtract_mean": 0.0, "obs_scale": 255.0, "gamma": 0.99, "reward_scale": 1.0, "reward_clip": 10.0, "encoder_type": "conv", "encoder_subtype": "convnet_simple", "encoder_custom": "retina_encoder", "encoder_extra_fc_layers": 1, "hidden_size": 512, "nonlinearity": "elu", "policy_initialization": "orthogonal", "policy_init_gain": 1.0, "actor_critic_share_weights": true, "use_spectral_norm": false, "adaptive_stddev": true, "initial_stddev": 1.0, "experiment_summaries_interval": 20, "adam_eps": 1e-06, "adam_beta1": 0.9, "adam_beta2": 0.999, "gae_lambda": 0.95, "rollout": 32, "num_workers": 16, "recurrence": 32, "use_rnn": true, "rnn_type": "gru", "rnn_num_layers": 1, "ppo_clip_ratio": 0.1, "ppo_clip_value": 0.2, "batch_size": 1024, "num_batches_per_iteration": 1, "ppo_epochs": 1, "num_minibatches_to_accumulate": -1, "max_grad_norm": 4.0, "exploration_loss_coeff": 0.001, "value_loss_coeff": 0.5, "kl_loss_coeff": 0.0, "exploration_loss": "symmetric_kl", "num_envs_per_worker": 24, "worker_num_splits": 2, "num_policies": 1, "policy_workers_per_policy": 1, "max_policy_lag": 10000, "traj_buffers_excess_ratio": 1.3, "decorrelate_experience_max_seconds": 10, "decorrelate_envs_on_one_worker": true, "with_vtrace": true, "vtrace_rho": 1.0, "vtrace_c": 1.0, "set_workers_cpu_affinity": true, "force_envs_single_thread": true, "reset_timeout_seconds": 120, "default_niceness": 0, "train_in_background_thread": true, "learner_main_loop_num_cores": 1, "actor_worker_gpus": [], "with_pbt": false, "pbt_mix_policies_in_one_env": true, "pbt_period_env_steps": 5000000, "pbt_start_mutation": 20000000, "pbt_replace_fraction": 0.3, "pbt_mutation_rate": 0.15, "pbt_replace_reward_gap": 0.1, "pbt_replace_reward_gap_absolute": 1e-06, "pbt_optimize_batch_size": false, "pbt_target_objective": "true_reward", "use_cpc": false, "cpc_forward_steps": 8, "cpc_time_subsample": 6, "cpc_forward_subsample": 2, "benchmark": false, "sampler_only": false, "env_frameskip": 4, "env_framestack": 4, "pixel_format": "CHW", "num_agents": -1, "num_humans": 0, "num_bots": -1, "start_bot_difficulty": null, "timelimit": null, "res_w": 128, "res_h": 72, "wide_aspect_ratio": false, "my_custom_arg": 42, "fps": 35, "command_line": "--env=doom_apples_gathering_supreme --algo=APPO --env_frameskip=4 --use_rnn=True --num_workers=16 --num_envs_per_worker=24 --num_policies=1 --ppo_epochs=1 --rollout=32 --recurrence=32 --wide_aspect_ratio=False --encoder_custom=retina_encoder --experiment=apples_gathering_supreme2", "cli_args": { "algo": "APPO", "env": "doom_apples_gathering_supreme", "experiment": "apples_gathering_supreme2", "encoder_custom": "retina_encoder", "rollout": 32, "num_workers": 16, "recurrence": 32, "use_rnn": true, "ppo_epochs": 1, "num_envs_per_worker": 24, "num_policies": 1, "env_frameskip": 4, "wide_aspect_ratio": false }, "git_hash": "fa5269a21712b6ec5599596bbefd49a0a82c87cf", "git_repo_name": "git@github.com:berenslab/retina-rl.git", "record_to": null }
Your parameters seem reasonable. I would try bumping the batch size up to 2048 or even 4096 to see how this affects learning. It can be a bit faster at the cost of some sample efficiency. Each task reacts differently to batch size, so only experiments can help.
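As a rough illustration of why a larger batch can raise learner throughput: each SGD step carries a fixed overhead (trajectory batching, optimizer bookkeeping, clipping) plus a per-sample cost, and a bigger batch amortizes the fixed part. The two constants below are made up for the sake of the example, not measured:

```python
fixed_overhead_per_step = 0.12   # seconds per SGD step, assumed
cost_per_sample = 0.00015        # seconds per sample, assumed

for batch_size in (1024, 2048, 4096):
    step_time = fixed_overhead_per_step + batch_size * cost_per_sample
    throughput = batch_size / step_time
    print(f"batch_size={batch_size}: ~{throughput:.0f} samples/s through the learner")
# ~3743/s, ~4794/s, ~5577/s with these made-up constants: diminishing but real gains,
# which is why it is worth an A/B test on the actual task.
```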
Perhaps increase --learner_main_loop_num_cores to 4 or thereabouts; this can help speed up trajectory batching on the learner, but don't expect a large speedup.
Setting max_grad_norm to 0.0 can help a bit too: gradient clipping is not a free operation for a large model, so skipping it can speed up the learner. If you don't see any learning instabilities, you can get away with no clipping.
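For intuition, here is a minimal PyTorch sketch of an optimizer step with optional clipping. This is not sample-factory's actual learner code, just an illustration of the extra work that --max_grad_norm=0.0 skips:

```python
import torch

def optimizer_step(model, optimizer, loss, max_grad_norm):
    """One training step; with max_grad_norm == 0.0 the clipping pass is skipped."""
    optimizer.zero_grad()
    loss.backward()
    if max_grad_norm > 0.0:
        # clip_grad_norm_ computes a global norm over every parameter's gradient and
        # rescales them -- for a large encoder this is an extra pass over all weights
        # on every single SGD step.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
    optimizer.step()
```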
Thanks for the further tips, I'll try them out during the week!
Hello, I've been running sample-factory with much larger encoders than the defaults, and this warning keeps popping up:
Learner 0 accumulated too much experience, stop experience collection! Learner is likely a bottleneck in your experiment (50 times)
Is there anything you'd recommend to address this? Performance still seems okay (~25,000 FPS), but I've noticed this tends to happen when my GPU's memory gets maxed out, and I'm wondering if there are e.g. some arguments I could set to improve performance.
Thanks!