TTomilin / COOM

COOM: Benchmarking Continual Reinforcement Learning on Doom

Cannot reproduce the result for run_and_gun #1

Open · ruipengZ opened this issue 5 months ago

ruipengZ commented 5 months ago

Hi! I'm having trouble reproducing the result for the single environment run_and_gun. I run `python run_single.py --scenario run_and_gun` and get the output below; after 15 epochs, the training success is still 0:

```
2024-04-10 07:32:16.450011: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-04-10 07:32:16.630269: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:16.630319: I tensorflow/compiler/xla/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2024-04-10 07:32:17.626895: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627104: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2024-04-10 07:32:17.627124: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2024-04-10 07:32:26.142719: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.142959: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublas.so.11'; dlerror: libcublas.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143142: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcublasLt.so.11'; dlerror: libcublasLt.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.143322: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcufft.so.10'; dlerror: libcufft.so.10: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.765469: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcusparse.so.11'; dlerror: libcusparse.so.11: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.10/dist-packages/cv2/../../lib64:/usr/lib64-nvidia
2024-04-10 07:32:26.769994: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1934] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2024-04-10 07:32:26.770554: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Saving config: {
  "activation": "lrelu",
  "agent_policy_exploration": false,
  "alpha": "auto",
  "augment": false,
  "augmentation": null,
  "batch_size": 128,
  "buffer_type": "fifo",
  "cl_method": null,
  "cl_reg_coef": 0.0,
  "clipnorm": null,
  "envs": ["default"],
  "episodic_batch_size": 0,
  "episodic_mem_per_task": 0,
  "episodic_memory_from_buffer": true,
  "exploration_kind": null,
  "frame_height": 84,
  "frame_skip": 4,
  "frame_stack": 4,
  "frame_width": 84,
  "gamma": 0.99,
  "gpu": null,
  "group_id": "default_group",
  "hidden_sizes": [256, 256],
  "hide_task_id": false,
  "log_every": 1000,
  "logger_output": ["tsv", "tensorboard"],
  "lr": 0.001,
  "lr_decay": "linear",
  "lr_decay_rate": 0.1,
  "lr_decay_steps": 100000,
  "model_path": null,
  "multihead_archs": true,
  "n_updates": 50,
  "no_test": false,
  "num_repeats": 1,
  "packnet_retrain_steps": 0,
  "penalty_ammo_used": -0.1,
  "penalty_death": -1.0,
  "penalty_health_dtc": -1.0,
  "penalty_health_has": -5.0,
  "penalty_health_hg": -0.01,
  "penalty_lava": -0.1,
  "penalty_passivity": -0.1,
  "penalty_projectile": -0.01,
  "random_order": false,
  "record": false,
  "record_every": 100,
  "regularize_critic": false,
  "render": false,
  "render_sleep": 0.0,
  "replay_size": 50000,
  "reset_buffer_on_task_change": true,
  "reset_critic_on_task_change": false,
  "reset_optimizer_on_task_change": true,
  "resolution": null,
  "reward_delivery": 30.0,
  "reward_frame_survived": 0.01,
  "reward_health_has": 5.0,
  "reward_health_hg": 15.0,
  "reward_kill_chain": 5.0,
  "reward_kill_dtc": 1.0,
  "reward_kill_rag": 5.0,
  "reward_on_platform": 0.1,
  "reward_platform_reached": 1.0,
  "reward_scaler_pitfall": 0.1,
  "reward_scaler_traversal": 0.001,
  "reward_switch_pressed": 15.0,
  "reward_weapon_ad": 15.0,
  "save_freq_epochs": 25,
  "scenarios": ["run_and_gun"],
  "seed": 0,
  "sequence": null,
  "sparse_rewards": false,
  "start_from": 0,
  "start_steps": 10000,
  "steps_per_env": 200000,
  "target_output_std": 0.089,
  "test": true,
  "test_envs": [],
  "test_episodes": 3,
  "test_only": false,
  "update_after": 5000,
  "update_every": 500,
  "use_layer_norm": true,
  "use_lstm": false,
  "variable_queue_length": 5,
  "vcl_first_task_kl": false,
  "video_folder": "videos",
  "with_wandb": false
}
Logging data to ./logs/default_group/2024_04_10__07_32_36_eK6l57
/usr/local/lib/python3.10/dist-packages/gym/core.py:317: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set new_step_api=True to use new step API. This will be the default behaviour in future.
  deprecation(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.num_tasks to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.num_tasks for environment variables or env.get_wrapper_attr('num_tasks') that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_active_env to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_active_env for environment variables or env.get_wrapper_attr('get_active_env') that will search the reminding wrappers.
  logger.warn(
2024-04-10 07:32:37 - Observations shape: (4, 84, 84, 3)
2024-04-10 07:32:37 - Actions shape: 12
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.task_id to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.task_id for environment variables or env.get_wrapper_attr('task_id') that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.cur_seq_idx to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.cur_seq_idx for environment variables or env.get_wrapper_attr('cur_seq_idx') that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.name to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.name for environment variables or env.get_wrapper_attr('name') that will search the reminding wrappers.
  logger.warn(
2024-04-10 07:32:39 - Episode 1 duration: 0.9786. Buffer capacity: 0.63% (313/50000)
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.get_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.get_statistics for environment variables or env.get_wrapper_attr('get_statistics') that will search the reminding wrappers.
  logger.warn(
/usr/local/lib/python3.10/dist-packages/gymnasium/core.py:311: UserWarning: WARN: env.clear_episode_statistics to get variables from other wrappers is deprecated and will be removed in v1.0, to get this variable you can do env.unwrapped.clear_episode_statistics for environment variables or env.get_wrapper_attr('clear_episode_statistics') that will search the reminding wrappers.
  logger.warn(
2024-04-10 07:32:40 - Episode 2 duration: 0.9469. Buffer capacity: 1.25% (626/50000)
2024-04-10 07:32:41 - Episode 3 duration: 0.9312. Buffer capacity: 1.88% (939/50000)

| train/actions/0    | 80    |
| train/actions/1    | 71    |
| train/actions/2    | 77    |
| train/actions/3    | 87    |
| train/actions/4    | 79    |
| train/actions/5    | 69    |
| train/actions/6    | 97    |
| train/actions/7    | 85    |
| train/actions/8    | 92    |
| train/actions/9    | 89    |
| train/actions/10   | 81    |
| train/actions/11   | 93    |
| epoch              | 1     |
| learning_rate      | 0.001 |
| train/return/avg   | 22.3  |
| train/return/std   | 4.89  |
| train/return/max   | 29    |
| train/return/min   | 17.5  |
| train/ep_length    | 313   |
| total_env_steps    | 1e+03 |
| current_task_steps | 1e+03 |
| buffer_capacity    | 1.25  |
| train/loss_kl      | nan   |
| train/loss_pi      | nan   |
| train/loss_q1      | nan   |
| train/loss_q2      | nan   |
| train/episodes     | 2     |
| train/alpha/0      | nan   |
| train/loss_reg     | nan   |
| train/health       | 100   |
| train/kills        | 2.33  |
| train/ammo         | 69.7  |
| train/movement     | 34    |
| train/hits_taken   | 0     |
| train/success      | 0     |
| walltime           | 3.12  |
Scalar logging time: 0.0392305850982666
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:32:42 - Time elapsed for logging: 0.04345273971557617
2024-04-10 07:32:42 - Episode 4 duration: 0.6291. Buffer capacity: 2.50% (1252/50000)
2024-04-10 07:32:43 - Episode 5 duration: 0.7800. Buffer capacity: 3.13% (1565/50000)
2024-04-10 07:32:44 - Episode 6 duration: 0.7743. Buffer capacity: 3.76% (1878/50000)

| train/actions/0    | 71    |
| train/actions/1    | 73    |
| train/actions/2    | 89    |
| train/actions/3    | 84    |
| train/actions/4    | 83    |
| train/actions/5    | 94    |
| train/actions/6    | 84    |
| train/actions/7    | 84    |
| train/actions/8    | 78    |
| train/actions/9    | 85    |
| train/actions/10   | 93    |
| train/actions/11   | 82    |
| epoch              | 2     |
| learning_rate      | 0.001 |
| train/return/avg   | 17.9  |
| train/return/std   | 6.1   |
| train/return/max   | 25    |
| train/return/min   | 10.1  |
| train/ep_length    | 313   |
| total_env_steps    | 2e+03 |
| current_task_steps | 2e+03 |
| buffer_capacity    | 3.13  |
| train/loss_kl      | nan   |
| train/loss_pi      | nan   |
| train/loss_q1      | nan   |
| train/loss_q2      | nan   |
| train/episodes     | 5     |
| train/alpha/0      | nan   |
| train/loss_reg     | nan   |
| train/health       | 100   |
| train/kills        | 2     |
| train/ammo         | 70.7  |
| train/movement     | 25.4  |
| train/hits_taken   | 0     |
| train/success      | 0     |
| walltime           | 5.67  |
Scalar logging time: 0.022348403930664062
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:32:44 - Time elapsed for logging: 0.023654937744140625
2024-04-10 07:32:45 - Episode 7 duration: 0.4842. Buffer capacity: 4.38% (2191/50000)
2024-04-10 07:32:45 - Episode 8 duration: 0.7810. Buffer capacity: 5.01% (2504/50000)
2024-04-10 07:32:46 - Episode 9 duration: 0.8011. Buffer capacity: 5.63% (2817/50000)

| train/actions/0    | 94    |
| train/actions/1    | 90    |
| train/actions/2    | 69    |
| train/actions/3    | 77    |
| train/actions/4    | 76    |
| train/actions/5    | 72    |
| train/actions/6    | 83    |
| train/actions/7    | 89    |
| train/actions/8    | 72    |
| train/actions/9    | 94    |
| train/actions/10   | 85    |
| train/actions/11   | 99    |
| epoch              | 3     |
| learning_rate      | 0.001 |
| train/return/avg   | 14.4  |
| train/return/std   | 6.22  |
| train/return/max   | 19.5  |
| train/return/min   | 5.64  |
| train/ep_length    | 313   |
| total_env_steps    | 3e+03 |
| current_task_steps | 3e+03 |
| buffer_capacity    | 5.01  |
| train/loss_kl      | nan   |
| train/loss_pi      | nan   |
| train/loss_q1      | nan   |
| train/loss_q2      | nan   |
| train/episodes     | 8     |
| train/alpha/0      | nan   |
| train/loss_reg     | nan   |
| train/health       | 100   |
| train/kills        | 1.33  |
| train/ammo         | 68.7  |
| train/movement     | 24.8  |
| train/hits_taken   | 0     |
| train/success      | 0     |
| walltime           | 8.24  |
Scalar logging time: 0.020737409591674805
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:32:47 - Time elapsed for logging: 0.021990299224853516
2024-04-10 07:32:47 - Episode 10 duration: 0.3172. Buffer capacity: 6.26% (3130/50000)
2024-04-10 07:32:48 - Episode 11 duration: 0.7862. Buffer capacity: 6.89% (3443/50000)
2024-04-10 07:32:49 - Episode 12 duration: 0.8000. Buffer capacity: 7.51% (3756/50000)

| train/actions/0    | 97    |
| train/actions/1    | 80    |
| train/actions/2    | 95    |
| train/actions/3    | 83    |
| train/actions/4    | 83    |
| train/actions/5    | 98    |
| train/actions/6    | 82    |
| train/actions/7    | 73    |
| train/actions/8    | 72    |
| train/actions/9    | 70    |
| train/actions/10   | 81    |
| train/actions/11   | 86    |
| epoch              | 4     |
| learning_rate      | 0.001 |
| train/return/avg   | 15.1  |
| train/return/std   | 8.24  |
| train/return/max   | 23.3  |
| train/return/min   | 3.8   |
| train/ep_length    | 313   |
| total_env_steps    | 4e+03 |
| current_task_steps | 4e+03 |
| buffer_capacity    | 6.89  |
| train/loss_kl      | nan   |
| train/loss_pi      | nan   |
| train/loss_q1      | nan   |
| train/loss_q2      | nan   |
| train/episodes     | 11    |
| train/alpha/0      | nan   |
| train/loss_reg     | nan   |
| train/health       | 100   |
| train/kills        | 1.67  |
| train/ammo         | 70    |
| train/movement     | 21.6  |
| train/hits_taken   | 0     |
| train/success      | 0     |
| walltime           | 10.8  |
Scalar logging time: 0.020810604095458984
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:32:49 - Time elapsed for logging: 0.022226572036743164
2024-04-10 07:32:49 - Episode 13 duration: 0.1740. Buffer capacity: 8.14% (4069/50000)
2024-04-10 07:32:50 - Episode 14 duration: 0.7823. Buffer capacity: 8.76% (4382/50000)
2024-04-10 07:32:51 - Episode 15 duration: 0.7740. Buffer capacity: 9.39% (4695/50000)

| train/actions/0    | 97    |
| train/actions/1    | 89    |
| train/actions/2    | 72    |
| train/actions/3    | 76    |
| train/actions/4    | 95    |
| train/actions/5    | 89    |
| train/actions/6    | 81    |
| train/actions/7    | 79    |
| train/actions/8    | 77    |
| train/actions/9    | 67    |
| train/actions/10   | 102   |
| train/actions/11   | 76    |
| epoch              | 5     |
| learning_rate      | 0.001 |
| train/return/avg   | 18.7  |
| train/return/std   | 1.31  |
| train/return/max   | 20.2  |
| train/return/min   | 17    |
| train/ep_length    | 313   |
| total_env_steps    | 5e+03 |
| current_task_steps | 5e+03 |
| buffer_capacity    | 8.76  |
| train/loss_kl      | nan   |
| train/loss_pi      | nan   |
| train/loss_q1      | nan   |
| train/loss_q2      | nan   |
| train/episodes     | 14    |
| train/alpha/0      | nan   |
| train/loss_reg     | nan   |
| train/health       | 100   |
| train/kills        | 2     |
| train/ammo         | 67    |
| train/movement     | 27.8  |
| train/hits_taken   | 0     |
| train/success      | 0     |
| walltime           | 13.4  |
Scalar logging time: 0.027970314025878906
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:32:52 - Time elapsed for logging: 0.02982330322265625
2024-04-10 07:35:13 - Time elapsed for a policy update: 140.75769782066345
2024-04-10 07:35:13 - Episode 16 duration: 140.7804. Buffer capacity: 10.02% (5008/50000)
2024-04-10 07:35:14 - Episode 17 duration: 0.9168. Buffer capacity: 10.64% (5321/50000)
2024-04-10 07:37:31 - Time elapsed for a policy update: 137.16376042366028
2024-04-10 07:37:32 - Episode 18 duration: 138.0552. Buffer capacity: 11.27% (5634/50000)
2024-04-10 07:37:32 - Episode 19 duration: 0.7878. Buffer capacity: 11.89% (5947/50000)

| train/actions/0    | 90       |
| train/actions/1    | 85       |
| train/actions/2    | 82       |
| train/actions/3    | 79       |
| train/actions/4    | 85       |
| train/actions/5    | 83       |
| train/actions/6    | 94       |
| train/actions/7    | 77       |
| train/actions/8    | 84       |
| train/actions/9    | 89       |
| train/actions/10   | 76       |
| train/actions/11   | 76       |
| epoch              | 6        |
| learning_rate      | 0.000997 |
| train/return/avg   | 13.4     |
| train/return/std   | 11.5     |
| train/return/max   | 33.3     |
| train/return/min   | 5.59     |
| train/ep_length    | 313      |
| total_env_steps    | 6e+03    |
| current_task_steps | 6e+03    |
| buffer_capacity    | 11       |
| train/loss_kl      | 0        |
| train/loss_pi      | 3.27     |
| train/loss_q1      | 0.833    |
| train/loss_q2      | 0.939    |
| train/episodes     | 17.5     |
| train/alpha/0      | 2.53     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 1.5      |
| train/ammo         | 67       |
| train/movement     | 19.1     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 294      |
Scalar logging time: 0.023244857788085938
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:37:33 - Time elapsed for logging: 0.03106379508972168
2024-04-10 07:39:47 - Time elapsed for a policy update: 134.37921953201294
2024-04-10 07:39:48 - Episode 20 duration: 135.0360. Buffer capacity: 12.52% (6260/50000)
2024-04-10 07:42:00 - Time elapsed for a policy update: 131.47774195671082
2024-04-10 07:42:00 - Episode 21 duration: 132.2652. Buffer capacity: 13.15% (6573/50000)
2024-04-10 07:42:01 - Episode 22 duration: 0.7792. Buffer capacity: 13.77% (6886/50000)

| train/actions/0    | 91       |
| train/actions/1    | 87       |
| train/actions/2    | 85       |
| train/actions/3    | 79       |
| train/actions/4    | 76       |
| train/actions/5    | 103      |
| train/actions/6    | 82       |
| train/actions/7    | 84       |
| train/actions/8    | 72       |
| train/actions/9    | 86       |
| train/actions/10   | 68       |
| train/actions/11   | 87       |
| epoch              | 7        |
| learning_rate      | 0.000995 |
| train/return/avg   | 12.1     |
| train/return/std   | 1.31     |
| train/return/max   | 14       |
| train/return/min   | 10.9     |
| train/ep_length    | 313      |
| total_env_steps    | 7e+03    |
| current_task_steps | 7e+03    |
| buffer_capacity    | 13.1     |
| train/loss_kl      | 0        |
| train/loss_pi      | 6.3      |
| train/loss_q1      | 0.264    |
| train/loss_q2      | 0.262    |
| train/episodes     | 21       |
| train/alpha/0      | 2.15     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 0.667    |
| train/ammo         | 71       |
| train/movement     | 28.3     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 563      |
Scalar logging time: 0.022394180297851562
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:42:01 - Time elapsed for logging: 0.032239437103271484
2024-04-10 07:44:14 - Time elapsed for a policy update: 132.7520670890808
2024-04-10 07:44:14 - Episode 23 duration: 133.2505. Buffer capacity: 14.40% (7199/50000)
2024-04-10 07:46:28 - Time elapsed for a policy update: 133.12892532348633
2024-04-10 07:46:28 - Episode 24 duration: 133.9228. Buffer capacity: 15.02% (7512/50000)
2024-04-10 07:46:29 - Episode 25 duration: 0.9700. Buffer capacity: 15.65% (7825/50000)

| train/actions/0    | 87       |
| train/actions/1    | 78       |
| train/actions/2    | 83       |
| train/actions/3    | 79       |
| train/actions/4    | 85       |
| train/actions/5    | 78       |
| train/actions/6    | 78       |
| train/actions/7    | 89       |
| train/actions/8    | 105      |
| train/actions/9    | 92       |
| train/actions/10   | 76       |
| train/actions/11   | 70       |
| epoch              | 8        |
| learning_rate      | 0.000992 |
| train/return/avg   | 18.8     |
| train/return/std   | 6.76     |
| train/return/max   | 28.2     |
| train/return/min   | 12.8     |
| train/ep_length    | 313      |
| total_env_steps    | 8e+03    |
| current_task_steps | 8e+03    |
| buffer_capacity    | 15       |
| train/loss_kl      | 0        |
| train/loss_pi      | 8.42     |
| train/loss_q1      | 0.357    |
| train/loss_q2      | 0.344    |
| train/episodes     | 24       |
| train/alpha/0      | 1.84     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 2.33     |
| train/ammo         | 68.7     |
| train/movement     | 22.8     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 831      |
Scalar logging time: 0.02210068702697754
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:46:30 - Time elapsed for logging: 0.0294036865234375
2024-04-10 07:48:41 - Time elapsed for a policy update: 131.84115433692932
2024-04-10 07:48:42 - Episode 26 duration: 132.1760. Buffer capacity: 16.28% (8138/50000)
2024-04-10 07:48:43 - Episode 27 duration: 0.7789. Buffer capacity: 16.90% (8451/50000)
2024-04-10 07:50:55 - Time elapsed for a policy update: 131.83530712127686
2024-04-10 07:50:55 - Episode 28 duration: 132.6131. Buffer capacity: 17.53% (8764/50000)

| train/actions/0    | 86       |
| train/actions/1    | 85       |
| train/actions/2    | 100      |
| train/actions/3    | 86       |
| train/actions/4    | 70       |
| train/actions/5    | 67       |
| train/actions/6    | 83       |
| train/actions/7    | 75       |
| train/actions/8    | 83       |
| train/actions/9    | 89       |
| train/actions/10   | 94       |
| train/actions/11   | 82       |
| epoch              | 9        |
| learning_rate      | 0.000989 |
| train/return/avg   | 21       |
| train/return/std   | 10       |
| train/return/max   | 33.5     |
| train/return/min   | 8.93     |
| train/ep_length    | 313      |
| total_env_steps    | 9e+03    |
| current_task_steps | 9e+03    |
| buffer_capacity    | 16.9     |
| train/loss_kl      | 0        |
| train/loss_pi      | 10.2     |
| train/loss_q1      | 0.317    |
| train/loss_q2      | 0.314    |
| train/episodes     | 27       |
| train/alpha/0      | 1.58     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 2        |
| train/ammo         | 68       |
| train/movement     | 35.3     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 1.1e+03  |
Scalar logging time: 0.02165365219116211
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:50:56 - Time elapsed for logging: 0.0289609432220459
2024-04-10 07:53:06 - Time elapsed for a policy update: 130.4822416305542
2024-04-10 07:53:07 - Episode 29 duration: 130.6657. Buffer capacity: 18.15% (9077/50000)
2024-04-10 07:53:07 - Episode 30 duration: 0.7674. Buffer capacity: 18.78% (9390/50000)
2024-04-10 07:55:20 - Time elapsed for a policy update: 132.6010184288025
2024-04-10 07:55:21 - Episode 31 duration: 133.3647. Buffer capacity: 19.41% (9703/50000)

| train/actions/0    | 77       |
| train/actions/1    | 81       |
| train/actions/2    | 90       |
| train/actions/3    | 69       |
| train/actions/4    | 81       |
| train/actions/5    | 74       |
| train/actions/6    | 95       |
| train/actions/7    | 93       |
| train/actions/8    | 74       |
| train/actions/9    | 82       |
| train/actions/10   | 81       |
| train/actions/11   | 103      |
| epoch              | 10       |
| learning_rate      | 0.000987 |
| train/return/avg   | 11.8     |
| train/return/std   | 6.47     |
| train/return/max   | 20.6     |
| train/return/min   | 5.28     |
| train/ep_length    | 313      |
| total_env_steps    | 1e+04    |
| current_task_steps | 1e+04    |
| buffer_capacity    | 18.8     |
| train/loss_kl      | 0        |
| train/loss_pi      | 11.8     |
| train/loss_q1      | 0.467    |
| train/loss_q2      | 0.47     |
| train/episodes     | 30       |
| train/alpha/0      | 1.37     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 1        |
| train/ammo         | 68.3     |
| train/movement     | 21.7     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 1.36e+03 |
Scalar logging time: 0.023178577423095703
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:55:21 - Time elapsed for logging: 0.03100419044494629
2024-04-10 07:57:33 - Time elapsed for a policy update: 131.05696892738342
2024-04-10 07:57:33 - Episode 32 duration: 131.4975. Buffer capacity: 20.03% (10016/50000)
2024-04-10 07:57:36 - Episode 33 duration: 2.5918. Buffer capacity: 20.66% (10329/50000)
2024-04-10 07:59:47 - Time elapsed for a policy update: 130.77035212516785
2024-04-10 07:59:48 - Episode 34 duration: 132.6114. Buffer capacity: 21.28% (10642/50000)
2024-04-10 07:59:50 - Episode 35 duration: 1.8083. Buffer capacity: 21.91% (10955/50000)

| train/actions/0    | 94       |
| train/actions/1    | 102      |
| train/actions/2    | 97       |
| train/actions/3    | 70       |
| train/actions/4    | 80       |
| train/actions/5    | 71       |
| train/actions/6    | 88       |
| train/actions/7    | 80       |
| train/actions/8    | 82       |
| train/actions/9    | 88       |
| train/actions/10   | 73       |
| train/actions/11   | 75       |
| epoch              | 11       |
| learning_rate      | 0.000984 |
| train/return/avg   | 20.7     |
| train/return/std   | 14.4     |
| train/return/max   | 45       |
| train/return/min   | 7.37     |
| train/ep_length    | 313      |
| total_env_steps    | 1.1e+04  |
| current_task_steps | 1.1e+04  |
| buffer_capacity    | 21       |
| train/loss_kl      | 0        |
| train/loss_pi      | 13.3     |
| train/loss_q1      | 0.412    |
| train/loss_q2      | 0.434    |
| train/episodes     | 33.5     |
| train/alpha/0      | 1.19     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 2.5      |
| train/ammo         | 69.5     |
| train/movement     | 26.4     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 1.63e+03 |
Scalar logging time: 0.024907588958740234
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 07:59:51 - Time elapsed for logging: 0.0372774600982666
2024-04-10 08:02:04 - Time elapsed for a policy update: 132.96354508399963
2024-04-10 08:02:05 - Episode 36 duration: 134.5055. Buffer capacity: 22.54% (11268/50000)
2024-04-10 08:04:21 - Time elapsed for a policy update: 134.33584332466125
2024-04-10 08:04:21 - Episode 37 duration: 136.1606. Buffer capacity: 23.16% (11581/50000)
2024-04-10 08:04:23 - Episode 38 duration: 1.8366. Buffer capacity: 23.79% (11894/50000)

| train/actions/0    | 80       |
| train/actions/1    | 90       |
| train/actions/2    | 77       |
| train/actions/3    | 81       |
| train/actions/4    | 78       |
| train/actions/5    | 76       |
| train/actions/6    | 73       |
| train/actions/7    | 88       |
| train/actions/8    | 88       |
| train/actions/9    | 73       |
| train/actions/10   | 108      |
| train/actions/11   | 88       |
| epoch              | 12       |
| learning_rate      | 0.000981 |
| train/return/avg   | 18.5     |
| train/return/std   | 10       |
| train/return/max   | 31       |
| train/return/min   | 6.42     |
| train/ep_length    | 313      |
| total_env_steps    | 1.2e+04  |
| current_task_steps | 1.2e+04  |
| buffer_capacity    | 23.2     |
| train/loss_kl      | 0        |
| train/loss_pi      | 14.6     |
| train/loss_q1      | 0.557    |
| train/loss_q2      | 0.606    |
| train/episodes     | 37       |
| train/alpha/0      | 1.04     |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 2        |
| train/ammo         | 68.7     |
| train/movement     | 27.3     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 1.91e+03 |
Scalar logging time: 0.021389484405517578
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 08:04:24 - Time elapsed for logging: 0.02859330177307129
2024-04-10 08:06:33 - Time elapsed for a policy update: 129.65999293327332
2024-04-10 08:06:35 - Episode 39 duration: 130.8193. Buffer capacity: 24.41% (12207/50000)
2024-04-10 08:08:47 - Time elapsed for a policy update: 130.4491686820984
2024-04-10 08:08:47 - Episode 40 duration: 132.8615. Buffer capacity: 25.04% (12520/50000)
2024-04-10 08:08:49 - Episode 41 duration: 1.7971. Buffer capacity: 25.67% (12833/50000)

| train/actions/0    | 62       |
| train/actions/1    | 94       |
| train/actions/2    | 80       |
| train/actions/3    | 103      |
| train/actions/4    | 79       |
| train/actions/5    | 89       |
| train/actions/6    | 81       |
| train/actions/7    | 86       |
| train/actions/8    | 90       |
| train/actions/9    | 71       |
| train/actions/10   | 82       |
| train/actions/11   | 83       |
| epoch              | 13       |
| learning_rate      | 0.000978 |
| train/return/avg   | 14.3     |
| train/return/std   | 1.2      |
| train/return/max   | 16       |
| train/return/min   | 13.1     |
| train/ep_length    | 313      |
| total_env_steps    | 1.3e+04  |
| current_task_steps | 1.3e+04  |
| buffer_capacity    | 25       |
| train/loss_kl      | 0        |
| train/loss_pi      | 15.8     |
| train/loss_q1      | 0.486    |
| train/loss_q2      | 0.625    |
| train/episodes     | 40       |
| train/alpha/0      | 0.917    |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 1.33     |
| train/ammo         | 72       |
| train/movement     | 24.6     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 2.17e+03 |
Scalar logging time: 0.020668745040893555
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 08:08:50 - Time elapsed for logging: 0.027668476104736328
2024-04-10 08:11:00 - Time elapsed for a policy update: 129.80614304542542
2024-04-10 08:11:01 - Episode 42 duration: 131.0184. Buffer capacity: 26.29% (13146/50000)
2024-04-10 08:11:03 - Episode 43 duration: 2.2282. Buffer capacity: 26.92% (13459/50000)
2024-04-10 08:13:13 - Time elapsed for a policy update: 129.6457862854004
2024-04-10 08:13:15 - Episode 44 duration: 131.4255. Buffer capacity: 27.54% (13772/50000)

| train/actions/0    | 73       |
| train/actions/1    | 70       |
| train/actions/2    | 92       |
| train/actions/3    | 91       |
| train/actions/4    | 72       |
| train/actions/5    | 83       |
| train/actions/6    | 73       |
| train/actions/7    | 74       |
| train/actions/8    | 89       |
| train/actions/9    | 100      |
| train/actions/10   | 88       |
| train/actions/11   | 95       |
| epoch              | 14       |
| learning_rate      | 0.000976 |
| train/return/avg   | 16       |
| train/return/std   | 7.33     |
| train/return/max   | 25.7     |
| train/return/min   | 7.94     |
| train/ep_length    | 313      |
| total_env_steps    | 1.4e+04  |
| current_task_steps | 1.4e+04  |
| buffer_capacity    | 26.9     |
| train/loss_kl      | 0        |
| train/loss_pi      | 16.9     |
| train/loss_q1      | 0.337    |
| train/loss_q2      | 0.536    |
| train/episodes     | 43       |
| train/alpha/0      | 0.808    |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 1.33     |
| train/ammo         | 69.3     |
| train/movement     | 30       |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 2.44e+03 |
Scalar logging time: 0.021115779876708984
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 08:13:16 - Time elapsed for logging: 0.028461217880249023
2024-04-10 08:15:27 - Time elapsed for a policy update: 130.677344083786
2024-04-10 08:15:28 - Episode 45 duration: 131.3143. Buffer capacity: 28.17% (14085/50000)
2024-04-10 08:15:29 - Episode 46 duration: 1.8413. Buffer capacity: 28.80% (14398/50000)
2024-04-10 08:17:40 - Time elapsed for a policy update: 129.83444237709045
2024-04-10 08:17:41 - Episode 47 duration: 131.6774. Buffer capacity: 29.42% (14711/50000)

| train/actions/0    | 88       |
| train/actions/1    | 80       |
| train/actions/2    | 93       |
| train/actions/3    | 82       |
| train/actions/4    | 80       |
| train/actions/5    | 59       |
| train/actions/6    | 91       |
| train/actions/7    | 80       |
| train/actions/8    | 71       |
| train/actions/9    | 97       |
| train/actions/10   | 102      |
| train/actions/11   | 77       |
| epoch              | 15       |
| learning_rate      | 0.000973 |
| train/return/avg   | 14       |
| train/return/std   | 12.2     |
| train/return/max   | 30.9     |
| train/return/min   | 2.97     |
| train/ep_length    | 313      |
| total_env_steps    | 1.5e+04  |
| current_task_steps | 1.5e+04  |
| buffer_capacity    | 28.8     |
| train/loss_kl      | 0        |
| train/loss_pi      | 17.8     |
| train/loss_q1      | 0.212    |
| train/loss_q2      | 0.333    |
| train/episodes     | 46       |
| train/alpha/0      | 0.715    |
| train/loss_reg     | 0        |
| train/health       | 100      |
| train/kills        | 1        |
| train/ammo         | 66.3     |
| train/movement     | 28.9     |
| train/hits_taken   | 0        |
| train/success      | 0        |
| walltime           | 2.7e+03  |
Scalar logging time: 0.03137946128845215
Flushed tensorboard in 0.00 seconds
Wrote to output file in 0.00 seconds
2024-04-10 08:17:43 - Time elapsed for logging: 0.04452013969421387
2024-04-10 08:19:53 - Time elapsed for a policy update: 129.8362638950348
2024-04-10 08:19:53 - Episode 48 duration: 129.9821. Buffer capacity: 30.05% (15024/50000)
2024-04-10 08:19:55 - Episode 49 duration: 1.8883. Buffer capacity: 30.67% (15337/50000)
```

TTomilin commented 1 month ago

My apologies, I completely missed this issue. In case it is still relevant: from the logs I cannot see anything fundamentally wrong, apart from TensorFlow failing to register a GPU (either because none is available or because CUDA is not properly set up). This slows training down considerably: the logs show a single policy update taking ~2 minutes, which is normally a matter of seconds on a GPU. Moreover, it may take more than 15 epochs before the agent has learned any meaningful behavior. I suggest making sure CUDA is properly installed so that a GPU is actually used, and then trying again.
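As a quick sanity check before launching a run, a minimal sketch using only standard TensorFlow calls (nothing COOM-specific) can confirm whether a GPU is visible; if it prints an empty list, training will silently fall back to the CPU, which would match the slow policy updates in the logs above:

```python
# Minimal GPU sanity check for TensorFlow (plain TF API, not part of COOM).
import tensorflow as tf

# True if this TensorFlow build was compiled with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())

# An empty list means TF could not register a GPU and will run on CPU only.
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```

The `libcudart.so.11.0` and `libcublas.so.11` warnings in the logs indicate that this TensorFlow build expects a CUDA 11.x runtime, so a matching CUDA 11 installation (with cuDNN) on `LD_LIBRARY_PATH` would be needed for the GPU to register.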