KonstantinRamthun opened this issue 3 years ago
Experiment from October 25, 2021; Branch: main
Command to run Bipedal Walker:
srun -X --propagate=NPROC python -m multi_sample_factory_examples.train_gym_env --algo=APPO --use_rnn=True --encoder_type=mlp --encoder_subtype=mlp_mujoco --nonlinearity=relu --num_envs_per_worker=8 --policy_workers_per_policy=1 --rollout=32 --recurrence=32 --rnn_type=lstm --experiment_summaries_interval=100 --experiment=test_bipedal_walker_v3_iss14_2 --env=gym_BipedalWalker-v3 --train_dir=/work/grudelpg/Trainingsergebnisse --decorrelate_experience_max_seconds=0 --decorrelate_envs_on_one_worker=False
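If the run needs to be repeated with adjusted flags (for example the `--reward_scale` parameter mentioned in the error log), it can help to assemble the command from a table of flags so individual values are easy to override. This is a minimal Python sketch, not part of the repository: the helper `build_command`, the `reward_scale` value of 0.1 and the new experiment name are hypothetical choices, not values tested in this issue.

```python
import shlex
from typing import Dict, List

# Launcher prefix and training flags copied from the command above.
LAUNCHER = ["srun", "-X", "--propagate=NPROC", "python", "-m",
            "multi_sample_factory_examples.train_gym_env"]
FLAGS: Dict[str, str] = {
    "algo": "APPO",
    "use_rnn": "True",
    "encoder_type": "mlp",
    "encoder_subtype": "mlp_mujoco",
    "nonlinearity": "relu",
    "num_envs_per_worker": "8",
    "policy_workers_per_policy": "1",
    "rollout": "32",
    "recurrence": "32",
    "rnn_type": "lstm",
    "experiment_summaries_interval": "100",
    "experiment": "test_bipedal_walker_v3_iss14_2",
    "env": "gym_BipedalWalker-v3",
    "train_dir": "/work/grudelpg/Trainingsergebnisse",
    "decorrelate_experience_max_seconds": "0",
    "decorrelate_envs_on_one_worker": "False",
}


def build_command(overrides: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return the argv list with some flags replaced or added.

    Existing keys keep their position; new keys are appended at the end.
    """
    merged = {**FLAGS, **overrides}
    return LAUNCHER + [f"--{name}={value}" for name, value in merged.items()]


# Hypothetical rerun: scale rewards down and write to a separate experiment folder.
argv = build_command({
    "reward_scale": "0.1",
    "experiment": "test_bipedal_walker_v3_iss14_reward_scale_0_1",
})
print(shlex.join(argv))
# To actually launch it: subprocess.run(argv, check=True)
```

Keeping the flags in a dict means an override replaces the old value instead of passing the same flag twice, which keeps the printed command unambiguous.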
Results (TensorBoard):
bipedeal walker.zip
Plots: Reward, Action mean max, Action mean min
Errors:
High loss value: 30.1697 0.0194 30.1435 0.0068 (recommended to adjust the --reward_scale parameter)
fehler.txt
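The four numbers in the warning are consistent with a total loss followed by its policy, value and exploration components (0.0194 + 30.1435 + 0.0068 = 30.1697), which would mean almost the entire loss comes from the value term. The Python sketch below assumes that ordering (inferred from the sum, not confirmed by the log) and gives a rough estimate of how the value loss might shrink with a smaller `--reward_scale`, assuming an MSE value loss whose targets scale linearly with the rewards. The helper names `parse_losses` and `projected_value_loss` are hypothetical.

```python
import re
from typing import Dict

# Warning text copied from the log above (ANSI escape code removed).
WARNING = ("High loss value: 30.1697 0.0194 30.1435 0.0068 "
           "(recommended to adjust the --reward_scale parameter)")

# Assumption: the numbers are total, policy, value and exploration loss, in that order.
# This is inferred only from the fact that the last three sum to the first.
LABELS = ("total", "policy", "value", "exploration")
NUMBER = r"([-+]?\d+(?:\.\d+)?)"


def parse_losses(line: str) -> Dict[str, float]:
    """Extract the four loss numbers from a 'High loss value' warning line."""
    match = re.search(r"High loss value:\s+" + r"\s+".join([NUMBER] * 4), line)
    if match is None:
        raise ValueError("not a 'High loss value' warning")
    return dict(zip(LABELS, (float(g) for g in match.groups())))


def projected_value_loss(value_loss: float, reward_scale: float) -> float:
    """Rough estimate only: if returns scale linearly with the rewards, an MSE value
    loss scales with reward_scale squared. Ignores clipping, normalization and any
    change in training dynamics."""
    return value_loss * reward_scale ** 2


losses = parse_losses(WARNING)
components = losses["policy"] + losses["value"] + losses["exploration"]
dominant = max(("policy", "value", "exploration"), key=lambda k: abs(losses[k]))

print(f"sum of components: {components:.4f} (reported total: {losses['total']:.4f})")
print(f"dominant term: {dominant}")
for scale in (1.0, 0.1, 0.01):
    estimate = projected_value_loss(losses["value"], scale)
    print(f"reward_scale={scale}: value loss roughly {estimate:.4f}")
```

Under these assumptions, reducing the reward scale by a factor of 10 would shrink the value term by roughly a factor of 100, while leaving the policy and exploration terms largely unaffected.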
Full log:
rl_recurrence_bipedal_walker_v3_test_2.zip