PG642 / multi-sample-factory

High throughput reinforcement learning on clusters

Bipedal Walker (Continuous Actions) Test with MSF #17

Open KonstantinRamthun opened 2 years ago

Horrible22232 commented 2 years ago

Experiment from October 25, 2021; Branch: main

Command to run Bipedal Walker:

    srun -X --propagate=NPROC python -m multi_sample_factory_examples.train_gym_env \
        --algo=APPO \
        --use_rnn=True \
        --encoder_type=mlp \
        --encoder_subtype=mlp_mujoco \
        --nonlinearity=relu \
        --num_envs_per_worker=8 \
        --policy_workers_per_policy=1 \
        --rollout=32 \
        --recurrence=32 \
        --rnn_type=lstm \
        --experiment_summaries_interval=100 \
        --experiment=test_bipedal_walker_v3_iss14_2 \
        --env=gym_BipedalWalker-v3 \
        --train_dir=/work/grudelpg/Trainingsergebnisse \
        --decorrelate_experience_max_seconds=0 \
        --decorrelate_envs_on_one_worker=False

Results: TensorBoard

bipedeal walker.zip

Reward: [screenshot chrome_ka9IxkuRW0]

Action mean max: [screenshot chrome_DwkgB9tAt3]

Action mean min: [screenshot chrome_xOldQWc73x]

Errors:

High loss value: 30.1697 0.0194 30.1435 0.0068 (recommended to adjust the --reward_scale parameter)

fehler.txt
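To make the warning easier to read, here is a small sketch that pulls the four numbers out of the line and checks how they relate. The field order (total, policy, value, exploration) is an assumption, not something confirmed in this thread; it is suggested by the last three values adding up to the first. The helper names `parse_high_loss` and `rescaled_value_term` are hypothetical. The rescaling estimate assumes the value term is a plain squared error on returns, so multiplying rewards by a factor c multiplies that term by c squared; return normalization or clipping in the actual learner would change this.

```python
import re

# Warning line copied from the log above. The meaning of the four numbers is
# an ASSUMPTION: (total, policy, value, exploration).
LINE = (
    "High loss value: 30.1697 0.0194 30.1435 0.0068 "
    "(recommended to adjust the --reward_scale parameter)"
)


def parse_high_loss(line):
    """Return the four floats that follow 'High loss value:'."""
    match = re.search(r"High loss value:((?:\s+-?\d+(?:\.\d+)?){4})", line)
    if match is None:
        raise ValueError("not a high-loss warning line")
    return [float(token) for token in match.group(1).split()]


def rescaled_value_term(value_term, reward_scale):
    """Estimate a squared-error term after scaling rewards by reward_scale.

    Assumes a plain squared error on returns: scaling returns and predictions
    by c scales the squared error by c**2.
    """
    return value_term * reward_scale ** 2


total, policy, value, exploration = parse_high_loss(LINE)

# Under the assumed field order, the three components sum to the total.
components_match_total = abs(total - (policy + value + exploration)) < 1e-3

if __name__ == "__main__":
    print("components sum to total:", components_match_total)
    for scale in (1.0, 0.5, 0.1):
        print(f"reward_scale={scale}: estimated value term "
              f"{rescaled_value_term(value, scale):.4f}")
```

If the assumed order is right, almost all of the loss comes from the third number, which fits the hint in the message: lowering --reward_scale is the knob the warning itself recommends.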

Whole log:

rl_recurrence_bipedal_walker_v3_test_2.zip