HorizonRobotics / alf

Agent Learning Framework https://alf.readthedocs.io
Apache License 2.0
298 stars 49 forks

Minimal CartPole-v1 MuZero config file #1075

Closed ipsec closed 2 years ago

ipsec commented 2 years ago

I'm trying to get alf to run the simple CartPole-v1 gym env using MuZero. I've tried many options without success.

(screenshot attachment)
ipsec commented 2 years ago

My current config:

```python
import torch

import alf
import alf.examples.muzero_conf
from alf.utils import dist_utils
from alf.utils.normalizers import ScalarAdaptiveNormalizer
from alf.algorithms.mcts_models import SimpleMCTSModel
from alf.algorithms.mcts_algorithm import (MCTSAlgorithm,
                                           VisitSoftmaxTemperatureByProgress)
from alf.algorithms.data_transformer import RewardScaling
from alf.optimizers import Adam
from alf.networks import StableNormalProjectionNetwork

alf.config(
    "create_environment",
    env_name="CartPole-v1",
    num_parallel_environments=8)

alf.config('TrainerConfig', data_transformer_ctor=RewardScaling)
alf.config('RewardScaling', scale=0.01)

alf.config(
    "SimplePredictionNet",
    continuous_projection_net_ctor=StableNormalProjectionNetwork)

alf.config(
    "SimpleMCTSModel",
    entropy_regularization=1e-4,
    num_sampled_actions=1)

alf.config(
    "MCTSAlgorithm",
    discount=0.99,
    num_simulations=10,
    root_dirichlet_alpha=0.5,
    root_exploration_fraction=0.,
    pb_c_init=0.5,
    pb_c_base=19652,
    is_two_player_game=False,
    visit_softmax_temperature_fn=VisitSoftmaxTemperatureByProgress(),
    act_with_exploration_policy=True,
    learn_with_exploration_policy=True,
    search_with_exploration_policy=True,
    unexpanded_value_score='mean',
    expand_all_children=False,
    expand_all_root_children=True)

alf.config(
    "MuzeroAlgorithm",
    mcts_algorithm_ctor=MCTSAlgorithm,
    model_ctor=SimpleMCTSModel,
    num_unroll_steps=5,
    train_reward_function=True,
    td_steps=10,
    reward_normalizer=ScalarAdaptiveNormalizer(auto_update=False),
    reanalyze_ratio=0.5,
    target_update_period=1,
    target_update_tau=0.01)

alf.config("Agent", optimizer=Adam(lr=1e-3))

# training config
alf.config(
    "TrainerConfig",
    unroll_length=10,
    mini_batch_size=256,
    debug_summaries=False,
    summarize_grads_and_vars=False,
    num_iterations=2500,
    num_checkpoints=5,
    evaluate=True,
    summary_interval=5,
    data_transformer_ctor=RewardScaling)
```
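As an aside, the `RewardScaling` transformer configured above with `scale=0.01` just multiplies each reward by a constant so that value and reward targets stay in a small numeric range. A minimal plain-Python sketch of that idea (not ALF's actual implementation):

```python
# Hypothetical sketch of reward scaling: multiply every reward by a
# fixed factor before it is used for training targets.
def scale_reward(reward: float, scale: float = 0.01) -> float:
    return reward * scale

# CartPole-v1 gives a reward of 1.0 per step; scaled it becomes 0.01.
print(scale_reward(1.0))
```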

emailweixu commented 2 years ago

Thanks for playing with our repo.

It's strange to me that you could run your config at all, because there is a bug affecting expand_all_root_children=True for discrete actions. (https://github.com/HorizonRobotics/alf/pull/1076)
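Until that fix is merged, one possible workaround (using the same `alf.config` API as in the config above; I haven't verified it preserves the intended MuZero behavior) would be to simply turn off root expansion:

```python
# Possible workaround: avoid the code path affected by the bug by not
# expanding all root children (assumes the MCTSAlgorithm option shown
# in the config above).
alf.config("MCTSAlgorithm", expand_all_root_children=False)
```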

It turns out that the current examples/muzero_pendulum_conf.py can successfully train out of the box after fixing the bug, using the following command:

```bash
python -m alf.bin.train \
    --conf muzero_pendulum_conf.py \
    --root_dir ~/tmp/cartpole/ \
    --conf_param create_environment.env_name="'CartPole-v1'" \
    --conf_param SimpleMCTSModel.num_sampled_actions=None
```
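For reference, the two `--conf_param` overrides should be equivalent to adding these lines to a conf file (assuming the usual `alf.config` semantics shown earlier in the thread):

```python
# In-file equivalents of the command-line --conf_param overrides.
alf.config("create_environment", env_name="CartPole-v1")
# num_sampled_actions=None lets the model consider all discrete
# actions rather than a sampled subset.
alf.config("SimpleMCTSModel", num_sampled_actions=None)
```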

Here is the training curve:

(training curve plot)

emailweixu commented 2 years ago

It turns out there is quite a lot of variance among runs:

(plot of training curves from multiple runs)

ipsec commented 2 years ago

Hi @emailweixu, thanks for the fast reply. It's working for me now. Thanks!