ernestum opened 11 months ago
After reading through the paper, I am using the following hyperparameter search space:
parameter | search space |
---|---|
active_selection | True/False |
active_selection_oversampling | 2 to 10 |
comparison_queue_size | None, or 1 to total_comparisons |
exploration_frac | 0.0 to 0.5 |
fragment_length | 1 to trajectory length |
gatherer_kwargs | temperature: 0 to 2<br>discount_factor: 0.95 to 1<br>sample: True/False |
initial_comparison_frac | 0.01 to 1 |
num_iterations | 1 to 50 |
preference_model_kwargs | noise_prob: 0 to 0.1<br>discount_factor: 0.95 to 1 |
query_schedule | 'constant', 'hyperbolic', 'inverse_quadratic' |
total_comparisons | 1k (750 were enough in the paper) |
total_timesteps | 1e7, except 1e6 for Pendulum |
trajectory_generator_kwargs | exploration_frac: 0 to 0.1<br>switch_prob: 0.1 to 1<br>random_prob: 0.1 to 0.9 |
transition_oversampling | 0.9 to 2 |
policy | pick a known good config from the zoo |
reward | when active_selection is true, use the reward_ensemble named config; otherwise use the default. Note that the default is just 32x32, while the paper uses 64x64 networks |
reward_trainer_kwargs | epochs: 1 to 10 |
rl | pick a known good config from the zoo |
I am considering fixing active_selection=True and always using the reward ensemble, since that turned out best in the paper.
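For concreteness, here is a minimal sketch of the search space above written as an Optuna objective. Optuna is an assumption here (the actual tuning script may use a different backend), and `run_preference_comparisons` is a hypothetical train-and-evaluate helper; the suggest calls mirror the table, with active_selection fixed to True and the reward ensemble always used, as proposed above.

```python
import optuna


def run_preference_comparisons(config: dict) -> float:
    """Hypothetical helper: train preference comparisons with `config`
    and return the mean evaluation return."""
    raise NotImplementedError


def objective(trial: optuna.Trial) -> float:
    config = {
        # Fixed as proposed above: active selection with a reward ensemble.
        "active_selection": True,
        "reward": "reward_ensemble",
        "active_selection_oversampling": trial.suggest_int("active_selection_oversampling", 2, 10),
        # None is also allowed; 1000 stands in for total_comparisons here.
        "comparison_queue_size": trial.suggest_int("comparison_queue_size", 1, 1000),
        "exploration_frac": trial.suggest_float("exploration_frac", 0.0, 0.5),
        # The upper bound should be the trajectory length of the env at hand.
        "fragment_length": trial.suggest_int("fragment_length", 1, 1000),
        "gatherer_kwargs": {
            "temperature": trial.suggest_float("gatherer_temperature", 0.0, 2.0),
            "discount_factor": trial.suggest_float("gatherer_discount_factor", 0.95, 1.0),
            "sample": trial.suggest_categorical("gatherer_sample", [True, False]),
        },
        "initial_comparison_frac": trial.suggest_float("initial_comparison_frac", 0.01, 1.0),
        "num_iterations": trial.suggest_int("num_iterations", 1, 50),
        "preference_model_kwargs": {
            "noise_prob": trial.suggest_float("noise_prob", 0.0, 0.1),
            "discount_factor": trial.suggest_float("pm_discount_factor", 0.95, 1.0),
        },
        "query_schedule": trial.suggest_categorical(
            "query_schedule", ["constant", "hyperbolic", "inverse_quadratic"]
        ),
        "total_comparisons": 1_000,
        "trajectory_generator_kwargs": {
            "exploration_frac": trial.suggest_float("tg_exploration_frac", 0.0, 0.1),
            "switch_prob": trial.suggest_float("switch_prob", 0.1, 1.0),
            "random_prob": trial.suggest_float("random_prob", 0.1, 0.9),
        },
        "transition_oversampling": trial.suggest_float("transition_oversampling", 0.9, 2.0),
        "reward_trainer_kwargs": {"epochs": trial.suggest_int("epochs", 1, 10)},
    }
    return run_preference_comparisons(config)
```

A study would then run via `optuna.create_study(direction="maximize").optimize(objective, n_trials=...)`.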
This PR contains the changes necessary to run benchmarks for the preference learning algorithm. It is also a place for planning and coordination notes on running the benchmarks.
- [ ] Run a quick test on `astar` to see if everything runs without errors.
- [ ] Figure out how to properly run the tuning script on SLURM.
  - Decided not to go to the trouble with SLURM for now; it is too much trouble for too little gain. Maybe with `slurm-launch.py` and `slurm-template.sh`.
  - Right now I think this is the best approach:
    - Start with the `slurm-template.sh` and fill it in manually. Call that `tune_on_slurm.sh`.
    - Don't use `slurm-launch.py`.
    - Make the env and the algo parameters, just like with `run_benchmark_on_slurm.sh`.
    - Add a `tune_all_on_slurm.sh`, just like `run_all_benchmarks_on_slurm.sh`.
    - Follow this tutorial and this one (note: the way the head node address is determined does not seem to work!).
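Regarding the head-node note above: the Ray SLURM templates derive the head address from the job's node list, which apparently does not resolve correctly here. Below is a minimal workaround sketch, assuming Ray is the tuning backend and that `tune_on_slurm.sh` exports the head address explicitly; the `RAY_HEAD_ADDRESS` variable name is an assumption, not an existing convention.

```python
import os

import ray

# Workaround sketch: instead of letting the template auto-detect the head
# node, have the SLURM script export e.g. RAY_HEAD_ADDRESS="$head_ip:6379"
# and connect to it explicitly. "auto" falls back to attaching to whatever
# Ray cluster is already running on this node.
head_address = os.environ.get("RAY_HEAD_ADDRESS", "auto")
ray.init(address=head_address)

# Sanity check: do the resources of all SLURM nodes show up?
print(ray.cluster_resources())
```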