Closed: LucaVendruscolo closed this issue 1 month ago.
Hi @LucaVendruscolo, can you give us some more information? What operating system are you using? How many steps had the agent taken? Also, is this the first time that checkpoints were not saved? Were checkpoints saved in other experiments?
Thanks
Hello,
I have the program running on Windows. The environment has taken around 60,000 steps.
I have never checked the logs before, but I assume the checkpoints were not saved on the previous runs either.
These are some screenshots of my setup:
I changed it to save every 10,000 steps, as that makes more sense for my setup.
I have run the program again, and it only seems to create these files after 90,000 steps.
I did not understand which value you set for the checkpoint.every parameter. Is it 100, 10000 or 100000?
Can you please share the whole config.yaml file? (The one in the version_0 folder.)
Thanks
Sorry, I meant to set it to 10,000 but I accidentally set it to 100,000 in that screenshot.
I reset everything and ran a new experiment with the config set to save every 1000 steps. The config.yaml file created in the logs, though, says it saves the checkpoint every 100,000 steps, which is not the same as what's in the default.yaml.
sheeprl\logs\runs\dreamer_v3\BallMaze\2024-05-04_22-26-41_dreamer_v3_BallMaze_42\.hydra\config.yaml:
num_threads: 1
float32_matmul_precision: high
dry_run: false
seed: 42
torch_use_deterministic_algorithms: false
torch_backends_cudnn_benchmark: true
torch_backends_cudnn_deterministic: false
cublas_workspace_config: null
exp_name: ${algo.name}_${env.id}
run_name: ${now:%Y-%m-%d_%H-%M-%S}_${exp_name}_${seed}
root_dir: ${algo.name}/${env.id}
algo:
  name: dreamer_v3
  total_steps: 5000000
  per_rank_batch_size: 16
  run_test: true
  cnn_keys:
    encoder: []
    decoder: []
  mlp_keys:
    encoder:
    - position
    - velocity
    decoder: ${algo.mlp_keys.encoder}
  world_model:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0001
      eps: 1.0e-08
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    discrete_size: 32
    stochastic_size: 32
    kl_dynamic: 0.5
    kl_representation: 0.1
    kl_free_nats: 1.0
    kl_regularizer: 1.0
    continue_scale_factor: 1.0
    clip_gradients: 1000.0
    decoupled_rssm: false
    learnable_initial_recurrent_state: true
    encoder:
      cnn_channels_multiplier: 32
      cnn_act: ${algo.cnn_act}
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      cnn_layer_norm: ${algo.cnn_layer_norm}
      mlp_layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    recurrent_model:
      recurrent_state_size: 512
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    transition_model:
      hidden_size: 512
      dense_act: ${algo.dense_act}
      layer_norm: ${algo.mlp_layer_norm}
    representation_model:
      hidden_size: 512
      dense_act: ${algo.dense_act}
      layer_norm: ${algo.mlp_layer_norm}
    observation_model:
      cnn_channels_multiplier: ${algo.world_model.encoder.cnn_channels_multiplier}
      cnn_act: ${algo.cnn_act}
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      cnn_layer_norm: ${algo.cnn_layer_norm}
      mlp_layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    reward_model:
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
      bins: 255
    discount_model:
      learnable: true
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
  actor:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    cls: sheeprl.algos.dreamer_v3.agent.Actor
    ent_coef: 0.0003
    min_std: 0.1
    max_std: 1.0
    init_std: 2.0
    dense_act: ${algo.dense_act}
    mlp_layers: ${algo.mlp_layers}
    layer_norm: ${algo.mlp_layer_norm}
    dense_units: ${algo.dense_units}
    clip_gradients: 100.0
    unimix: ${algo.unimix}
    action_clip: 1.0
    moments:
      decay: 0.99
      max: 1.0
      percentile:
        low: 0.05
        high: 0.95
  critic:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    dense_act: ${algo.dense_act}
    mlp_layers: ${algo.mlp_layers}
    layer_norm: ${algo.mlp_layer_norm}
    dense_units: ${algo.dense_units}
    per_rank_target_network_update_freq: 1
    tau: 0.02
    bins: 255
    clip_gradients: 100.0
  gamma: 0.996996996996997
  lmbda: 0.95
  horizon: 15
  replay_ratio: 1
  learning_starts: 64
  per_rank_pretrain_steps: 0
  per_rank_sequence_length: 64
  cnn_layer_norm:
    cls: sheeprl.models.models.LayerNormChannelLast
    kw:
      eps: 0.001
  mlp_layer_norm:
    cls: sheeprl.models.models.LayerNorm
    kw:
      eps: 0.001
  dense_units: 512
  mlp_layers: 2
  dense_act: torch.nn.SiLU
  cnn_act: torch.nn.SiLU
  unimix: 0.01
  hafner_initialization: true
  player:
    discrete_size: ${algo.world_model.discrete_size}
buffer:
  size: 1000000
  memmap: true
  validate_args: false
  from_numpy: false
  checkpoint: false
checkpoint:
  every: 100000
  resume_from: null
  save_last: true
  keep_last: 5
distribution:
  validate_args: false
  type: auto
env:
  id: BallMaze
  num_envs: 1
  frame_stack: 1
  sync_env: false
  screen_size: 64
  action_repeat: 1
  grayscale: false
  clip_rewards: false
  capture_video: false
  frame_stack_dilation: 1
  max_episode_steps: null
  reward_as_observation: false
  wrapper:
    _target_: sheeprl.envs.BallGame.BallMazeEnv
    render_mode: rgb_array
fabric:
  _target_: lightning.fabric.Fabric
  devices: 1
  num_nodes: 1
  strategy: auto
  accelerator: gpu
  precision: 32-true
  callbacks:
  - _target_: sheeprl.utils.callback.CheckpointCallback
    keep_last: ${checkpoint.keep_last}
metric:
  log_every: 5000
  disable_timer: false
  log_level: 1
  sync_on_compute: false
  aggregator:
    _target_: sheeprl.utils.metric.MetricAggregator
    raise_on_missing: false
    metrics:
      Rewards/rew_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Game/ep_len_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/world_model_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/value_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/policy_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/observation_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/reward_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/state_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/continue_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/kl:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/post_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/prior_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/world_model:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/actor:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/critic:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
  logger:
    _target_: lightning.fabric.loggers.TensorBoardLogger
    name: ${run_name}
    root_dir: logs/runs/${root_dir}
    version: null
    default_hp_metric: true
    prefix: ''
    sub_dir: null
model_manager:
  disabled: true
  models:
    world_model:
      model_name: ${exp_name}_world_model
      description: DreamerV3 World Model used in ${env.id} Environment
      tags: {}
    actor:
      model_name: ${exp_name}_actor
      description: DreamerV3 Actor used in ${env.id} Environment
      tags: {}
    critic:
      model_name: ${exp_name}_critic
      description: DreamerV3 Critic used in ${env.id} Environment
      tags: {}
    target_critic:
      model_name: ${exp_name}_target_critic
      description: DreamerV3 Target Critic used in ${env.id} Environment
      tags: {}
    moments:
      model_name: ${exp_name}_moments
      description: DreamerV3 Moments used in ${env.id} Environment
      tags: {}
Ok, now I get it.
The configurations are structured in this way: each component (algorithm, environment, checkpoint, buffer, ...) has a default configuration, plus ad hoc ones per type of environment/algorithm.
All configurations are combined in the config file in the exp folder; this way, you can create several experiments and run them easily.
Everything in the config file inside the exp folder overwrites the default values in the other configurations (algorithm, environment, checkpoint, ...).
To apply your change correctly, you should remove the configuration concerning the checkpoint (or change that value) from the file in the exp folder: https://github.com/Eclectic-Sheep/sheeprl/blob/96040b1b1a836901680088c1c5f75679e94165de/sheeprl/configs/exp/dreamer_v3.yaml#L21
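As a sketch of what that looks like in practice (the file name and the 10000 value below are just an example for this setup, not something shipped with sheeprl), an exp file that pins the checkpoint interval could be:

# sheeprl/configs/exp/my_ballmaze.yaml (hypothetical)
# @package _global_
defaults:
  - dreamer_v3
  - override /env: BallGame
  - _self_

# Because this file is composed last (_self_), this block overrides both
# configs/checkpoint/default.yaml and the checkpoint value set in exp/dreamer_v3.yaml.
checkpoint:
  every: 10000

and you would run it with python sheeprl.py exp=my_ballmaze.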
Thank you so much!
Hello, I am trying to resume, but I am getting an error message saying the config file doesn't exist, even though it does.
(RL) G:\SheepRL divide by 0 fix\sheeprl>python sheeprl.py exp=dreamer_v3 env=BallGame algo.mlp_keys.encoder=[position,QR_position,ball_speed,QR_speed] algo.mlp_keys.encoder=[position,QR_position,ball_speed,QR_speed] algo.cnn_keys.encoder=[] algo.cnn_keys.decoder=[
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sheeprl.py 4 <module>
run()
main.py 90 decorated_main
_run_hydra(
utils.py 394 _run_hydra
_run_app(
utils.py 457 _run_app
run_and_report(
utils.py 222 run_and_report
raise ex
utils.py 219 run_and_report
return func()
utils.py 458 <lambda>
lambda: hydra.run(
hydra.py 105 run
cfg = self.compose_config(
hydra.py 594 compose_config
cfg = self.config_loader.load_configuration(
config_loader_impl.py 142 load_configuration
return self._load_configuration_impl(
config_loader_impl.py 244 _load_configuration_impl
parsed_overrides, caching_repo = self._parse_overrides_and_create_caching_repo(
config_loader_impl.py 228 _parse_overrides_and_create_caching_repo
parsed_overrides = parser.parse_overrides(overrides=overrides)
overrides_parser.py 96 parse_overrides
raise OverrideParseException(
hydra.errors.OverrideParseException:
no viable alternative at input '['
See https://hydra.cc/docs/1.2/advanced/override_grammar/basic for details
(RL) G:\SheepRL divide by 0 fix\sheeprl>
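(As an aside: the "no viable alternative at input '['" error above is Hydra's override parser choking on the last argument, algo.cnn_keys.decoder=[, which was cut off before its closing bracket; the algo.mlp_keys.encoder override is also passed twice. Assuming an empty decoder list was intended, a complete form of the command would be:

python sheeprl.py exp=dreamer_v3 env=BallGame "algo.mlp_keys.encoder=[position,QR_position,ball_speed,QR_speed]" "algo.cnn_keys.encoder=[]" "algo.cnn_keys.decoder=[]"

The quotes are optional in cmd.exe, but they protect the brackets and commas in shells that treat those characters specially.)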
(RL) G:\SheepRL divide by 0 fix\sheeprl>python sheeprl.py exp=dreamer_v3
CONFIG
├── algo
│   └── name: dreamer_v3
│       total_steps: 5000000
│       per_rank_batch_size: 16
│       run_test: true
│       cnn_keys:
│         encoder:
│         - rgb
│         decoder:
│         - rgb
│       mlp_keys:
│         encoder: []
│         decoder: []
│       world_model:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 0.0001
│           eps: 1.0e-08
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         discrete_size: 32
│         stochastic_size: 32
│         kl_dynamic: 0.5
│         kl_representation: 0.1
│         kl_free_nats: 1.0
│         kl_regularizer: 1.0
│         continue_scale_factor: 1.0
│         clip_gradients: 1000.0
│         decoupled_rssm: false
│         learnable_initial_recurrent_state: true
│         encoder:
│           cnn_channels_multiplier: 32
│           cnn_act: torch.nn.SiLU
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           cnn_layer_norm:
│             cls: sheeprl.models.models.LayerNormChannelLast
│             kw:
│               eps: 0.001
│           mlp_layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         recurrent_model:
│           recurrent_state_size: 512
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         transition_model:
│           hidden_size: 512
│           dense_act: torch.nn.SiLU
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│         representation_model:
│           hidden_size: 512
│           dense_act: torch.nn.SiLU
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│         observation_model:
│           cnn_channels_multiplier: 32
│           cnn_act: torch.nn.SiLU
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           cnn_layer_norm:
│             cls: sheeprl.models.models.LayerNormChannelLast
│             kw:
│               eps: 0.001
│           mlp_layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         reward_model:
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│           bins: 255
│         discount_model:
│           learnable: true
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│       actor:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 8.0e-05
│           eps: 1.0e-05
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         cls: sheeprl.algos.dreamer_v3.agent.Actor
│         ent_coef: 0.0003
│         min_std: 0.1
│         max_std: 1.0
│         init_std: 2.0
│         dense_act: torch.nn.SiLU
│         mlp_layers: 2
│         layer_norm:
│           cls: sheeprl.models.models.LayerNorm
│           kw:
│             eps: 0.001
│         dense_units: 512
│         clip_gradients: 100.0
│         unimix: 0.01
│         action_clip: 1.0
│         moments:
│           decay: 0.99
│           max: 1.0
│           percentile:
│             low: 0.05
│             high: 0.95
│       critic:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 8.0e-05
│           eps: 1.0e-05
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         dense_act: torch.nn.SiLU
│         mlp_layers: 2
│         layer_norm:
│           cls: sheeprl.models.models.LayerNorm
│           kw:
│             eps: 0.001
│         dense_units: 512
│         per_rank_target_network_update_freq: 1
│         tau: 0.02
│         bins: 255
│         clip_gradients: 100.0
│       gamma: 0.996996996996997
│       lmbda: 0.95
│       horizon: 15
│       replay_ratio: 1
│       learning_starts: 1024
│       per_rank_pretrain_steps: 0
│       per_rank_sequence_length: 64
│       cnn_layer_norm:
│         cls: sheeprl.models.models.LayerNormChannelLast
│         kw:
│           eps: 0.001
│       mlp_layer_norm:
│         cls: sheeprl.models.models.LayerNorm
│         kw:
│           eps: 0.001
│       dense_units: 512
│       mlp_layers: 2
│       dense_act: torch.nn.SiLU
│       cnn_act: torch.nn.SiLU
│       unimix: 0.01
│       hafner_initialization: true
│       player:
│         discrete_size: 32
│
├── buffer
│   └── size: 1000000
│       memmap: true
│       validate_args: false
│       from_numpy: false
│       checkpoint: false
│
├── checkpoint
│   └── every: 10000
│       resume_from: sheeprl\logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\checkpoint\ckpt_460000_0.ckpt
│       save_last: true
│       keep_last: 5
│
├── env
│   └── id: PongNoFrameskip-v4
│       num_envs: 4
│       frame_stack: 1
│       sync_env: false
│       screen_size: 64
│       action_repeat: 4
│       grayscale: false
│       clip_rewards: false
│       capture_video: true
│       frame_stack_dilation: 1
│       max_episode_steps: 27000
│       reward_as_observation: false
│       wrapper:
│         _target_: gymnasium.wrappers.AtariPreprocessing
│         env:
│           _target_: gymnasium.make
│           id: PongNoFrameskip-v4
│           render_mode: rgb_array
│         noop_max: 30
│         terminal_on_life_loss: false
│         frame_skip: 4
│         screen_size: 64
│         grayscale_obs: false
│         scale_obs: false
│         grayscale_newaxis: true
│
├── fabric
│   └── _target_: lightning.fabric.Fabric
│       devices: 1
│       num_nodes: 1
│       strategy: auto
│       accelerator: gpu
│       precision: 32-true
│       callbacks:
│       - _target_: sheeprl.utils.callback.CheckpointCallback
│         keep_last: 5
│
└── metric
    └── log_every: 5000
        disable_timer: false
        log_level: 1
        sync_on_compute: false
        aggregator:
          _target_: sheeprl.utils.metric.MetricAggregator
          raise_on_missing: false
          metrics:
            Rewards/rew_avg:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Game/ep_len_avg:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/world_model_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/value_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/policy_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/observation_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/reward_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/state_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/continue_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/kl:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/post_entropy:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/prior_entropy:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/world_model:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/actor:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/critic:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
        logger:
          _target_: lightning.fabric.loggers.TensorBoardLogger
          name: 2024-05-10_21-38-24_dreamer_v3_PongNoFrameskip-v4_42
          root_dir: logs/runs/dreamer_v3/PongNoFrameskip-v4
          version: null
          default_hp_metric: true
          prefix: ''
          sub_dir: null
Error executing job with overrides: ['exp=dreamer_v3']
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sheeprl.py 4 <module>
run()
main.py 90 decorated_main
_run_hydra(
utils.py 394 _run_hydra
_run_app(
utils.py 457 _run_app
run_and_report(
utils.py 222 run_and_report
raise ex
utils.py 219 run_and_report
return func()
utils.py 458 <lambda>
lambda: hydra.run(
hydra.py 132 run
_ = ret.return_value
utils.py 260 return_value
raise self._return_value
utils.py 186 run_job
ret.return_value = task_function(task_cfg)
cli.py 349 run
cfg = resume_from_checkpoint(cfg)
cli.py 25 resume_from_checkpoint
old_cfg = OmegaConf.load(ckpt_path.parent.parent / "config.yaml")
omegaconf.py 189 load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory:
'G:\SheepRL divide by 0 fix\sheeprl\sheeprl\logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\config.yaml'
Hi @LucaVendruscolo,
I think the problem is the checkpoint.resume_from argument. Since you are already in the G:\SheepRL divide by 0 fix\sheeprl folder, the checkpoint path should be logs\runs\... without the sheeprl at the beginning (note the doubled sheeprl\sheeprl in the path the error reports).
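For example, with the run above, the resume could be launched from the sheeprl folder as a single line:

python sheeprl.py exp=dreamer_v3 checkpoint.resume_from=logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\checkpoint\ckpt_460000_0.ckpt

(Passing checkpoint.resume_from on the command line is just one way to apply the same fix; editing the value in your config works equally well, as long as the leading sheeprl\ is dropped.)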
Let me know if it works, thanks
That was a stupid mistake, sorry about that. Thank you so much!
Hello,
I have been reading through #273 and #187, but I couldn't understand how to resume from a checkpoint, because my logs don't have a .ckpt file in them.
I have the checkpoint config set up as: every: 100, resume_from: null, save_last: True, keep_last: 5.
Is there a simple way to resume training from where I last left off?