Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0
274 stars 26 forks

Dreamer v3 resuming #277

Closed LucaVendruscolo closed 1 month ago

LucaVendruscolo commented 2 months ago

Hello,

I have been reading through #273 and #187, but I couldn't understand how to resume from a checkpoint, because my logs don't have a .ckpt file in them. [screenshot]

I have the config file set up as:

every: 100
resume_from: null
save_last: True
keep_last: 5

Is there a simple way to resume training from where you last left off?

michele-milesi commented 2 months ago

Hi @LucaVendruscolo, can you give us some more information? What operating system are you using? How many steps had the agent taken? Also, is this the first time that checkpoints have not been saved? Were checkpoints saved in other experiments?

Thanks

LucaVendruscolo commented 2 months ago

Hello,

I have the program running on Windows. I had the environment take around 60,000 steps.

I have never checked the logs before, but I assume the checkpoints were not saved in previous runs either.

These are some screenshots of my setup:

I changed it to save every 10,000 steps, as that would make more sense for my setup. [screenshots]

I have run the program again, and it only seems to create these files after 90,000 steps. [screenshot]

michele-milesi commented 1 month ago

I did not understand which value you set for the checkpoint.every parameter. Is it 100, 10,000, or 100,000? Can you please share the whole config.yaml file? (The one in the version_0 folder.)

Thanks

LucaVendruscolo commented 1 month ago

Sorry, I meant to set it to 10,000, but I accidentally set it to 100,000 in that screenshot.

I reset everything, ran a new program, and set the config to save every 1,000 steps. The config.yaml file created in the logs, though, says it saves the checkpoint every 100,000 steps, which is not the same as what's in the default.yaml.

[screenshots]

sheeprl\logs\runs\dreamer_v3\BallMaze\2024-05-04_22-26-41_dreamer_v3_BallMaze_42\.hydra\config.yaml:

num_threads: 1
float32_matmul_precision: high
dry_run: false
seed: 42
torch_use_deterministic_algorithms: false
torch_backends_cudnn_benchmark: true
torch_backends_cudnn_deterministic: false
cublas_workspace_config: null
exp_name: ${algo.name}_${env.id}
run_name: ${now:%Y-%m-%d_%H-%M-%S}_${exp_name}_${seed}
root_dir: ${algo.name}/${env.id}
algo:
  name: dreamer_v3
  total_steps: 5000000
  per_rank_batch_size: 16
  run_test: true
  cnn_keys:
    encoder: []
    decoder: []
  mlp_keys:
    encoder:
    - position
    - velocity
    decoder: ${algo.mlp_keys.encoder}
  world_model:
    optimizer:
      _target_: torch.optim.Adam
      lr: 0.0001
      eps: 1.0e-08
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    discrete_size: 32
    stochastic_size: 32
    kl_dynamic: 0.5
    kl_representation: 0.1
    kl_free_nats: 1.0
    kl_regularizer: 1.0
    continue_scale_factor: 1.0
    clip_gradients: 1000.0
    decoupled_rssm: false
    learnable_initial_recurrent_state: true
    encoder:
      cnn_channels_multiplier: 32
      cnn_act: ${algo.cnn_act}
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      cnn_layer_norm: ${algo.cnn_layer_norm}
      mlp_layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    recurrent_model:
      recurrent_state_size: 512
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    transition_model:
      hidden_size: 512
      dense_act: ${algo.dense_act}
      layer_norm: ${algo.mlp_layer_norm}
    representation_model:
      hidden_size: 512
      dense_act: ${algo.dense_act}
      layer_norm: ${algo.mlp_layer_norm}
    observation_model:
      cnn_channels_multiplier: ${algo.world_model.encoder.cnn_channels_multiplier}
      cnn_act: ${algo.cnn_act}
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      cnn_layer_norm: ${algo.cnn_layer_norm}
      mlp_layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
    reward_model:
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
      bins: 255
    discount_model:
      learnable: true
      dense_act: ${algo.dense_act}
      mlp_layers: ${algo.mlp_layers}
      layer_norm: ${algo.mlp_layer_norm}
      dense_units: ${algo.dense_units}
  actor:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    cls: sheeprl.algos.dreamer_v3.agent.Actor
    ent_coef: 0.0003
    min_std: 0.1
    max_std: 1.0
    init_std: 2.0
    dense_act: ${algo.dense_act}
    mlp_layers: ${algo.mlp_layers}
    layer_norm: ${algo.mlp_layer_norm}
    dense_units: ${algo.dense_units}
    clip_gradients: 100.0
    unimix: ${algo.unimix}
    action_clip: 1.0
    moments:
      decay: 0.99
      max: 1.0
      percentile:
        low: 0.05
        high: 0.95
  critic:
    optimizer:
      _target_: torch.optim.Adam
      lr: 8.0e-05
      eps: 1.0e-05
      weight_decay: 0
      betas:
      - 0.9
      - 0.999
    dense_act: ${algo.dense_act}
    mlp_layers: ${algo.mlp_layers}
    layer_norm: ${algo.mlp_layer_norm}
    dense_units: ${algo.dense_units}
    per_rank_target_network_update_freq: 1
    tau: 0.02
    bins: 255
    clip_gradients: 100.0
  gamma: 0.996996996996997
  lmbda: 0.95
  horizon: 15
  replay_ratio: 1
  learning_starts: 64
  per_rank_pretrain_steps: 0
  per_rank_sequence_length: 64
  cnn_layer_norm:
    cls: sheeprl.models.models.LayerNormChannelLast
    kw:
      eps: 0.001
  mlp_layer_norm:
    cls: sheeprl.models.models.LayerNorm
    kw:
      eps: 0.001
  dense_units: 512
  mlp_layers: 2
  dense_act: torch.nn.SiLU
  cnn_act: torch.nn.SiLU
  unimix: 0.01
  hafner_initialization: true
  player:
    discrete_size: ${algo.world_model.discrete_size}
buffer:
  size: 1000000
  memmap: true
  validate_args: false
  from_numpy: false
  checkpoint: false
checkpoint:
  every: 100000
  resume_from: null
  save_last: true
  keep_last: 5
distribution:
  validate_args: false
  type: auto
env:
  id: BallMaze
  num_envs: 1
  frame_stack: 1
  sync_env: false
  screen_size: 64
  action_repeat: 1
  grayscale: false
  clip_rewards: false
  capture_video: false
  frame_stack_dilation: 1
  max_episode_steps: null
  reward_as_observation: false
  wrapper:
    _target_: sheeprl.envs.BallGame.BallMazeEnv
    render_mode: rgb_array
fabric:
  _target_: lightning.fabric.Fabric
  devices: 1
  num_nodes: 1
  strategy: auto
  accelerator: gpu
  precision: 32-true
  callbacks:
  - _target_: sheeprl.utils.callback.CheckpointCallback
    keep_last: ${checkpoint.keep_last}
metric:
  log_every: 5000
  disable_timer: false
  log_level: 1
  sync_on_compute: false
  aggregator:
    _target_: sheeprl.utils.metric.MetricAggregator
    raise_on_missing: false
    metrics:
      Rewards/rew_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Game/ep_len_avg:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/world_model_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/value_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/policy_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/observation_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/reward_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/state_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Loss/continue_loss:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/kl:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/post_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      State/prior_entropy:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/world_model:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/actor:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
      Grads/critic:
        _target_: torchmetrics.MeanMetric
        sync_on_compute: ${metric.sync_on_compute}
  logger:
    _target_: lightning.fabric.loggers.TensorBoardLogger
    name: ${run_name}
    root_dir: logs/runs/${root_dir}
    version: null
    default_hp_metric: true
    prefix: ''
    sub_dir: null
model_manager:
  disabled: true
  models:
    world_model:
      model_name: ${exp_name}_world_model
      description: DreamerV3 World Model used in ${env.id} Environment
      tags: {}
    actor:
      model_name: ${exp_name}_actor
      description: DreamerV3 Actor used in ${env.id} Environment
      tags: {}
    critic:
      model_name: ${exp_name}_critic
      description: DreamerV3 Critic used in ${env.id} Environment
      tags: {}
    target_critic:
      model_name: ${exp_name}_target_critic
      description: DreamerV3 Target Critic used in ${env.id} Environment
      tags: {}
    moments:
      model_name: ${exp_name}_moments
      description: DreamerV3 Moments used in ${env.id} Environment
      tags: {}

michele-milesi commented 1 month ago

Ok, now I get it.

The configurations are structured in this way: each component (algorithm, environment, checkpoint, buffer, ...) has a default configuration, plus ad hoc ones per type of environment/algorithm. All of these configurations are combined in the config file in the exp folder, so you can define several experiments and run them easily. Everything in the config file inside the exp folder overrides the default values in the other configurations (algorithm, environment, checkpoint, ...). To apply your change correctly, you should remove the checkpoint configuration (or change its value) from the file in the exp folder: https://github.com/Eclectic-Sheep/sheeprl/blob/96040b1b1a836901680088c1c5f75679e94165de/sheeprl/configs/exp/dreamer_v3.yaml#L21
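
For example, a minimal sketch of the relevant block in sheeprl/configs/exp/dreamer_v3.yaml after the change (only the checkpoint block is shown here; the rest of that file and the exact value are up to you):

checkpoint:
  every: 10000   # the value you actually want; this overrides configs/checkpoint/default.yaml
                 # (or delete the whole checkpoint block to fall back to the default)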

LucaVendruscolo commented 1 month ago

Thank you so much!

LucaVendruscolo commented 1 month ago

Hello, I am trying to resume, but I am getting an error message saying the config file doesn't exist, even though it does.

[screenshot]

[screenshot]

(RL) G:\SheepRL divide by 0 fix\sheeprl>python sheeprl.py exp=dreamer_v3 env=BallGame algo.mlp_keys.encoder=[position,QR_position,ball_speed,QR_speed] algo.mlp_keys.encoder=[position,QR_position,ball_speed,QR_speed] algo.cnn_keys.encoder=[] algo.cnn_keys.decoder=[

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sheeprl.py 4 <module>
run()

main.py 90 decorated_main
_run_hydra(

utils.py 394 _run_hydra
_run_app(

utils.py 457 _run_app
run_and_report(

utils.py 222 run_and_report
raise ex

utils.py 219 run_and_report
return func()

utils.py 458 <lambda>
lambda: hydra.run(

hydra.py 105 run
cfg = self.compose_config(

hydra.py 594 compose_config
cfg = self.config_loader.load_configuration(

config_loader_impl.py 142 load_configuration
return self._load_configuration_impl(

config_loader_impl.py 244 _load_configuration_impl
parsed_overrides, caching_repo = self._parse_overrides_and_create_caching_repo(

config_loader_impl.py 228 _parse_overrides_and_create_caching_repo
parsed_overrides = parser.parse_overrides(overrides=overrides)

overrides_parser.py 96 parse_overrides
raise OverrideParseException(

hydra.errors.OverrideParseException:
no viable alternative at input '['
See https://hydra.cc/docs/1.2/advanced/override_grammar/basic for details

(RL) G:\SheepRL divide by 0 fix\sheeprl>
(RL) G:\SheepRL divide by 0 fix\sheeprl>python sheeprl.py exp=dreamer_v3             
CONFIG
├── algo
│   └── name: dreamer_v3
│       total_steps: 5000000
│       per_rank_batch_size: 16
│       run_test: true
│       cnn_keys:
│         encoder:
│         - rgb
│         decoder:
│         - rgb
│       mlp_keys:
│         encoder: []
│         decoder: []
│       world_model:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 0.0001
│           eps: 1.0e-08
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         discrete_size: 32
│         stochastic_size: 32
│         kl_dynamic: 0.5
│         kl_representation: 0.1
│         kl_free_nats: 1.0
│         kl_regularizer: 1.0
│         continue_scale_factor: 1.0
│         clip_gradients: 1000.0
│         decoupled_rssm: false
│         learnable_initial_recurrent_state: true
│         encoder:
│           cnn_channels_multiplier: 32
│           cnn_act: torch.nn.SiLU
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           cnn_layer_norm:
│             cls: sheeprl.models.models.LayerNormChannelLast
│             kw:
│               eps: 0.001
│           mlp_layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         recurrent_model:
│           recurrent_state_size: 512
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         transition_model:
│           hidden_size: 512
│           dense_act: torch.nn.SiLU
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│         representation_model:
│           hidden_size: 512
│           dense_act: torch.nn.SiLU
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│         observation_model:
│           cnn_channels_multiplier: 32
│           cnn_act: torch.nn.SiLU
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           cnn_layer_norm:
│             cls: sheeprl.models.models.LayerNormChannelLast
│             kw:
│               eps: 0.001
│           mlp_layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│         reward_model:
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│           bins: 255
│         discount_model:
│           learnable: true
│           dense_act: torch.nn.SiLU
│           mlp_layers: 2
│           layer_norm:
│             cls: sheeprl.models.models.LayerNorm
│             kw:
│               eps: 0.001
│           dense_units: 512
│       actor:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 8.0e-05
│           eps: 1.0e-05
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         cls: sheeprl.algos.dreamer_v3.agent.Actor
│         ent_coef: 0.0003
│         min_std: 0.1
│         max_std: 1.0
│         init_std: 2.0
│         dense_act: torch.nn.SiLU
│         mlp_layers: 2
│         layer_norm:
│           cls: sheeprl.models.models.LayerNorm
│           kw:
│             eps: 0.001
│         dense_units: 512
│         clip_gradients: 100.0
│         unimix: 0.01
│         action_clip: 1.0
│         moments:
│           decay: 0.99
│           max: 1.0
│           percentile:
│             low: 0.05
│             high: 0.95
│       critic:
│         optimizer:
│           _target_: torch.optim.Adam
│           lr: 8.0e-05
│           eps: 1.0e-05
│           weight_decay: 0
│           betas:
│           - 0.9
│           - 0.999
│         dense_act: torch.nn.SiLU
│         mlp_layers: 2
│         layer_norm:
│           cls: sheeprl.models.models.LayerNorm
│           kw:
│             eps: 0.001
│         dense_units: 512
│         per_rank_target_network_update_freq: 1
│         tau: 0.02
│         bins: 255
│         clip_gradients: 100.0
│       gamma: 0.996996996996997
│       lmbda: 0.95
│       horizon: 15
│       replay_ratio: 1
│       learning_starts: 1024
│       per_rank_pretrain_steps: 0
│       per_rank_sequence_length: 64
│       cnn_layer_norm:
│         cls: sheeprl.models.models.LayerNormChannelLast
│         kw:
│           eps: 0.001
│       mlp_layer_norm:
│         cls: sheeprl.models.models.LayerNorm
│         kw:
│           eps: 0.001
│       dense_units: 512
│       mlp_layers: 2
│       dense_act: torch.nn.SiLU
│       cnn_act: torch.nn.SiLU
│       unimix: 0.01
│       hafner_initialization: true
│       player:
│         discrete_size: 32
│       
├── buffer
│   └── size: 1000000
│       memmap: true
│       validate_args: false
│       from_numpy: false
│       checkpoint: false
│       
├── checkpoint
│   └── every: 10000
│       resume_from: sheeprl\logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\checkpoint\ckpt_460000_0.ckpt
│       save_last: true
│       keep_last: 5
│       
├── env
│   └── id: PongNoFrameskip-v4
│       num_envs: 4
│       frame_stack: 1
│       sync_env: false
│       screen_size: 64
│       action_repeat: 4
│       grayscale: false
│       clip_rewards: false
│       capture_video: true
│       frame_stack_dilation: 1
│       max_episode_steps: 27000
│       reward_as_observation: false
│       wrapper:
│         _target_: gymnasium.wrappers.AtariPreprocessing
│         env:
│           _target_: gymnasium.make
│           id: PongNoFrameskip-v4
│           render_mode: rgb_array
│         noop_max: 30
│         terminal_on_life_loss: false
│         frame_skip: 4
│         screen_size: 64
│         grayscale_obs: false
│         scale_obs: false
│         grayscale_newaxis: true
│       
├── fabric
│   └── _target_: lightning.fabric.Fabric
│       devices: 1
│       num_nodes: 1
│       strategy: auto
│       accelerator: gpu
│       precision: 32-true
│       callbacks:
│       - _target_: sheeprl.utils.callback.CheckpointCallback
│         keep_last: 5
│       
└── metric
    └── log_every: 5000
        disable_timer: false
        log_level: 1
        sync_on_compute: false
        aggregator:
          _target_: sheeprl.utils.metric.MetricAggregator
          raise_on_missing: false
          metrics:
            Rewards/rew_avg:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Game/ep_len_avg:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/world_model_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/value_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/policy_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/observation_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/reward_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/state_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Loss/continue_loss:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/kl:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/post_entropy:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            State/prior_entropy:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/world_model:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/actor:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
            Grads/critic:
              _target_: torchmetrics.MeanMetric
              sync_on_compute: false
        logger:
          _target_: lightning.fabric.loggers.TensorBoardLogger
          name: 2024-05-10_21-38-24_dreamer_v3_PongNoFrameskip-v4_42
          root_dir: logs/runs/dreamer_v3/PongNoFrameskip-v4
          version: null
          default_hp_metric: true
          prefix: ''
          sub_dir: null

Error executing job with overrides: ['exp=dreamer_v3']

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
sheeprl.py 4 <module>
run()

main.py 90 decorated_main
_run_hydra(

utils.py 394 _run_hydra
_run_app(

utils.py 457 _run_app
run_and_report(

utils.py 222 run_and_report
raise ex

utils.py 219 run_and_report
return func()

utils.py 458 <lambda>
lambda: hydra.run(

hydra.py 132 run
_ = ret.return_value

utils.py 260 return_value
raise self._return_value

utils.py 186 run_job
ret.return_value = task_function(task_cfg)

cli.py 349 run
cfg = resume_from_checkpoint(cfg)

cli.py 25 resume_from_checkpoint
old_cfg = OmegaConf.load(ckpt_path.parent.parent / "config.yaml")

omegaconf.py 189 load
with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:

FileNotFoundError:
2
No such file or directory
G:\SheepRL divide by 0 fix\sheeprl\sheeprl\logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\config.yaml

michele-milesi commented 1 month ago

Hi @LucaVendruscolo, I think the problem is the checkpoint.resume_from argument. Since you are in the G:\SheepRL divide by 0 fix\sheeprl folder, the checkpoint path should be logs\runs\..., without the sheeprl\ at the beginning.
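
For example, a sketch of the corrected value, reusing the checkpoint path from your CONFIG printout above (put it wherever you currently set checkpoint.resume_from, e.g. your exp config or the command line):

checkpoint:
  resume_from: logs\runs\dreamer_v3\BallGame\2024-05-07_04-45-45_dreamer_v3_BallGame_42\version_0\checkpoint\ckpt_460000_0.ckpt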

Let me know if it works, thanks

LucaVendruscolo commented 1 month ago

That was a stupid mistake, sorry about that. Thank you so much!