desy-ml / cheetah

Fast and differentiable particle accelerator optics simulation for reinforcement learning and optimisation applications.
https://cheetah-accelerator.readthedocs.io
GNU General Public License v3.0

Issues when running on a machine with CUDA GPUs #87

Closed: jank324 closed this issue 7 months ago

jank324 commented 9 months ago

I've just tried using the new Cheetah version on a cluster node with CUDA GPUs, and it crashed (dump below). We haven't really tested Cheetah on machines where GPUs are present, and we absolutely should. I don't know whether there is any way to integrate such tests into our GitHub Actions.
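A minimal sketch of the likely cause and one possible fix, not necessarily how Cheetah resolves it: the traceback below ends in `base_rmatrix` (`cheetah/track_methods.py`, line 78), where `torch.tensor(0.0)` is allocated on the default device (the CPU) while `kx2` lives on `cuda:0`. Building the imaginary part with `torch.zeros_like(kx2)` keeps it on the same device, with the same dtype and shape. The helper names `complex_sqrt_fixed`, `available_devices` and `check_device_consistency` are hypothetical. Looping over devices and adding CUDA only when `torch.cuda.is_available()` is true is one way such a check could run on CPU-only GitHub Actions runners and still exercise the GPU path on a node that has one.

```python
import torch


def complex_sqrt_fixed(kx2: torch.Tensor) -> torch.Tensor:
    # Hypothetical replacement for the pattern in the traceback:
    #     torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
    # There, torch.tensor(0.0) is created on the CPU. zeros_like inherits kx2's
    # device, dtype and shape, so real and imaginary parts always agree.
    return torch.sqrt(torch.complex(kx2, torch.zeros_like(kx2)))


def available_devices() -> list:
    # Always test on the CPU; add CUDA only when a GPU is actually present, so
    # the same check also runs on CPU-only CI runners.
    devices = [torch.device("cpu")]
    if torch.cuda.is_available():
        devices.append(torch.device("cuda"))
    return devices


def check_device_consistency() -> None:
    for device in available_devices():
        kx2 = torch.tensor([4.0, -9.0], device=device)
        kx = complex_sqrt_fixed(kx2)
        # The result stays on the input's device ...
        assert kx.device == kx2.device
        # ... and the square root of a negative value is purely imaginary.
        expected = torch.tensor([2.0 + 0.0j, 0.0 + 3.0j], device=device)
        assert torch.allclose(kx, expected)


if __name__ == "__main__":
    check_device_consistency()
    print("device check passed on:", [str(d) for d in available_devices()])
```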

wandb: Currently logged in as: jank324 (msk-ipc). Use `wandb login --relogin` to force relogin
wandb: WARNING Path .wandb/wandb/ wasn't writable, using system temp directory.
wandb: WARNING Path .wandb/wandb/ wasn't writable, using system temp directory
wandb: Tracking run with wandb version 0.15.11
wandb: Run data is saved locally in /tmp/wandb/run-20230928_101145-qbbxdq4j
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run mild-microwave-259
wandb: ⭐️ View project at https://wandb.ai/msk-ipc/lcls-fel-rl
wandb: 🚀 View run at https://wandb.ai/msk-ipc/lcls-fel-rl/runs/qbbxdq4j
Process ForkServerProcess-1 through ForkServerProcess-20 (all 20 SubprocVecEnv workers) crash with the same traceback, printed interleaved. A single worker's traceback:
Traceback (most recent call last):
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 29, in _worker
    env = _patch_env(env_fn_wrapper.var())
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/train/lcls_li26hxr_ppo.py", line 202, in make_env
    env = lcls_li26hxr.FELIntensityTuning(
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/environments/lcls_li26hxr.py", line 126, in __init__
    self.backend = CheetahBackend(**backend_args)
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/environments/lcls_li26hxr.py", line 530, in __init__
    design_begunh = self._segment.track(design_incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1721, in track
    incoming = todo(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 101, in forward
    return self.track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1709, in track
    return super().track(incoming)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 86, in track
    tm = self.transfer_map(incoming.energy)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 1702, in transfer_map
    tm = torch.matmul(element.transfer_map(energy), tm)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/accelerator.py", line 720, in transfer_map
    return base_rmatrix(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/cheetah/track_methods.py", line 78, in base_rmatrix
    kx = torch.sqrt(torch.complex(kx2, torch.tensor(0.0)))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument imag in method wrapper_CUDA_out_complex_out)
Traceback (most recent call last):
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/train/lcls_li26hxr_ppo.py", line 247, in <module>
    main()
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/train/lcls_li26hxr_ppo.py", line 83, in main
    train(config)
  File "/beegfs/desy/user/kaiserja/lcls-fel-tuning/src/train/lcls_li26hxr_ppo.py", line 105, in train
    vec_env = SubprocVecEnv(
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 120, in __init__
    observation_space, action_space = self.remotes[0].recv()
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/beegfs/desy/user/kaiserja/miniconda3/envs/lcls-fel-tuning/lib/python3.9/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: - 0.005 MB of 0.005 MB uploaded (0.000 MB deduped)
wandb: \ 0.005 MB of 0.023 MB uploaded (0.000 MB deduped)
wandb: | 0.017 MB of 0.023 MB uploaded (0.000 MB deduped)
wandb: / 0.023 MB of 0.023 MB uploaded (0.000 MB deduped)
wandb: 🚀 View run mild-microwave-259 at: https://wandb.ai/msk-ipc/lcls-fel-rl/runs/qbbxdq4j
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: /tmp/wandb/run-20230928_101145-qbbxdq4j/logs
cr-xu commented 9 months ago

This looks like a classical "beam and element on different devices" error.
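The failing line in track_methods.py builds the imaginary part with torch.tensor(0.0). PyTorch allocates that tensor on the CPU by default, while kx2 lives on cuda:0. Below is a minimal sketch of a device-safe variant. It assumes a zero imaginary part with the same shape as kx2 is acceptable. The helper name complex_sqrt is hypothetical and is not Cheetah's API.

```python
import torch


def complex_sqrt(kx2: torch.Tensor) -> torch.Tensor:
    """Complex square root of kx2 that stays on kx2's device.

    Hypothetical helper for illustration; not part of Cheetah's API.
    """
    # torch.tensor(0.0) would be allocated on the CPU by default and clash
    # with a CUDA kx2. zeros_like copies kx2's shape, dtype and device.
    imag = torch.zeros_like(kx2)
    return torch.sqrt(torch.complex(kx2, imag))


# Runs on the GPU if one is present, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
kx2 = torch.tensor([4.0, -9.0], device=device)
kx = complex_sqrt(kx2)
print(kx.device, kx)
```

Using torch.zeros_like also keeps the dtype consistent, and torch.complex requires its real and imaginary arguments to share a dtype.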

We can

The public GitHub-hosted runners don't have GPUs, but it seems that this is planned for the future.

jank324 commented 9 months ago

Automatically running the tests on GPU nodes would obviously be the coolest. For now, though, maybe the pragmatic way to avoid these problems in the future would be a PR template with tasks, one of them being something like "Run pytest on GPU node just before merge"?
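Until GPU runners are available, a test can at least exercise every device present on the machine it runs on. That way, the same pytest run covers the CPU on GitHub Actions and CUDA on a GPU node. Below is a minimal sketch, assuming pytest and PyTorch. The test name and the DEVICES list are hypothetical. The body only checks the torch.complex pattern from the traceback and does not call Cheetah itself.

```python
import pytest
import torch

# Always test on the CPU; add CUDA only when a GPU is visible, so the same
# test file runs on GitHub-hosted runners and on a GPU cluster node.
DEVICES = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])


@pytest.mark.parametrize("device", DEVICES)
def test_complex_sqrt_stays_on_device(device):
    kx2 = torch.tensor([4.0, -9.0], device=device)
    # Build the imaginary part on the same device as kx2.
    kx = torch.sqrt(torch.complex(kx2, torch.zeros_like(kx2)))
    assert kx.device.type == torch.device(device).type
```

If pytest is run on a GPU node just before merging, the "cuda" case gets exercised as well.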

cr-xu commented 9 months ago

... have a PR template with tasks and make one of them something like "Run pytest on GPU node just before merge"

That sounds like a reasonable short-term solution!

jank324 commented 9 months ago

Okay, I will add it as part of this fix.

LYL534 commented 6 months ago

I have a similar problem. Do you have any recommendations for me? I deployed my program on an HPC cluster, and after about 10 minutes of running it shows the error below. I have no idea how to fix it. The problem is as follows:

Process ForkServerProcess-2:
Process ForkServerProcess-1:
Process ForkServerProcess-3:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 33, in _worker
    cmd, data = remote.recv()
                ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 33, in _worker
    cmd, data = remote.recv()
                ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/site-packages/stable_baselines3/common/vec_env/subproc_vec_env.py", line 33, in _worker
    cmd, data = remote.recv()
                ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/home/ma310272/anaconda3/envs/pybamm_env/lib/python3.11/multiprocessing/connection.py", line 378, in _recv
    chunk = read(handle, remaining)
            ^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer
./wandb_server.sh: line 12: 142015 Terminated

jank324 commented 6 months ago

Are you using Cheetah? This doesn't look related to Cheetah, but rather to how you are using Stable Baselines3's vectorised environments.
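For context on the traceback: each failing frame is a worker process sitting in `remote.recv()`, waiting for the next command from the main process. The stdlib-only sketch below is a simplified stand-in for that pattern, not Stable Baselines3's actual code; `worker` and `run_demo` are hypothetical names. It shows that once the parent's end of the pipe goes away without an orderly shutdown, the blocked `recv()` raises instead of returning a command. Depending on how the connection is torn down this can surface as `EOFError` or, as in the log, `ConnectionResetError`, so the sketch catches both. The `Terminated` line at the end of the log suggests the main process was killed, which would plausibly leave the workers in exactly this state.

```python
import multiprocessing as mp
import sys
from typing import Optional


def worker(remote, parent_remote):
    # Drop this process's handle on the parent's end, so the only open handle
    # belongs to the real parent process.
    parent_remote.close()
    while True:
        try:
            # Blocks here, like `cmd, data = remote.recv()` in the traceback.
            cmd = remote.recv()
        except (EOFError, ConnectionResetError):
            # The parent's end is gone (e.g. the main process was terminated),
            # so no further commands can ever arrive.
            sys.exit(3)
        if cmd == "close":
            remote.close()
            sys.exit(0)


def run_demo(parent_vanishes: bool) -> Optional[int]:
    """Start one worker and return its exit code (hypothetical helper)."""
    parent_remote, work_remote = mp.Pipe()
    proc = mp.Process(target=worker, args=(work_remote, parent_remote), daemon=True)
    proc.start()
    work_remote.close()  # the parent keeps only its own end of the pipe
    if not parent_vanishes:
        parent_remote.send("close")  # orderly shutdown request
    # Closing without sending "close" first mimics the parent process dying.
    parent_remote.close()
    proc.join(timeout=10)
    return proc.exitcode


if __name__ == "__main__":
    print("orderly shutdown, worker exit code:", run_demo(parent_vanishes=False))
    print("parent vanished, worker exit code:", run_demo(parent_vanishes=True))
```

The first call should report exit code 0 and the second 3: the worker itself is fine, and it only fails because the process it was waiting on disappeared. That is why the fix lies in how the training job is launched rather than in the environment code.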

LYL534 commented 6 months ago

I have fixed it. The problem was the way I deployed the job: I ran it directly with ./wandb.sh, which caused the problem. Submitting it with sbatch wandb.sh works.

(In reply to Jan's message of Fri, Dec 29, 2023, 15:52, subject "Re: [desy-ml/cheetah] Issues when running on a machine with CUDA GPUs (Issue #87)")