NeuralODE raises error with deepcopy

tims457 commented 1 year ago

Describe the bug

I'm trying to use a neural ODE model with the Ray library. Ray calls copy.deepcopy on its models, and this raises an error for custom modules that contain a NeuralODE.

Steps to Reproduce

Minimal example triggering the error.

import copy
import torch.nn as nn

from torchdyn.core import NeuralODE

class NeuralODEModule(nn.Module):
    def __init__(self) -> None:
        super().__init__()

        # Vector field of the ODE: a small MLP mapping 8 -> 8
        f = nn.Sequential(
            nn.Linear(8, 8),
            nn.Tanh(),
            nn.Linear(8, 8),
        )
        self.node = NeuralODE(f)

        self.final = nn.Linear(8, 1)

    def forward(self, x):
        return self.final(self.node(x))

node = NeuralODEModule()

copy.deepcopy(node)

Error message

  File "/home/tim/anaconda3/lib/python3.9/copy.py", line 153, in deepcopy
    y = copier(memo)
  File "/home/tim/anaconda3/lib/python3.9/site-packages/torch/_tensor.py", line 89, in __deepcopy__
    raise RuntimeError("Only Tensors created explicitly by the user "
RuntimeError: Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
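
For reference, the same RuntimeError can be reproduced with plain PyTorch tensors, without torchdyn or Ray. The sketch below is only an illustration of the leaf vs. non-leaf distinction the message refers to; the variable names are made up for the example, and it does not show which tensor inside NeuralODE is responsible.

```python
import copy
import torch

# Leaf tensor: created directly by the user, so it has no autograd history.
w = torch.ones(3, requires_grad=True)

# Non-leaf tensor: the result of an operation on a tensor that requires grad.
y = w * 2

# Deep-copying the leaf works.
leaf_copy = copy.deepcopy(w)

# Deep-copying the non-leaf raises the RuntimeError shown in the traceback.
try:
    copy.deepcopy(y)
    non_leaf_error = None
except RuntimeError as err:
    non_leaf_error = str(err)

print("w.is_leaf:", w.is_leaf, "| y.is_leaf:", y.is_leaf)
print("deepcopy(y) error:", non_leaf_error)
```

Any module that holds such a non-leaf tensor as a plain attribute will hit this error when it is deep-copied, because copy.deepcopy walks every attribute of the module.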

Expected behavior

A copy of the class instance is returned.

Additional context

The section of ray/rllib/policy/torch_policy_v2.py that triggers the error:

  ids = [id_ for i, id_ in enumerate(gpu_ids) if i < num_gpus]
  self.model_gpu_towers = []
  for i, _ in enumerate(ids):
      model_copy = copy.deepcopy(model)
      self.model_gpu_towers.append(model_copy.to(self.devices[i]))

fedebotu commented 1 year ago

Could you try wrapping your NeuralODEModule instantiation with torch.no_grad() like this?

import torch

# Build the module with autograd tracking disabled
with torch.no_grad():
    node = NeuralODEModule()

copy.deepcopy(node)

Tensors created during construction will then carry no gradient history. Since you want to deepcopy the module, I assume you only need a new instance with the same structure and parameters, similar to using clone() and detach() on tensors [reference].
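
To illustrate why building the module under torch.no_grad() can help, here is a minimal sketch using a hypothetical module, DerivedTensorModule, which is not torchdyn code. It assumes the failure comes from a tensor that is computed from a parameter in __init__ and stored as a plain attribute; whether NeuralODE does exactly this is not confirmed in this thread.

```python
import copy
import torch
import torch.nn as nn


class DerivedTensorModule(nn.Module):
    """Hypothetical module (not torchdyn): caches a tensor derived from a parameter."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)
        # Plain attribute computed from a parameter. With autograd enabled
        # this is a non-leaf tensor, which deepcopy refuses to copy.
        self.cached = self.linear.weight * 2.0

    def forward(self, x):
        return self.linear(x)


def can_deepcopy(module):
    """Return True if copy.deepcopy succeeds, False if it raises RuntimeError."""
    try:
        copy.deepcopy(module)
        return True
    except RuntimeError:
        return False


# Built normally: the cached tensor is attached to the autograd graph.
tracked = DerivedTensorModule()

# Built under no_grad: the cached tensor has no history and is a leaf.
with torch.no_grad():
    untracked = DerivedTensorModule()

print("tracked copies:  ", can_deepcopy(tracked))
print("untracked copies:", can_deepcopy(untracked))
# The parameters themselves still require grad in both cases.
print("weight requires_grad:", untracked.linear.weight.requires_grad)
```

In this sketch only the derived tensor loses its history; the nn.Linear parameters are still trainable, since nn.Parameter sets requires_grad=True regardless of the surrounding no_grad block.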

tims457 commented 1 year ago

This appears to be working. Thanks. Can you explain why this is necessary for NeuralODE but not other Torch-only models such as nn.Sequential(nn.Linear(...?

DiffEqML / torchdyn

NeuralODE raises error with deepcopy #174