Error when using a conditional variational autoencoder

Tim-Salzmann / l4casadi

Use PyTorch Models with CasADi for data-driven optimization or learning-based optimal control. Supports Acados.

MIT License


Error when using a conditional variational autoencoder #32

Closed mandralis closed 5 months ago

mandralis commented 5 months ago

Hello!

First of all thank you for the great package.

I would like to bring to your attention an error that I am getting when trying to use l4casadi with a conditional variational autoencoder. The error is the following (when running the forward method of the trained model):

RuntimeError: It appears that you're trying to get value out of a tracing tensor with aten._local_scalar_dense.default - erroring out! It's likely that this is caused by data-dependent control flow or similar. It may be possible to trace this with dynamic shapes; try setting tracing_mode='symbolic' in your make_fx call.

Am I doing something wrong, or will l4casadi fundamentally not work for network architectures like the CVAE, which require sampling from a probability distribution?

Best regards, Ioannis

Tim-Salzmann commented 5 months ago

Hi Ioannis,

Thanks for reaching out. Torch does sometimes struggle with tracing specific model structures. Often there is a way around it by slightly adapting the model structure. Please post a minimal non-working example.

Thanks Tim

mandralis commented 5 months ago

Hi Tim. Thank you for your response.

Here is a minimal non-working example:

import casadi as cs
import torch
import l4casadi as l4c
from typing import Optional

class Normalizer(torch.nn.Module):
    """
    Helper module for normalizing / unnormalizing torch tensors with a given mean and variance.
    """

    def __init__(self, mean: torch.tensor, variance: torch.tensor):
        """
        Initializes Normalizer object.

        Args:
            mean: torch.tensor with shape (n,); mean of normalization.
            variance: torch.tensor, shape (n,); variance of normalization.
        """
        super(Normalizer, self).__init__()

        # Check shapes -- need data to be 1D.
        assert len(mean.shape) == 1, "mean must be 1D"
        assert len(variance.shape) == 1, "variance must be 1D"

        self.register_buffer("mean", mean)
        self.register_buffer("variance", variance)

    def normalize(self, x: torch.tensor):
        """
        Applies the normalization, i.e., computes (x - mean) / sqrt(variance).
        """
        assert x.shape[-1] == self.mean.shape[0]

        return (x - self.mean) / torch.sqrt(self.variance)

    def unnormalize(
        self,
        mean_normalized: torch.tensor,
        var_normalized: Optional[torch.tensor] = None,
    ):
        """
        Applies the unnormalization to the mean and variance, i.e., computes
        mean_normalized * sqrt(variance) + mean and var_normalized * variance.
        """
        assert mean_normalized.shape[-1] == self.mean.shape[0]
        if var_normalized is not None:
            assert var_normalized.shape[-1] == self.variance.shape[0]

        mean = mean_normalized * torch.sqrt(self.variance) + self.mean
        if var_normalized is not None:
            variance = var_normalized * self.variance
            return mean, variance
        else:
            return mean

class MLP(torch.nn.Module):
    """
    Implements a generic feedforward neural network.
    """

    def __init__(self, input_dim: int, output_dim: int, hidden_dims: list):
        """
        Initializes the MLP.

        Args:
            input_dim: int; dimension of the input.
            output_dim: int; dimension of the output.
            hidden_dims: list of int; dimensions of the hidden layers.
        """
        super(MLP, self).__init__()

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dims = hidden_dims

        # Create the layers.
        self.layers = torch.nn.ModuleList()
        prev_dim = input_dim
        for dim in hidden_dims:
            self.layers.append(torch.nn.Linear(prev_dim, dim))
            self.layers.append(torch.nn.ReLU())
            prev_dim = dim
        self.layers.append(torch.nn.Linear(prev_dim, output_dim))

    def forward(self, x: torch.tensor):
        """
        Forward pass through the network.
        """
        for layer in self.layers:
            x = layer(x)
        return x

class CVAE(torch.nn.Module):
    """
    Implements a conditional variational autoencoder (CVAE) with
    normalization of the conditioning and output data.
    """

    def __init__(
        self,
        output_dim: int,
        latent_dim: int,
        cond_dim: int,
        encoder_layers: list,
        decoder_layers: list,
        prior_layers: list,
        cond_mean: Optional[torch.tensor] = None,
        cond_var: Optional[torch.tensor] = None,
        output_mean: Optional[torch.tensor] = None,
        output_var: Optional[torch.tensor] = None,
    ):
        """
        Initializes the CVAE.

        Args:
            output_dim: int; dimension of the output data.
            latent_dim: int; dimension of the latent space.
            cond_dim: int; dimension of the conditioning data.
            encoder_layers: list of int; dimensions of the hidden layers of the encoder.
            decoder_layers: list of int; dimensions of the hidden layers of the decoder.
            prior_layers: list of int; dimensions of the hidden layers of the prior network.
            cond_mean: torch.tensor with shape (cond_dim,); mean of the conditioning data.
            cond_var: torch.tensor with shape (cond_dim,); variance of the conditioning data.
            output_mean: torch.tensor with shape (output_dim,); mean of the output data.
            output_var: torch.tensor with shape (output_dim,); variance of the output data.
        """
        super(CVAE, self).__init__()

        # Create the normalization layers.
        if cond_mean is None:
            cond_mean = torch.zeros(cond_dim)
        if cond_var is None:
            cond_var = torch.ones(cond_dim)
        if output_mean is None:
            output_mean = torch.zeros(output_dim)
        if output_var is None:
            output_var = torch.ones(output_dim)

        self.cond_normalizer = Normalizer(cond_mean, cond_var)
        self.output_normalizer = Normalizer(output_mean, output_var)

        self.output_dim = output_dim
        self.latent_dim = latent_dim
        self.cond_dim = cond_dim

        # Create the encoder and decoder.
        self.encoder = MLP(
            input_dim=cond_dim + output_dim,
            output_dim=2 * latent_dim,
            hidden_dims=encoder_layers,
        )
        self.decoder = MLP(
            input_dim=cond_dim + latent_dim,
            output_dim=2 * output_dim,
            hidden_dims=decoder_layers,
        )
        self.prior_network = MLP(
            input_dim=cond_dim, output_dim=2 * latent_dim, hidden_dims=prior_layers
        )

    def encode(self, x: torch.tensor, cond: torch.tensor):
        """
        Encodes the input data and returns the mean and variance of the latent space.
        """
        input_data = torch.cat([x, cond], dim=-1)
        z_params = self.encoder(input_data)
        z_mean, z_logvar = torch.chunk(z_params, 2, dim=-1)
        z_variance = torch.exp(z_logvar)
        return z_mean, z_variance

    def decode(self, z: torch.tensor, cond: torch.tensor, unnormalize: bool):
        """
        Decodes the latent space and returns the output data.
        """
        input_data = torch.cat([z, cond], dim=-1)
        output_params = self.decoder(input_data)
        output_mean_normalized, output_logvar_normalized = torch.chunk(
            output_params, 2, dim=-1
        )
        output_variance_normalized = torch.exp(output_logvar_normalized)

        if unnormalize:
            output_mean, output_variance = self.output_normalizer.unnormalize(
                output_mean_normalized, output_variance_normalized
            )
        else:
            output_mean = output_mean_normalized
            output_variance = output_variance_normalized

        return output_mean, output_variance

    def prior(self, cond: torch.tensor):
        """
        Returns the mean and variance of the prior distribution.
        """
        prior_params = self.prior_network(cond)
        prior_mean, prior_logvar = torch.chunk(prior_params, 2, dim=-1)
        prior_variance = torch.exp(prior_logvar)
        # prior_mean = torch.zeros_like(prior_mean)
        # prior_variance = torch.ones_like(prior_variance)

        return prior_mean, prior_variance

    def forward(self, cond: torch.tensor):
        num_samples = 100
        with torch.no_grad():
            cond = self.cond_normalizer.normalize(cond)

            # Sample the prior.
            prior_mean, prior_var = self.prior(cond)
            prior_dist = torch.distributions.MultivariateNormal(
                loc=prior_mean, covariance_matrix=torch.diag_embed(prior_var)
            )

            z = prior_dist.sample((num_samples,))

            cond_expanded = cond.unsqueeze(0).expand(num_samples, -1, -1)

            # Decode the samples
            pred_mean_expanded, pred_var_expanded = self.decode(
                z, cond_expanded, unnormalize=True
            )

            pred_mean = pred_mean_expanded.mean(dim=0)
            pred_var_expanded = torch.diag_embed(pred_var_expanded)

            pred_var = torch.mean(
                pred_var_expanded
                + pred_mean_expanded.unsqueeze(-1) * pred_mean_expanded.unsqueeze(-2),
                dim=0,
            )

            pred_var = pred_var - pred_mean.unsqueeze(-1) @ pred_mean.unsqueeze(-2)

            return pred_mean, pred_var

pyTorch_model = CVAE(output_dim=1,
                     latent_dim=1,
                     cond_dim=2,
                     encoder_layers=[16,16],
                     decoder_layers=[16,16],
                     prior_layers  =[16,16],
                    )
l4c_model = l4c.L4CasADi(pyTorch_model, model_expects_batch_dim=True, device='cpu')  # device='cuda' for GPU

x_sym = cs.MX.sym('x', 2, 1)
y_sym = l4c_model(x_sym)
f = cs.Function('y', [x_sym], [y_sym])
df = cs.Function('dy', [x_sym], [cs.jacobian(y_sym, x_sym)])
ddf = cs.Function('ddy', [x_sym], [cs.hessian(y_sym, x_sym)[0]])

x = cs.DM([[0.], [2.]])
print(l4c_model(x))
print(f(x))
print(df(x))
print(ddf(x))
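
For readers following the last lines of forward above: pred_mean and pred_var collapse the 100 decoded Gaussians into a single mean and covariance by moment matching (the law of total variance). Below is a minimal pure-Python sketch of the same arithmetic for a scalar output. The function name mixture_moments and the numbers are illustrative and not part of the posted code.

```python
from statistics import fmean, pvariance


def mixture_moments(means, variances):
    """Mean and variance of an equally weighted mixture of 1-D Gaussians.

    Mirrors the structure used in forward:
        var = mean_i(var_i + mu_i**2) - (mean_i mu_i)**2
    """
    mean = fmean(means)
    second_moment = fmean(v + m * m for m, v in zip(means, variances))
    return mean, second_moment - mean * mean


# Illustrative decoder outputs for three latent samples (made-up numbers).
means = [0.0, 1.0, 2.0]
variances = [0.5, 0.5, 1.0]

mix_mean, mix_var = mixture_moments(means, variances)

# Law of total variance: average of the variances plus variance of the means.
alt_var = fmean(variances) + pvariance(means)

print(mix_mean, mix_var, alt_var)
```

Both expressions for the variance agree up to floating-point rounding, which is a quick sanity check when porting this step to another framework.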

mandralis commented 5 months ago

Hi Tim,

Have you had a chance to check out the non-working example? Do you have any ideas on how I can make it work?

Best, Ioannis

Tim-Salzmann commented 5 months ago

Hi Ioannis,

I got it working in the sense that no error is thrown. However, I do not have the time to test it functionally (e.g., whether the correct operations are actually computed). I would appreciate it if you could give this a try and let me know whether it works as expected:

Please check out the branch no_data_dependent_error.

Here is the working example:

import casadi as cs
import torch
import l4casadi as l4c
from typing import Optional

class Normalizer(torch.nn.Module):
    """
    Helper module for normalizing / unnormalizing torch tensors with a given mean and variance.
    """

    def __init__(self, mean: torch.tensor, variance: torch.tensor):
        """
        Initializes Normalizer object.

        Args:
            mean: torch.tensor with shape (n,); mean of normalization.
            variance: torch.tensor, shape (n,); variance of normalization.
        """
        super(Normalizer, self).__init__()

        # Check shapes -- need data to be 1D.
        assert len(mean.shape) == 1, "mean must be 1D"
        assert len(variance.shape) == 1, "variance must be 1D"

        self.register_buffer("mean", mean)
        self.register_buffer("variance", variance)

    def normalize(self, x: torch.tensor):
        """
        Applies the normalization, i.e., computes (x - mean) / sqrt(variance).
        """
        assert x.shape[-1] == self.mean.shape[0]

        return (x - self.mean) / torch.sqrt(self.variance)

    def unnormalize(
        self,
        mean_normalized: torch.tensor,
        var_normalized: Optional[torch.tensor] = None,
    ):
        """
        Applies the unnormalization to the mean and variance, i.e., computes
        mean_normalized * sqrt(variance) + mean and var_normalized * variance.
        """
        assert mean_normalized.shape[-1] == self.mean.shape[0]
        if var_normalized is not None:
            assert var_normalized.shape[-1] == self.variance.shape[0]

        mean = mean_normalized * torch.sqrt(self.variance) + self.mean
        if var_normalized is not None:
            variance = var_normalized * self.variance
            return mean, variance
        else:
            return mean

class MLP(torch.nn.Module):
    """
    Implements a generic feedforward neural network.
    """

    def __init__(self, input_dim: int, output_dim: int, hidden_dims: list):
        """
        Initializes the MLP.

        Args:
            input_dim: int; dimension of the input.
            output_dim: int; dimension of the output.
            hidden_dims: list of int; dimensions of the hidden layers.
        """
        super(MLP, self).__init__()

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dims = hidden_dims

        # Create the layers.
        self.layers = torch.nn.ModuleList()
        prev_dim = input_dim
        for dim in hidden_dims:
            self.layers.append(torch.nn.Linear(prev_dim, dim))
            self.layers.append(torch.nn.ReLU())
            prev_dim = dim
        self.layers.append(torch.nn.Linear(prev_dim, output_dim))

    def forward(self, x: torch.tensor):
        """
        Forward pass through the network.
        """
        for layer in self.layers:
            x = layer(x)
        return x

class CVAE(torch.nn.Module):
    """
    Implements a conditional variational autoencoder (CVAE) with
    normalization of the conditioning and output data.
    """

    def __init__(
        self,
        output_dim: int,
        latent_dim: int,
        cond_dim: int,
        encoder_layers: list,
        decoder_layers: list,
        prior_layers: list,
        cond_mean: Optional[torch.tensor] = None,
        cond_var: Optional[torch.tensor] = None,
        output_mean: Optional[torch.tensor] = None,
        output_var: Optional[torch.tensor] = None,
    ):
        """
        Initializes the CVAE.

        Args:
            output_dim: int; dimension of the output data.
            latent_dim: int; dimension of the latent space.
            cond_dim: int; dimension of the conditioning data.
            encoder_layers: list of int; dimensions of the hidden layers of the encoder.
            decoder_layers: list of int; dimensions of the hidden layers of the decoder.
            prior_layers: list of int; dimensions of the hidden layers of the prior network.
            cond_mean: torch.tensor with shape (cond_dim,); mean of the conditioning data.
            cond_var: torch.tensor with shape (cond_dim,); variance of the conditioning data.
            output_mean: torch.tensor with shape (output_dim,); mean of the output data.
            output_var: torch.tensor with shape (output_dim,); variance of the output data.
        """
        super(CVAE, self).__init__()

        # Create the normalization layers.
        if cond_mean is None:
            cond_mean = torch.zeros(cond_dim)
        if cond_var is None:
            cond_var = torch.ones(cond_dim)
        if output_mean is None:
            output_mean = torch.zeros(output_dim)
        if output_var is None:
            output_var = torch.ones(output_dim)

        self.cond_normalizer = Normalizer(cond_mean, cond_var)
        self.output_normalizer = Normalizer(output_mean, output_var)

        self.output_dim = output_dim
        self.latent_dim = latent_dim
        self.cond_dim = cond_dim

        # Create the encoder and decoder.
        self.encoder = MLP(
            input_dim=cond_dim + output_dim,
            output_dim=2 * latent_dim,
            hidden_dims=encoder_layers,
        )
        self.decoder = MLP(
            input_dim=cond_dim + latent_dim,
            output_dim=2 * output_dim,
            hidden_dims=decoder_layers,
        )
        self.prior_network = MLP(
            input_dim=cond_dim, output_dim=2 * latent_dim, hidden_dims=prior_layers
        )

        self.samples = torch.randn((100, 1, 1))

    def encode(self, x: torch.tensor, cond: torch.tensor):
        """
        Encodes the input data and returns the mean and variance of the latent space.
        """
        input_data = torch.cat([x, cond], dim=-1)
        z_params = self.encoder(input_data)
        z_mean, z_logvar = torch.chunk(z_params, 2, dim=-1)
        z_variance = torch.exp(z_logvar)
        return z_mean, z_variance

    def decode(self, z: torch.tensor, cond: torch.tensor, unnormalize: bool):
        """
        Decodes the latent space and returns the output data.
        """
        input_data = torch.cat([z, cond], dim=-1)
        output_params = self.decoder(input_data)
        output_mean_normalized, output_logvar_normalized = torch.chunk(
            output_params, 2, dim=-1
        )
        output_variance_normalized = torch.exp(output_logvar_normalized)

        if unnormalize:
            output_mean, output_variance = self.output_normalizer.unnormalize(
                output_mean_normalized, output_variance_normalized
            )
        else:
            output_mean = output_mean_normalized
            output_variance = output_variance_normalized

        return output_mean, output_variance

    def prior(self, cond: torch.tensor):
        """
        Returns the mean and variance of the prior distribution.
        """
        prior_params = self.prior_network(cond)
        prior_mean, prior_logvar = torch.chunk(prior_params, 2, dim=-1)
        prior_variance = torch.exp(prior_logvar)
        # prior_mean = torch.zeros_like(prior_mean)
        # prior_variance = torch.ones_like(prior_variance)

        return prior_mean, prior_variance

    def forward(self, cond: torch.tensor):
        num_samples = 100
        cond = self.cond_normalizer.normalize(cond)

        # Sample the prior.
        prior_mean, prior_var = self.prior(cond)

        prior_dist = torch.distributions.MultivariateNormal(
            loc=prior_mean, covariance_matrix=torch.diag_embed(prior_var)
        )

        z = prior_dist.rsample((num_samples,))
        #z = torch.ones((num_samples, 1, 1))

        cond_expanded = cond.unsqueeze(0).expand(num_samples, -1, -1)

        # Decode the samples
        pred_mean_expanded, pred_var_expanded = self.decode(
            z, cond_expanded, unnormalize=True
        )

        pred_mean = pred_mean_expanded.mean(dim=0)
        pred_var_expanded = torch.diag_embed(pred_var_expanded)

        pred_var = torch.mean(
            pred_var_expanded
            + pred_mean_expanded.unsqueeze(-1) * pred_mean_expanded.unsqueeze(-2),
            dim=0,
        )

        pred_var = pred_var - pred_mean.unsqueeze(-1) @ pred_mean.unsqueeze(-2)

        return pred_mean  # pred_var is not returned: L4CasADi expects a single output

pyTorch_model = CVAE(output_dim=1,
                     latent_dim=1,
                     cond_dim=2,
                     encoder_layers=[16,16],
                     decoder_layers=[16,16],
                     prior_layers  =[16,16],
                    )
l4c_model = l4c.L4CasADi(pyTorch_model, model_expects_batch_dim=True, device='cpu')  # device='cuda' for GPU

x_sym = cs.MX.sym('x', 2, 1)
y_sym = l4c_model(x_sym)
f = cs.Function('y', [x_sym], [y_sym])
df = cs.Function('dy', [x_sym], [cs.jacobian(y_sym, x_sym)])
ddf = cs.Function('ddy', [x_sym], [cs.hessian(y_sym, x_sym)[0]])

x = cs.DM([[0.], [2.]])
print(l4c_model(x))
print(f(x))
print(df(x))
print(ddf(x))
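
The revised forward above draws z with rsample, i.e. a reparameterised sample of the form mean + scale * eps with eps standard normal. The sketch below is not the code from the branch. It is a pure-Python toy under stated assumptions: a scalar latent, noise drawn once up front, and made-up stand-ins for the prior network and decoder. It only illustrates why fixing eps makes the averaged prediction a deterministic, smooth function of the input, which is what Jacobian and Hessian evaluation needs. All names are hypothetical.

```python
import math
import random

# Assumption for this sketch: the standard-normal noise is drawn once and reused.
_rng = random.Random(0)
EPS = [_rng.gauss(0.0, 1.0) for _ in range(100)]


def sample_latent(prior_mean: float, prior_var: float) -> list:
    """Reparameterised samples z_i = mean + sqrt(var) * eps_i with fixed eps_i."""
    std = math.sqrt(prior_var)
    return [prior_mean + std * eps for eps in EPS]


def predicted_mean(cond: float) -> float:
    """Toy stand-in for prior network + decoder, averaged over the samples."""
    prior_mean = 0.5 * cond            # made-up "prior network"
    prior_var = math.exp(0.1 * cond)   # variance via exp(logvar), as in the thread
    zs = sample_latent(prior_mean, prior_var)
    return sum(math.tanh(z) for z in zs) / len(zs)  # made-up "decoder"


# Same input, same output: no fresh randomness enters the evaluation.
first = predicted_mean(2.0)
second = predicted_mean(2.0)

# Central finite difference as a stand-in for a first-order sensitivity.
h = 1e-6
slope = (predicted_mean(2.0 + h) - predicted_mean(2.0 - h)) / (2.0 * h)

print(first == second, slope)
```

Whether l4casadi itself freezes the noise at trace time is not stated in the thread, so treat the fixed EPS list purely as an illustration of the idea.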

mandralis commented 5 months ago

Hi Tim. I reinstalled l4casadi using the new no_data_dependent_error branch. However, the code you revised still fails: it can compute the forward pass through the network, but not the Jacobian or Hessian. The error I am getting is the following. Do you also have this problem?

/home/m4pc/.local/lib/python3.8/site-packages/torch/jit/_check.py:177: UserWarning: The TorchScript type system doesn't support instance-level annotations on empty non-base types in `__init__`. Instead, either 1) use a type annotation in the class body, or 2) wrap the type in `torch.jit.Attribute`.
  warnings.warn(
Jacobian trace could not be generated. First-order sensitivities will not be available in CasADi.
Hessian trace could not be generated. Second-order sensitivities will not be available in CasADi.
0.367194
0.367194
Function jac_l4casadi_f (0x83a95c0)
Input 0 (i0): [0, 2]
Input 1 (out_o0): 00
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/m4pc/m4v2-code/m4_ws/src/morphing_lander/morphing_lander/mpc/learned_dynamics_new.py", line 259, in <module>
    print(df(x))
  File "/home/m4pc/.local/lib/python3.8/site-packages/casadi/casadi.py", line 23381, in __call__
    ret = self.call(args)
  File "/home/m4pc/.local/lib/python3.8/site-packages/casadi/casadi.py", line 20039, in call
    return _casadi.Function_call(self, *args)
RuntimeError: Error in Function::call for 'dy' [MXFunction] at .../casadi/core/function.cpp:330:
Error in Function::operator() for 'jac_l4casadi_f' [External] at .../casadi/core/function.cpp:1482:
_ivalue_ INTERNAL ASSERT FAILED at "../torch/csrc/jit/api/object.h":38, please report a bug to PyTorch. 
Exception raised from _ivalue at ../torch/csrc/jit/api/object.h:38 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f52e2d6fd87 in /home/m4pc/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, char const*) + 0x68 (0x7f52e2d20828 in /home/m4pc/.local/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: torch::jit::Object::find_method(std::string const&) const + 0x387 (0x7f52b277f9b7 in /home/m4pc/.local/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #3: torch::jit::Module::forward(std::vector<c10::IValue, std::allocator<c10::IValue> >, std::unordered_map<std::string, c10::IValue, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, c10::IValue> > > const&) + 0x65 (0x7f52dadc3dd5 in /home/m4pc/.local/lib/python3.8/site-packages/l4casadi/lib/libl4casadi.so)
frame #4: L4CasADi::L4CasADiImpl::jac(at::Tensor) + 0x356 (0x7f52dadc61e6 in /home/m4pc/.local/lib/python3.8/site-packages/l4casadi/lib/libl4casadi.so)
frame #5: L4CasADi::jac(double const*, int, int, double*) + 0x12f (0x7f52dadc02df in /home/m4pc/.local/lib/python3.8/site-packages/l4casadi/lib/libl4casadi.so)
frame #6: jac_l4casadi_f + 0x4d (0x7f52e1da531a in _l4c_generated/libl4casadi_f.so)
frame #7: casadi::FunctionInternal::eval_gen(double const**, double**, long long*, double*, void*) const + 0x191 (0x7f52c66cfce1 in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #8: casadi::Function::operator()(double const**, double**, long long*, double*, int) const + 0x4e (0x7f52c668e6ae in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #9: casadi::Function::operator()(double const**, double**, long long*, double*) const + 0x3c (0x7f52c668e70c in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #10: casadi::MXFunction::eval(double const**, double**, long long*, double*, void*) const + 0x329 (0x7f52c673c0c9 in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #11: casadi::FunctionInternal::eval_gen(double const**, double**, long long*, double*, void*) const + 0x35e (0x7f52c66cfeae in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #12: <unknown function> + 0x4b80a9 (0x7f52c66940a9 in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #13: casadi::Function::call(std::vector<casadi::Matrix<double>, std::allocator<casadi::Matrix<double> > > const&, std::vector<casadi::Matrix<double>, std::allocator<casadi::Matrix<double> > >&, bool, bool) const + 0x30 (0x7f52c6694f80 in /home/m4pc/.local/lib/python3.8/site-packages/casadi/libcasadi.so.3.7)
frame #14: <unknown function> + 0x2a08fb (0x7f52c6d868fb in /home/m4pc/.local/lib/python3.8/site-packages/casadi/_casadi.so)
frame #15: <unknown function> + 0x2fcce9 (0x7f52c6de2ce9 in /home/m4pc/.local/lib/python3.8/site-packages/casadi/_casadi.so)
<omitting python frames>
frame #19: python3() [0x4e1bd0]
frame #23: python3() [0x57a42e]
frame #28: python3() [0x5e1514]
frame #29: python3() [0x5a27d0]
frame #37: python3() [0x6b3dd2]
frame #40: __libc_start_main + 0xf3 (0x7f52e6bec083 in /lib/x86_64-linux-gnu/libc.so.6)

Thanks again for the help! Much appreciated.

Tim-Salzmann commented 5 months ago

This is odd! The code I sent you works for me, including the Jacobian and Hessian. Could you try upgrading to torch==2.3.0 if you have not already?
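
A quick way to check an installed version against the one suggested above is to compare parsed version tuples. This is a small sketch using only the standard library. The helper names are hypothetical, and treating anything at or above 2.3.0 as acceptable is an assumption, since the thread only reports on 2.3.0 itself.

```python
import re


def version_tuple(version: str) -> tuple:
    """Turn a string such as '2.3.0+cu121' into (2, 3, 0).

    Local build tags and pre-release suffixes after the numeric part are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed: str, minimum: str = "2.3.0") -> bool:
    """True if the installed version is at least the minimum (an assumption)."""
    return version_tuple(installed) >= version_tuple(minimum)


print(meets_minimum("2.1.2"))        # False
print(meets_minimum("2.3.0+cu121"))  # True
```

In practice one would pass torch.__version__ as the installed string.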

mandralis commented 5 months ago

This works when I upgrade to torch==2.3.0! Thanks for the help. I will test whether this is functionally the same and let you know before I close the issue.

Best, Ioannis

mandralis commented 5 months ago

When testing the no_data_dependent_error branch of l4casadi with torch==2.3.0, I found that the conditional variational autoencoder works in its original implementation (the one I posted first). It seems that the changes you made completely solved the problem. I will be testing this network in conjunction with acados next.

Best regards, Ioannis

Tim-Salzmann commented 5 months ago

I just updated the 'main' branch to reflect these changes. The L4CasADi class now has an extra parameter, _error_on_data_dependent_ops, which you would have to set to False.
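
As a sketch of how that might look in the examples from this thread: the parameter name is taken from the comment above, the other keyword arguments are the ones already used in the posted code, and the helper functions are hypothetical. Check the signature of your installed l4casadi version before relying on it.

```python
def l4casadi_kwargs(allow_data_dependent_ops: bool = True) -> dict:
    """Keyword arguments for l4c.L4CasADi as used in this thread."""
    return {
        "model_expects_batch_dim": True,
        "device": "cpu",  # 'cuda' for GPU, as in the posted examples
        # Parameter name as given in the comment above; False disables the
        # error on data-dependent operations during tracing.
        "_error_on_data_dependent_ops": not allow_data_dependent_ops,
    }


def build_l4c_model(model):
    """Wrap a torch.nn.Module; l4casadi is imported lazily, only when called."""
    import l4casadi as l4c

    return l4c.L4CasADi(model, **l4casadi_kwargs())


print(l4casadi_kwargs())
```

With the CVAE from the earlier posts, the call would be build_l4c_model(pyTorch_model).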

mandralis commented 5 months ago

Hi Tim,

This fix was working great until I tried using the CVAE model with the RealTimeL4CasADi class. Now the model fails again with the same data-dependent control flow error. Here is a minimal non-working example. Do you think you could have a look?

import casadi as cs
import torch
import l4casadi as l4c
from typing import Optional
import numpy as np

class Normalizer(torch.nn.Module):
    """
    Helper module for normalizing / unnormalizing torch tensors with a given mean and variance.
    """

    def __init__(self, mean: torch.tensor, variance: torch.tensor):
        """
        Initializes Normalizer object.

        Args:
            mean: torch.tensor with shape (n,); mean of normalization.
            variance: torch.tensor, shape (n,); variance of normalization.
        """
        super(Normalizer, self).__init__()

        # Check shapes -- need data to be 1D.
        assert len(mean.shape) == 1, "mean must be 1D"
        assert len(variance.shape) == 1, "variance must be 1D"

        self.register_buffer("mean", mean)
        self.register_buffer("variance", variance)

    def normalize(self, x: torch.tensor):
        """
        Applies the normalization, i.e., computes (x - mean) / sqrt(variance).
        """
        assert x.shape[-1] == self.mean.shape[0]

        return (x - self.mean) / torch.sqrt(self.variance)

    def unnormalize(
        self,
        mean_normalized: torch.tensor,
        var_normalized: Optional[torch.tensor] = None,
    ):
        """
        Applies the unnormalization to the mean and variance, i.e., computes
        mean_normalized * sqrt(variance) + mean and var_normalized * variance.
        """
        assert mean_normalized.shape[-1] == self.mean.shape[0]
        if var_normalized is not None:
            assert var_normalized.shape[-1] == self.variance.shape[0]

        mean = mean_normalized * torch.sqrt(self.variance) + self.mean
        if var_normalized is not None:
            variance = var_normalized * self.variance
            return mean, variance
        else:
            return mean

class MLP(torch.nn.Module):
    """
    Implements a generic feedforward neural network.
    """

    def __init__(self, input_dim: int, output_dim: int, hidden_dims: list):
        """
        Initializes the MLP.

        Args:
            input_dim: int; dimension of the input.
            output_dim: int; dimension of the output.
            hidden_dims: list of int; dimensions of the hidden layers.
        """
        super(MLP, self).__init__()

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dims = hidden_dims

        # Create the layers.
        self.layers = torch.nn.ModuleList()
        prev_dim = input_dim
        for dim in hidden_dims:
            self.layers.append(torch.nn.Linear(prev_dim, dim))
            self.layers.append(torch.nn.ReLU())
            prev_dim = dim
        self.layers.append(torch.nn.Linear(prev_dim, output_dim))

    def forward(self, x: torch.tensor):
        """
        Forward pass through the network.
        """
        for layer in self.layers:
            x = layer(x)
        return x

class CVAE(torch.nn.Module):
    """
    Implements a conditional variational autoencoder (CVAE) with
    normalization of the conditioning and output data.
    """

    def __init__(
        self,
        output_dim: int,
        latent_dim: int,
        cond_dim: int,
        encoder_layers: list,
        decoder_layers: list,
        prior_layers: list,
        cond_mean: Optional[torch.tensor] = None,
        cond_var: Optional[torch.tensor] = None,
        output_mean: Optional[torch.tensor] = None,
        output_var: Optional[torch.tensor] = None,
    ):
        """
        Initializes the CVAE.

        Args:
            output_dim: int; dimension of the output data.
            latent_dim: int; dimension of the latent space.
            cond_dim: int; dimension of the conditioning data.
            encoder_layers: list of int; dimensions of the hidden layers of the encoder.
            decoder_layers: list of int; dimensions of the hidden layers of the decoder.
            prior_layers: list of int; dimensions of the hidden layers of the prior network.
            cond_mean: torch.tensor with shape (cond_dim,); mean of the conditioning data.
            cond_var: torch.tensor with shape (cond_dim,); variance of the conditioning data.
            output_mean: torch.tensor with shape (output_dim,); mean of the output data.
            output_var: torch.tensor with shape (output_dim,); variance of the output data.
        """
        super(CVAE, self).__init__()

        # Create the normalization layers.
        if cond_mean is None:
            cond_mean = torch.zeros(cond_dim)
        if cond_var is None:
            cond_var = torch.ones(cond_dim)
        if output_mean is None:
            output_mean = torch.zeros(output_dim)
        if output_var is None:
            output_var = torch.ones(output_dim)

        self.cond_normalizer = Normalizer(cond_mean, cond_var)
        self.output_normalizer = Normalizer(output_mean, output_var)

        self.output_dim = output_dim
        self.latent_dim = latent_dim
        self.cond_dim = cond_dim

        # Create the encoder and decoder.
        self.encoder = MLP(
            input_dim=cond_dim + output_dim,
            output_dim=2 * latent_dim,
            hidden_dims=encoder_layers,
        )
        self.decoder = MLP(
            input_dim=cond_dim + latent_dim,
            output_dim=2 * output_dim,
            hidden_dims=decoder_layers,
        )
        self.prior_network = MLP(
            input_dim=cond_dim, output_dim=2 * latent_dim, hidden_dims=prior_layers
        )

        self.samples = torch.randn((100, 1, 1))

    def encode(self, x: torch.tensor, cond: torch.tensor):
        """
        Encodes the input data and returns the mean and variance of the latent space.
        """
        input_data = torch.cat([x, cond], dim=-1)
        z_params = self.encoder(input_data)
        z_mean, z_logvar = torch.chunk(z_params, 2, dim=-1)
        z_variance = torch.exp(z_logvar)
        return z_mean, z_variance

    def decode(self, z: torch.tensor, cond: torch.tensor, unnormalize: bool):
        """
        Decodes the latent space and returns the output data.
        """
        input_data = torch.cat([z, cond], dim=-1)
        output_params = self.decoder(input_data)
        output_mean_normalized, output_logvar_normalized = torch.chunk(
            output_params, 2, dim=-1
        )
        output_variance_normalized = torch.exp(output_logvar_normalized)

        if unnormalize:
            output_mean, output_variance = self.output_normalizer.unnormalize(
                output_mean_normalized, output_variance_normalized
            )
        else:
            output_mean = output_mean_normalized
            output_variance = output_variance_normalized

        return output_mean, output_variance

    def prior(self, cond: torch.tensor):
        """
        Returns the mean and variance of the prior distribution.
        """
        prior_params = self.prior_network(cond)
        prior_mean, prior_logvar = torch.chunk(prior_params, 2, dim=-1)
        prior_variance = torch.exp(prior_logvar)
        # prior_mean = torch.zeros_like(prior_mean)
        # prior_variance = torch.ones_like(prior_variance)

        return prior_mean, prior_variance

    def forward(self, cond: torch.tensor):
        num_samples = 100
        cond = self.cond_normalizer.normalize(cond)

        # Sample the prior.
        prior_mean, prior_var = self.prior(cond)

        prior_dist = torch.distributions.MultivariateNormal(
            loc=prior_mean, covariance_matrix=torch.diag_embed(prior_var)
        )

        z = prior_dist.rsample((num_samples,))
        #z = torch.ones((num_samples, 1, 1))

        cond_expanded = cond.unsqueeze(0).expand(num_samples, -1, -1)

        # Decode the samples
        pred_mean_expanded, pred_var_expanded = self.decode(
            z, cond_expanded, unnormalize=True
        )

        pred_mean = pred_mean_expanded.mean(dim=0)
        pred_var_expanded = torch.diag_embed(pred_var_expanded)

        pred_var = torch.mean(
            pred_var_expanded
            + pred_mean_expanded.unsqueeze(-1) * pred_mean_expanded.unsqueeze(-2),
            dim=0,
        )

        pred_var = pred_var - pred_mean.unsqueeze(-1) @ pred_mean.unsqueeze(-2)

        return pred_mean  # pred_var is computed but not returned: L4CasADi expects a single output

model = CVAE(
    output_dim=2,
    latent_dim=1,
    cond_dim=2,
    encoder_layers=[16, 16],
    decoder_layers=[16, 16],
    prior_layers=[16, 16],
)
l4c_model = l4c.realtime.RealTimeL4CasADi(model, approximation_order=1)  # device='cuda' for GPU

in_sym = cs.MX.sym('in_sym',2,1)
y_sym = l4c_model(in_sym) # call model once before getting parameters
casadi_func = cs.Function('model_rt_approx',
                        [in_sym, l4c_model.get_sym_params()],
                        [y_sym])

x = np.ones([1, 2])  # torch needs batch dimension
casadi_param = l4c_model.get_params(x)
casadi_out = casadi_func(x, casadi_param)  # transpose for vector rep. expected by casadi

t_out = model(torch.tensor(x, dtype=torch.float32))

print(casadi_out)
print(t_out)

Tim-Salzmann commented 5 months ago

Hi,

I found the underlying cause of the data-dependent error. Please pass validate_args=False to any distribution you use, as in:

torch.distributions.MultivariateNormal(
    loc=prior_mean, covariance_matrix=torch.diag_embed(prior_var), validate_args=False
)

This makes passing _error_on_data_dependent_ops as a parameter to L4CasADi unnecessary.
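A minimal sketch of why validate_args matters, independent of l4casadi: with validation enabled, the distribution constructor inspects the parameter values and raises if a constraint is violated, which is the kind of value-dependent check a tracer cannot follow. The Normal with a negative scale is only an illustration chosen to trigger the check; the second part mirrors the shapes used in the CVAE above (batch of 1, latent_dim of 1).

```python
import torch
from torch.distributions import MultivariateNormal, Normal

# With validation on, the constructor checks the parameter *values*
# (here: scale > 0) and raises if the check fails.
try:
    Normal(loc=torch.zeros(1), scale=torch.tensor([-1.0]), validate_args=True)
    validation_raised = False
except ValueError:
    validation_raised = True

# With validation off, no value-dependent check runs at construction time.
unchecked = Normal(loc=torch.zeros(1), scale=torch.tensor([-1.0]), validate_args=False)

# Same shapes as in the CVAE forward pass: batch of 1, latent_dim of 1.
prior_mean = torch.zeros(1, 1)
prior_var = torch.ones(1, 1)
prior_dist = MultivariateNormal(
    loc=prior_mean,
    covariance_matrix=torch.diag_embed(prior_var),
    validate_args=False,
)
z = prior_dist.rsample((100,))
print(validation_raised, tuple(z.shape))
```

If a model builds many distributions, torch.distributions.Distribution.set_default_validate_args(False) switches validation off globally instead of per constructor call.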

I further added a change to RealTimeL4CasADi. I can now run the following code:

import casadi as cs
import torch
import l4casadi as l4c
from typing import Optional
import numpy as np

class Normalizer(torch.nn.Module):
    """
    Helper module for normalizing / unnormalizing torch tensors with a given mean and variance.
    """

    def __init__(self, mean: torch.tensor, variance: torch.tensor):
        """
        Initializes Normalizer object.

        Args:
            mean: torch.tensor with shape (n,); mean of normalization.
            variance: torch.tensor, shape (n,); variance of normalization.
        """
        super(Normalizer, self).__init__()

        # Check shapes -- need data to be 1D.
        assert len(mean.shape) == 1, "mean must be 1D"
        assert len(variance.shape) == 1, "variance must be 1D"

        self.register_buffer("mean", mean)
        self.register_buffer("variance", variance)

    def normalize(self, x: torch.tensor):
        """
        Applies the normalization, i.e., computes (x - mean) / sqrt(variance).
        """
        assert x.shape[-1] == self.mean.shape[0]

        return (x - self.mean) / torch.sqrt(self.variance)

    def unnormalize(
        self,
        mean_normalized: torch.tensor,
        var_normalized: Optional[torch.tensor] = None,
    ):
        """
        Applies the unnormalization to the mean and variance, i.e., computes
        mean_normalized * sqrt(variance) + mean and var_normalized * variance.
        """
        assert mean_normalized.shape[-1] == self.mean.shape[0]
        if var_normalized is not None:
            assert var_normalized.shape[-1] == self.variance.shape[0]

        mean = mean_normalized * torch.sqrt(self.variance) + self.mean
        if var_normalized is not None:
            variance = var_normalized * self.variance
            return mean, variance
        else:
            return mean

class MLP(torch.nn.Module):
    """
    Implements a generic feedforward neural network.
    """

    def __init__(self, input_dim: int, output_dim: int, hidden_dims: list):
        """
        Initializes the MLP.

        Args:
            input_dim: int; dimension of the input.
            output_dim: int; dimension of the output.
            hidden_dims: list of int; dimensions of the hidden layers.
        """
        super(MLP, self).__init__()

        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dims = hidden_dims

        # Create the layers.
        self.layers = torch.nn.ModuleList()
        prev_dim = input_dim
        for dim in hidden_dims:
            self.layers.append(torch.nn.Linear(prev_dim, dim))
            self.layers.append(torch.nn.ReLU())
            prev_dim = dim
        self.layers.append(torch.nn.Linear(prev_dim, output_dim))

    def forward(self, x: torch.tensor):
        """
        Forward pass through the network.
        """
        for layer in self.layers:
            x = layer(x)
        return x

class CVAE(torch.nn.Module):
    """
    Implements a conditional variational autoencoder (CVAE) with
    normalization of the conditioning and output data.
    """

    def __init__(
        self,
        output_dim: int,
        latent_dim: int,
        cond_dim: int,
        encoder_layers: list,
        decoder_layers: list,
        prior_layers: list,
        cond_mean: Optional[torch.tensor] = None,
        cond_var: Optional[torch.tensor] = None,
        output_mean: Optional[torch.tensor] = None,
        output_var: Optional[torch.tensor] = None,
    ):
        """
        Initializes the CVAE.

        Args:
            output_dim: int; dimension of the output data.
            latent_dim: int; dimension of the latent space.
            cond_dim: int; dimension of the conditioning data.
            encoder_layers: list of int; dimensions of the hidden layers of the encoder.
            decoder_layers: list of int; dimensions of the hidden layers of the decoder.
            prior_layers: list of int; dimensions of the hidden layers of the prior network.
            cond_mean: torch.tensor with shape (cond_dim,); mean of the conditioning data.
            cond_var: torch.tensor with shape (cond_dim,); variance of the conditioning data.
            output_mean: torch.tensor with shape (output_dim,); mean of the output data.
            output_var: torch.tensor with shape (output_dim,); variance of the output data.
        """
        super(CVAE, self).__init__()

        # Create the normalization layers.
        if cond_mean is None:
            cond_mean = torch.zeros(cond_dim)
        if cond_var is None:
            cond_var = torch.ones(cond_dim)
        if output_mean is None:
            output_mean = torch.zeros(output_dim)
        if output_var is None:
            output_var = torch.ones(output_dim)

        self.cond_normalizer = Normalizer(cond_mean, cond_var)
        self.output_normalizer = Normalizer(output_mean, output_var)

        self.output_dim = output_dim
        self.latent_dim = latent_dim
        self.cond_dim = cond_dim

        # Create the encoder and decoder.
        self.encoder = MLP(
            input_dim=cond_dim + output_dim,
            output_dim=2 * latent_dim,
            hidden_dims=encoder_layers,
        )
        self.decoder = MLP(
            input_dim=cond_dim + latent_dim,
            output_dim=2 * output_dim,
            hidden_dims=decoder_layers,
        )
        self.prior_network = MLP(
            input_dim=cond_dim, output_dim=2 * latent_dim, hidden_dims=prior_layers
        )

        self.samples = torch.randn((100, 1, 1))

    def encode(self, x: torch.tensor, cond: torch.tensor):
        """
        Encodes the input data and returns the mean and variance of the latent space.
        """
        input_data = torch.cat([x, cond], dim=-1)
        z_params = self.encoder(input_data)
        z_mean, z_logvar = torch.chunk(z_params, 2, dim=-1)
        z_variance = torch.exp(z_logvar)
        return z_mean, z_variance

    def decode(self, z: torch.tensor, cond: torch.tensor, unnormalize: bool):
        """
        Decodes the latent space and returns the output data.
        """
        input_data = torch.cat([z, cond], dim=-1)
        output_params = self.decoder(input_data)
        output_mean_normalized, output_logvar_normalized = torch.chunk(
            output_params, 2, dim=-1
        )
        output_variance_normalized = torch.exp(output_logvar_normalized)

        if unnormalize:
            output_mean, output_variance = self.output_normalizer.unnormalize(
                output_mean_normalized, output_variance_normalized
            )
        else:
            output_mean = output_mean_normalized
            output_variance = output_variance_normalized

        return output_mean, output_variance

    def prior(self, cond: torch.tensor):
        """
        Returns the mean and variance of the prior distribution.
        """
        prior_params = self.prior_network(cond)
        prior_mean, prior_logvar = torch.chunk(prior_params, 2, dim=-1)
        prior_variance = torch.exp(prior_logvar)
        # prior_mean = torch.zeros_like(prior_mean)
        # prior_variance = torch.ones_like(prior_variance)

        return prior_mean, prior_variance

    def forward(self, cond: torch.tensor):
        num_samples = 100
        cond = self.cond_normalizer.normalize(cond)

        # Sample the prior.
        prior_mean, prior_var = self.prior(cond)

        prior_dist = torch.distributions.MultivariateNormal(
            loc=prior_mean, covariance_matrix=torch.diag_embed(prior_var), validate_args=False
        )

        z = prior_dist.rsample((num_samples,))
        #z = torch.ones((num_samples, 1, 1))

        cond_expanded = cond.unsqueeze(0).expand(num_samples, -1, -1)

        # Decode the samples
        pred_mean_expanded, pred_var_expanded = self.decode(
            z, cond_expanded, unnormalize=True
        )

        pred_mean = pred_mean_expanded.mean(dim=0)
        pred_var_expanded = torch.diag_embed(pred_var_expanded)

        pred_var = torch.mean(
            pred_var_expanded
            + pred_mean_expanded.unsqueeze(-1) * pred_mean_expanded.unsqueeze(-2),
            dim=0,
        )

        pred_var = pred_var - pred_mean.unsqueeze(-1) @ pred_mean.unsqueeze(-2)

        return pred_mean  # pred_var is computed but not returned: L4CasADi expects a single output

model = CVAE(
    output_dim=2,
    latent_dim=1,
    cond_dim=2,
    encoder_layers=[16, 16],
    decoder_layers=[16, 16],
    prior_layers=[16, 16],
)
l4c_model = l4c.realtime.RealTimeL4CasADi(model, approximation_order=1)  # device='cuda' for GPU

in_sym = cs.MX.sym('in_sym',2,1)
y_sym = l4c_model(in_sym) # call model once before getting parameters
casadi_func = cs.Function('model_rt_approx',
                        [in_sym, l4c_model.get_sym_params()],
                        [y_sym])

x = np.ones([1, 2])  # torch needs batch dimension
casadi_param = l4c_model.get_params(x)
casadi_out = casadi_func(x, casadi_param)  # transpose for vector rep. expected by casadi

t_out = model(torch.tensor(x, dtype=torch.float32))

print(casadi_out)
print(t_out)

mandralis commented 5 months ago

Hi Tim, thanks again for your support. This now works, but running a batch through the model raises a new error. If you replace x = np.ones([1, 2]) with x = np.ones([10, 2]) in the final lines, you get:

RuntimeError: vmap: Cannot ask for different inplace randomness on an unbatched tensor. This will appear like same randomness. If this is necessary for your usage, please file an issue with functorch.

I think this again has to do with the sampling. Any idea what might be causing it?

EDIT: the Randomness section of https://pytorch.org/functorch/stable/ux_limitations.html might help, but I don't know where vmap is being called.

Best, Ioannis

Tim-Salzmann commented 5 months ago

Hi Ioannis,

It looks to me like this is a problem with functorch itself [1]. I changed the vmap randomness setting to same. While this lets your code run, it is important to understand the consequence: the randomly generated values will be the same for each batch element. I do think this is fine for your use case, as they are transformed depending on the batch input and then averaged.

Best Tim

[1] https://github.com/pytorch/functorch/issues/996
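To make the effect of the same setting concrete, here is a small sketch using torch.func.vmap directly (assuming PyTorch 2.x). It is not l4casadi's internal code; the function noisy and the tensor means are hypothetical names for illustration. It shows that one random draw is shared by all batch elements, while the outputs still differ because each element applies its own mean.

```python
import torch
from torch.func import vmap


def noisy(mean):
    # `mean` has shape (2,) for a single batch element.
    eps = torch.randn(100, 2)  # random draw made inside the vmapped function
    return mean + eps          # shape (100, 2)


means = torch.tensor([[0.0, 0.0], [10.0, 10.0]])  # batch of 2

# randomness="same": one draw of eps is shared by all batch elements.
out = vmap(noisy, randomness="same")(means)  # shape (2, 100, 2)

# Recover the noise seen by each batch element (up to float rounding).
noise = out - means[:, None, :]
same_noise = torch.allclose(noise[0], noise[1], atol=1e-5)
different_outputs = not torch.allclose(out[0], out[1])
print(same_noise, different_outputs)
```

Passing randomness="different" instead would draw independent noise per batch element where vmap supports it; the default, randomness="error", rejects random calls inside the vmapped function.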

mandralis commented 5 months ago

I am not sure if having the same randomness for every element of the batch gives the desired functionality for the CVAE. In any case you are right that this is an issue with functorch. I might open a feature request there.

Thanks for the help.

Tim-Salzmann commented 5 months ago

Hi,

"I am not sure if having the same randomness for every element of the batch gives the desired functionality for the CVAE."

Could you elaborate on why you think this is a problem in your use case? Given enough samples (100 in your case), fixed-seed randomness (which is effectively what we now have) should not massively affect the expectation (averaging) of the output.

To be clear: the random samples you get from z = prior_dist.rsample((num_samples,)) will still differ across the batch. Only the underlying samples from a zero-mean normal [1] are shared; they are then transformed based on the batch input [2].

This is not to say you should not file a bug with the PyTorch developers. I am just interested in why you think this is a problem.

Best Tim

[1] https://github.com/pytorch/pytorch/blob/f4d7cdc5e63c786b1f6588eafa53bbc6d33c3826/torch/distributions/normal.py#L74
[2] https://github.com/pytorch/pytorch/blob/f4d7cdc5e63c786b1f6588eafa53bbc6d33c3826/torch/distributions/normal.py#L75
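The two steps referenced above can be sketched by hand. This is an illustration under assumptions, not the library code: the loc and scale values are made up, and the elementwise form corresponds to a Normal (or a MultivariateNormal with diagonal covariance, where scale is the square root of the variance). The shared base samples eps play the role of the "same" randomness, and each batch element transforms them with its own parameters.

```python
import torch

torch.manual_seed(0)
num_samples = 100

# One set of base samples from a standard normal, shared by the whole batch.
eps = torch.randn(num_samples, 1, 1)

# Per-batch-element parameters, e.g. as produced by a prior network.
loc = torch.tensor([[0.0], [5.0]])    # shape (batch=2, latent_dim=1)
scale = torch.tensor([[1.0], [0.1]])  # standard deviations, shape (2, 1)

# Reparameterized samples: z = loc + eps * scale, shape (100, 2, 1).
z = loc + eps * scale

# Monte Carlo estimate of loc for each batch element, shape (2, 1).
z_mean = z.mean(dim=0)
print(z_mean)
```

Although eps is identical for both batch elements, the samples in z differ between them, and the sample mean for each element stays close to its own loc. The estimation errors of the two elements are correlated, though, since they come from the same eps.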