acids-ircam / rave_vst


VST / Standalone can't load models that have the latent-size cropped #12

Open discordance opened 2 years ago


Hi! First, thanks for sharing all this work! It's so fun to play with these models.

I have an interesting one:

When training a model with --cropped-latent-size 8, training, exporting, and combining all work just fine. But the VST/standalone fails to load the model, because it still expects the latent size to be 128:

[-] Network - No API response
[ ] RAVE - Encode parameters     1
    1
    8
 2048
[ CPULongType{4} ]
[ ] RAVE - Decode parameters     8
 2048
    2
    1
[ CPULongType{4} ]
[ ] RAVE - Prior parameters     1
 2048
    8
 2048
[ CPULongType{4} ]
[ ] RAVE - Latent size 128
[ ] RAVE - Sampling rate: 48000
[+] RAVE - Model successfully loaded: /Users/nunja2/Library/ACIDS/RAVE/secondy.ts.ts
 - sr : 48000
 - latent size : 128
 - full latent size : 128
 - ratio : 2048
- prior parameters    1
 2048
    8
 2048
[ CPULongType{4} ]
to low; setting rate to : 11
libc++abi: terminating with uncaught exception of type std::runtime_error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__.py", line 19, in decode
    x: Tensor) -> Tensor:
    _rave = self._rave
    return (_rave).decode(x, )
            ~~~~~~~~~~~~~ <--- HERE
  def encode(self: __torch__.Combined,
    x: Tensor) -> Tensor:
  File "code/__torch__/___torch_mangle_0.py", line 31, in decode
      latent_pca = self.latent_pca
      _0 = torch.unsqueeze(torch.numpy_T(latent_pca), -1)
      z1 = torch.conv1d(z, _0)
           ~~~~~~~~~~~~ <--- HERE
      latent_mean = self.latent_mean
      z2 = torch.add(z1, torch.unsqueeze(latent_mean, -1))

Traceback of TorchScript, original code (most recent call last):
  File "combine_models.py", line 36, in decode
    @torch.jit.export
    def decode(self, x):
        return self._rave.decode(x)
               ~~~~~~~~~~~~~~~~~ <--- HERE
  File "export_rave.py", line 159, in decode
    def decode(self, z):
        if self.trained_cropped:  # PERFORM PCA BEFORE PADDING
            z = nn.functional.conv1d(z, self.latent_pca.T.unsqueeze(-1))
                ~~~~~~~~~~~~~~~~~~~~ <--- HERE
            z = z + self.latent_mean.unsqueeze(-1)

RuntimeError: Given groups=1, weight of size [8, 8, 1], expected input[2, 128, 1] to have 8 channels, but got 128 channels instead
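For reference, the shape mismatch in that RuntimeError can be reproduced in isolation, without PyTorch. The helper below is hypothetical (not RAVE code); it just mimics the groups=1 channel check that torch.conv1d performs: the exported PCA weight for a cropped model has shape [8, 8, 1], while the VST hands decode a latent tensor with the full 128 channels.

```python
def conv1d_output_shape(input_shape, weight_shape):
    """Mimic torch.conv1d's channel check for groups=1 and
    return the output shape (stride 1, no padding).

    input_shape:  (batch, in_channels, time)
    weight_shape: (out_channels, in_channels, kernel_size)
    """
    batch, in_ch, time = input_shape
    out_ch, w_in_ch, k = weight_shape
    if in_ch != w_in_ch:
        raise RuntimeError(
            f"Given groups=1, weight of size {list(weight_shape)}, "
            f"expected input{list(input_shape)} to have {w_in_ch} channels, "
            f"but got {in_ch} channels instead"
        )
    return (batch, out_ch, time - k + 1)

# The VST sends latents at the full size of 128, but the PCA weight
# (latent_pca.T.unsqueeze(-1)) of a cropped model is [8, 8, 1]:
try:
    conv1d_output_shape((2, 128, 1), (8, 8, 1))
except RuntimeError as e:
    print(e)  # same message as the crash above

# With the input cropped to 8 channels, the shapes line up:
print(conv1d_output_shape((2, 8, 1), (8, 8, 1)))  # → (2, 8, 1)
```

So it looks like the plugin reads the full latent size from the model metadata instead of the cropped one, and feeds 128-channel latents into a decode path built for 8.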