I'm working on a fix for this problem! As a temporary workaround, you can convert your checkpoint using this function:
import torch
import os

def convert_checkpoint(ckpt_path: str):
    """
    Remove pad buffers from a checkpoint and save
    the new converted checkpoint in the same folder
    """
    ckpt = torch.load(ckpt_path)
    keys = filter(lambda n: "pad" not in n, ckpt["state_dict"].keys())
    ckpt["state_dict"] = {k: ckpt["state_dict"][k] for k in keys}
    target = os.path.join(os.path.dirname(ckpt_path), "converted.ckpt")
    torch.save(ckpt, target)

# FOR EXAMPLE
# convert_checkpoint("runs/ljspeech/rave/version_0/checkpoints/best.ckpt")
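A quick sanity check after converting (reusing the example path above, which is just illustrative; map_location="cpu" only keeps the check device-independent) is to reload the new file and confirm the pad buffers are gone:

convert_checkpoint("runs/ljspeech/rave/version_0/checkpoints/best.ckpt")
converted = torch.load(
    "runs/ljspeech/rave/version_0/checkpoints/converted.ckpt",
    map_location="cpu",
)
assert not any("pad" in k for k in converted["state_dict"])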
It should work for #53 too. I'll update this issue when the final fix is ready!
Thanks, I'll give it a shot.
That worked for the export, thanks! But now, trying to train the prior, I get the following:
File "/home/user/code/RAVE/prior/model.py", line 108, in split_classes x = x.reshape(x.shape[0], x.shape[1], self.data_size, -1) RuntimeError: cannot reshape tensor of 0 elements into shape [8, 0, 128, -1] because the unspecified dimension size -1 can be any value and is ambiguous
Not sure how to approach this. I initially got an error complaining about a division, so I replaced the // with torch.div(a, b, rounding_mode="floor"). Not sure if that was right, but that error appeared in addition to the one above.
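For what it's worth, that replacement is the one recent PyTorch recommends for floor division on tensors; for non-negative operands the two agree, as this minimal check shows (a and b are arbitrary example values):

import torch

a, b = torch.arange(10), 2
assert torch.equal(a // b, torch.div(a, b, rounding_mode="floor"))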
NOTE: I tried to sort out the empty tensor being passed to split_classes from validation_step, and I also tried to change the reshaping in the split_classes function itself; both led to this error:
File "/home/user/code/RAVE/prior/model.py", line 164, in validation_epoch_end
y = self.decode(z)
File "/home/user/code/RAVE/env3.9/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/user/code/RAVE/prior/model.py", line 76, in decode
return self.synth.decode(z)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 67, in decode
latent_pca0 = self.latent_pca
_9 = torch.unsqueeze(torch.numpy_T(latent_pca0), -1)
z6 = torch.conv1d(z4, _9)
~~~~~~~~~~~~ <--- HERE
latent_mean0 = self.latent_mean
z7 = torch.add(z6, torch.unsqueeze(latent_mean0, -1))
Traceback of TorchScript, original code (most recent call last):
File "/home/user/code/RAVE/export_rave.py", line 187, in decode
if not self.trained_cropped: # PERFORM PCA AFTER PADDING
z = nn.functional.conv1d(z, self.latent_pca.T.unsqueeze(-1))
~~~~~~~~~~~~~~~~~~~~ <--- HERE
z = z + self.latent_mean.unsqueeze(-1)
RuntimeError: Calculated padded input size per channel: (0). Kernel size: (1). Kernel size can't be greater than actual input size
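This second failure also reproduces in isolation: conv1d rejects any input shorter than its kernel, and here the latent's time axis is empty (a minimal sketch; the channel count is made up, and the 1x1 kernel stands in for latent_pca.T.unsqueeze(-1)):

import torch

z = torch.zeros(1, 16, 0)        # batch, latent channels, zero time steps
weight = torch.randn(16, 16, 1)  # 1x1 convolution, like the PCA projection
torch.nn.functional.conv1d(z, weight)  # raises: "Kernel size can't be greater than actual input size"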
any ideas?
Should be fixed in d9f55f59627ea02689578d0ddae05b420be8d4d2, can you check?
By the way, I'm closing this since it's a duplicate of #45.
Sounds good, I'll check shortly once this current model is done.
It seems the smaller model successfully got to stage 2, although it looks as though the loss is increasing. Is this normal? And how long should one train in stage 2?
Hey again,
I just finished training and exporting a new model, but I can't seem to get it to train the prior. I am getting the following warning when exporting the model:
/home/user/code/RAVE/env3.9/lib/python3.9/site-packages/pytorch_lightning/core/saving.py:217: UserWarning: Found keys that are not in the model state dict but in the checkpoint: ['decoder.net.2.net.0.aligned.paddings.0.pad', 'decoder.net.2.net.0.aligned.paddings.1.pad', 'decoder.net.2.net.1.aligned.paddings.0.pad', 'decoder.net.2.net.1.aligned.paddings.1.pad', 'decoder.net.2.net.2.aligned.paddings.0.pad', 'decoder.net.2.net.2.aligned.paddings.1.pad', 'decoder.net.4.net.0.aligned.paddings.0.pad', 'decoder.net.4.net.0.aligned.paddings.1.pad', 'decoder.net.4.net.1.aligned.paddings.0.pad', 'decoder.net.4.net.1.aligned.paddings.1.pad', 'decoder.net.4.net.2.aligned.paddings.0.pad', 'decoder.net.4.net.2.aligned.paddings.1.pad', 'decoder.net.6.net.0.aligned.paddings.0.pad', 'decoder.net.6.net.0.aligned.paddings.1.pad', 'decoder.net.6.net.1.aligned.paddings.0.pad', 'decoder.net.6.net.1.aligned.paddings.1.pad', 'decoder.net.6.net.2.aligned.paddings.0.pad', 'decoder.net.6.net.2.aligned.paddings.1.pad', 'decoder.net.8.net.0.aligned.paddings.0.pad', 'decoder.net.8.net.0.aligned.paddings.1.pad', 'decoder.net.8.net.1.aligned.paddings.0.pad', 'decoder.net.8.net.1.aligned.paddings.1.pad', 'decoder.net.8.net.2.aligned.paddings.0.pad', 'decoder.net.8.net.2.aligned.paddings.1.pad', 'decoder.synth.paddings.0.pad', 'decoder.synth.paddings.1.pad', 'decoder.synth.paddings.2.pad']
  rank_zero_warn(
Any ideas?
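For reference, one way to check whether an exported checkpoint still carries these buffers is to inspect its state dict directly (the path here is hypothetical; the convert_checkpoint helper earlier in this thread strips exactly these keys):

import torch

ckpt = torch.load("path/to/best.ckpt", map_location="cpu")
print([k for k in ckpt["state_dict"] if "pad" in k])  # non-empty means pad buffers remain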