berkut0 opened this issue 6 months ago
If I change --sampling_rate at the preprocessing stage to 48000, then the error changes too:
The size of tensor a (236) must match the size of tensor b (237) at non-singleton dimension 2
These numbers are not affected by changing the number of channels.
+1, experiencing the same issue whenever I change the sampling_rate and/or num_signal/n_signal parameter. RAVE version 2.3.1.
I have the same issue even when preprocessing with the "default" parameters --sampling_rate 48000 --channels 1 --num_signal 131072
The issue appears during validation only (same issue as yours); it seems that the output signals are longer than the input.
This propagates to the MSS loss computation, where the spectrograms don't have the same number of time frames, so the difference can't be computed.
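To illustrate (a minimal sketch, not RAVE's actual loss code, with assumed STFT parameters): two signals whose lengths differ by a few hundred samples yield spectrograms with different frame counts, and the element-wise difference then fails exactly as in the errors above.

import torch

# Two waveforms with the lengths observed below (batch of 1, mono).
x = torch.randn(1, 120423)  # input
y = torch.randn(1, 120832)  # decoder output, slightly longer

# Assumed parameters; RAVE's MSS loss actually uses several scales.
w = torch.hann_window(2048)
sx = torch.stft(x, n_fft=2048, hop_length=512, window=w, return_complex=True)
sy = torch.stft(y, n_fft=2048, hop_length=512, window=w, return_complex=True)
print(sx.shape[-1], sy.shape[-1])  # 236 vs 237 time frames
# (sx.abs() - sy.abs())            # -> RuntimeError: The size of tensor a (236)
#                                  #    must match the size of tensor b (237)
#                                  #    at non-singleton dimension 2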
In the RAVE class (rave/model.py), I have updated this function:

import warnings  # must be imported at the top of rave/model.py

def validation_step(self, x, batch_idx):
    z = self.encode(x)
    if isinstance(self.encoder, blocks.VariationalEncoder):
        mean = torch.split(z, z.shape[1] // 2, 1)[0]
    else:
        mean = None

    z = self.encoder.reparametrize(z)[0]
    y = self.decode(z)

    # - - - quick and dirty attempt to fix this mismatch in the MSS loss inputs' shapes - - -
    if x.shape[2] < y.shape[2]:  # output is longer: crop it to the input length
        warnings.warn("Cropping output y for MSS loss")
        # TODO should we crop the beginning instead of the end? Or center the
        # crop? (a centered variant is sketched right after this function)
        y = y[:, :, :x.shape[2]]
    elif x.shape[2] > y.shape[2]:
        raise AssertionError("Output is shorter than input")
    # - - - end of quick and dirty fix - - -

    distance = self.audio_distance(x, y)
    full_distance = sum(distance.values())

    if self.trainer is not None:
        self.log('validation', full_distance)

    return torch.cat([x, y], -1), mean
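For the TODO above, a centered variant of the crop could look like this (a hypothetical helper, not part of RAVE):

def center_crop(y: torch.Tensor, target_len: int) -> torch.Tensor:
    # Trim the excess decoder samples evenly from both ends of y.
    excess = y.shape[-1] - target_len
    start = excess // 2
    return y[..., start:start + target_len]

Where the extra samples actually come from (start, end, or both) depends on the model's internal padding, so which crop is "correct" is hard to say without checking that.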
For instance, for my dataset, before the crop, x (input) and y (output) had different lengths:
In[2]: x.shape, y.shape
Out[2]: (torch.Size([7, 1, 120423]), torch.Size([7, 1, 120832]))
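Those two numbers suggest a cause (assuming the model's total downsampling ratio between waveform and latent is 2048, which the numbers are at least consistent with): 120423 is not a multiple of 2048, so the encoder pads up to a whole number of latent frames, and the decoder then reconstructs exactly that padded length:

import math

ratio = 2048                        # assumed total downsampling ratio
n_in = 120423
n_latent = math.ceil(n_in / ratio)  # 59 latent frames (input padded up)
print(n_latent * ratio)             # 120832 -- exactly the observed output length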
I have just started using RAVE today so I don't know if this is a proper fix. Worst case, it should influence validation scores only, not the training itself.
Hope this helps!
Same problem here:
RuntimeError: The size of tensor a (26) must match the size of tensor b (29) at non-singleton dimension 2
...and @gwendal-lv's fix does not work.
I am working with part of the Audio MNIST dataset (6500 of 30000 files). Some files are pretty short, so my arguments are:
preprocessing
rave preprocess \
--input_path $input_path \
--output_path $output_path \
--channels 1 \
--sampling_rate 48000 \
--num_signal 14400
resulting in:
channels: 1
lazy: false
n_seconds: 2032.8
sr: 48000
training
rave train \
--config v2_small \
--db_path $db_path \
--name $name \
--val_every 2500 \
--gpu -1 \
--channels 1 \
--n_signal 14400 \
--workers $workers
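One thing that may matter here, under the same 2048-ratio assumption as above: 14400 samples at 48000 Hz is only a 0.3 s window, and 14400 is not a multiple of 2048, so (if that hypothesis holds) an encode/decode round trip cannot return exactly n_signal samples:

n_signal, sr, ratio = 14400, 48000, 2048  # ratio is an assumption, not a checked value
print(n_signal / sr)     # 0.3 -> seconds per training example
print(n_signal % ratio)  # 64  -> n_signal is not a multiple of the assumed stride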
Tried with v1, v2_small and v2. acids-rave==2.3.1, running on an M1 Pro.
By the way: there is no --sampling_rate argument for training, right...?
The problem seems to be related to the sample rate: changing the --sampling_rate flag to 44100 works, even though the files are all at 48000.
When running on a database after preprocessing, the following error occurs:
The size of tensor a (118) must match the size of tensor b (119) at non-singleton dimension 2
Changing the architecture from v2_small to v1 changes the size of tensor b from 119 to 121.
To be honest, I'm not familiar with learning networks and can't even guess what this is about. Any ideas on how to solve this would be greatly appreciated. I'm doing the training on my local machine.
I think this is the same issue: https://github.com/acids-ircam/RAVE/issues/157
preprocessing
rave preprocess --channels 2 -v 1 --input_path .\ --output_path .\dataset --sampling_rate 96000
training
rave train --config v2_small --db_path .\dataset --out_path .\model --name electron --channels 2
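If the ratio hypothesis above is right, an alternative to cropping the decoder output would be to pad each signal up to a multiple of the model's total stride before encoding. A minimal sketch, assuming a stride of 2048; pad_to_multiple is a hypothetical helper, not an existing RAVE function:

import torch
import torch.nn.functional as F

def pad_to_multiple(x: torch.Tensor, ratio: int = 2048) -> torch.Tensor:
    # Right-pad the waveform so its length is a multiple of `ratio`,
    # so that an encode -> decode round trip should return exactly
    # the padded length.
    excess = x.shape[-1] % ratio
    if excess == 0:
        return x
    return F.pad(x, (0, ratio - excess))

Padding keeps the loss inputs aligned without discarding synthesized samples, though the padded tail then also contributes to the distance.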