acids-ircam / RAVE

Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder

Tensor size mismatch issue #309

Open berkut0 opened 6 months ago

berkut0 commented 6 months ago

When training on the database after preprocessing, the following error occurs: The size of tensor a (118) must match the size of tensor b (119) at non-singleton dimension 2

Changing the architecture from v2_small to v1 changes the size of tensor b from 119 to 121.

To be honest, I'm not familiar with neural networks and can't even guess what this is about, so any ideas on how to solve this would be greatly appreciated. I'm doing the training on my local machine.

I think this is the same issue: https://github.com/acids-ircam/RAVE/issues/157

preprocessing

rave preprocess --channels 2 -v 1 --input_path .\ --output_path .\dataset --sampling_rate 96000

training

rave train --config v2_small --db_path .\dataset --out_path .\model --name electron --channels 2

berkut0 commented 6 months ago

If I change --sampling_rate at the preprocessing stage to 48000, the error changes too: The size of tensor a (236) must match the size of tensor b (237) at non-singleton dimension 2

These numbers are not affected by changing the number of channels.

detailed output

(base) PS F:\_ircam\electromagnetic recs> rave train --config v2_small --db_path .\dataset --out_path .\model --name electron --channels 1
I0503 16:29:45.640041 12732 resource_reader.py:50] system_path_file_exists:v2_small.gin
E0503 16:29:45.640041 12732 resource_reader.py:55] Path not found: v2_small.gin
I0503 16:29:45.640041 12732 resource_reader.py:50] system_path_file_exists:C:\Program Files\Python311\Lib\site-packages\rave\v2_small.gin
E0503 16:29:45.649347 12732 resource_reader.py:55] Path not found: C:\Program Files\Python311\Lib\site-packages\rave\v2_small.gin
I0503 16:29:45.649347 12732 resource_reader.py:50] system_path_file_exists:configs/v1.gin
E0503 16:29:45.649347 12732 resource_reader.py:55] Path not found: configs/v1.gin
C:\Program Files\Python311\Lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
train set: 2348 examples
val set: 48 examples
selected gpu: []
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

  | Name                     | Type                  | Params
-------------------------------------------------------------------
0 | pqmf                     | CachedPQMF            | 16.7 K
1 | encoder                  | VariationalEncoder    | 3.9 M
2 | decoder                  | GeneratorV2           | 3.8 M
3 | discriminator            | CombineDiscriminators | 6.8 M
4 | audio_distance           | AudioDistanceV1       | 0
5 | multiband_audio_distance | AudioDistanceV1       | 0
-------------------------------------------------------------------
14.6 M    Trainable params
0         Non-trainable params
14.6 M    Total params
58.284    Total estimated model params size (MB)

Sanity Checking: 0it [00:00, ?it/s]C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 8 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "C:\Program Files\Python311\Scripts\rave.exe\__main__.py", line 7, in
  File "C:\Program Files\Python311\Lib\site-packages\scripts\main_cli.py", line 30, in main
    app.run(train.main)
  File "C:\Program Files\Python311\Lib\site-packages\absl\app.py", line 308, in run
    _run_main(main, args)
  File "C:\Program Files\Python311\Lib\site-packages\absl\app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "C:\Program Files\Python311\Lib\site-packages\scripts\train.py", line 268, in main
    trainer.fit(model, train, val, ckpt_path=run)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1103, in _run
    results = self._run_stage()
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1182, in _run_stage
    self._run_train()
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1195, in _run_train
    self._run_sanity_check()
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1267, in _run_sanity_check
    val_loop.run()
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\dataloader\evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 137, in advance
    output = self._evaluation_step(**kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\loops\epoch\evaluation_epoch_loop.py", line 234, in _evaluation_step
    output = self.trainer._call_strategy_hook(hook_name, *kwargs.values())
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 1485, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\pytorch_lightning\strategies\strategy.py", line 390, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\rave\model.py", line 437, in validation_step
    distance = self.audio_distance(x, y)
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Program Files\Python311\Lib\site-packages\rave\core.py", line 339, in forward
    lin_distance = mean_difference(x, y, norm='L2', relative=True)
  File "C:\Program Files\Python311\Lib\site-packages\rave\core.py", line 240, in mean_difference
    diff = target - value
RuntimeError: The size of tensor a (118) must match the size of tensor b (119) at non-singleton dimension 2
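The last frames of the traceback show that the failure happens in mean_difference (rave/core.py), called from audio_distance during the validation sanity check: the two tensors it subtracts have a different size along their last dimension, so the elementwise subtraction cannot broadcast. A minimal stand-alone illustration in plain PyTorch (the shapes are invented to mirror the error; this is not RAVE code):

import torch

a = torch.randn(4, 1, 118)  # stand-in for the target representation
b = torch.randn(4, 1, 119)  # stand-in for the reconstruction, one frame longer
diff = a - b                # raises RuntimeError: sizes must match at non-singleton dimension 2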
patrickgates commented 6 months ago

+1, I'm experiencing the same issue whenever I change the sampling_rate and/or the num_signal/n_signal parameter. RAVE version 2.3.1.

gwendal-lv commented 5 months ago

I have the same issue even when preprocessing with the "default" parameters --sampling_rate 48000 --channels 1 --num_signal 131072. The issue appears during validation only (same issue as yours): the output signals are longer than the input. This propagates to the MSS loss computation, where the spectrograms don't have the same number of time frames, so the difference can't be computed. In the RAVE class (rave/model.py), I have updated this function:

    def validation_step(self, x, batch_idx):

        z = self.encode(x)
        if isinstance(self.encoder, blocks.VariationalEncoder):
            mean = torch.split(z, z.shape[1] // 2, 1)[0]
        else:
            mean = None

        z = self.encoder.reparametrize(z)[0]
        y = self.decode(z)

        # - - - quick and dirty attempt to fix this mismatch in the MSS loss inputs' shapes - - -
        if x.shape[2] < y.shape[2]:  # Crop output
            warnings.warn("Cropping output y for MSS loss")
            # TODO should crop the beginning instead of the end? Or center the crop?
            y = y[:, :, 0:x.shape[2]]
        elif x.shape[2] > y.shape[2]:
            raise AssertionError("Output is shorter than input")
        # - - - end of quick and dirty fix - - -

        distance = self.audio_distance(x, y)
        full_distance = sum(distance.values())

        if self.trainer is not None:
            self.log('validation', full_distance)

        return torch.cat([x, y], -1), mean
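If cropping the end turns out to bias the comparison, the TODO above could be answered with a centered crop instead; this is only a sketch of that variant, under the same assumptions as the block above, and untested:

if x.shape[2] < y.shape[2]:
    extra = y.shape[2] - x.shape[2]
    start = extra // 2                      # drop half of the surplus from each side
    y = y[:, :, start:start + x.shape[2]]   # centered crop of the output
elif x.shape[2] > y.shape[2]:
    raise AssertionError("Output is shorter than input")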

For instance, for my dataset, before the crop, x (input) and y (output) had different lengths:

In[2]: x.shape, y.shape
Out[2]: (torch.Size([7, 1, 120423]), torch.Size([7, 1, 120832]))
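As an aside, those two lengths are consistent with the strided encoder rounding the latent length up and the decoder expanding it back in full; the ratio below is only an assumption (not read from the config), it just happens to reproduce the numbers:

ratio = 2048                 # assumed total compression ratio (PQMF bands x encoder strides)
x_len = 120423               # input length from the shapes above
z_len = -(-x_len // ratio)   # ceiling division, i.e. the effect of padding in the strided convolutions
y_len = z_len * ratio        # length after the decoder expands the rounded-up latent
print(z_len, y_len)          # 59, 120832 -> matches the longer output above

If that is what happens, it would also explain why the default --num_signal does not protect against it: the validation batch above seems to keep the original file length (120423), which is not a multiple of the ratio.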

I have just started using RAVE today so I don't know if this is a proper fix. Worst case, it should influence validation scores only, not the training itself.

Hope this helps!

ddgg-el commented 2 months ago

Same problem here:

RuntimeError: The size of tensor a (26) must match the size of tensor b (29) at non-singleton dimension 2

...and @gwendal-lv's fix does not work.

I am working with part of the Audio MNIST dataset (6,500 files out of 30,000). Some files are pretty short, so my arguments are:

preprocessing

rave preprocess \
    --input_path $input_path \
    --output_path $output_path \
    --channels 1 \
    --sampling_rate 48000 \
    --num_signal 14400

resulting in:

channels: 1
lazy: false
n_seconds: 2032.8
sr: 48000

training

rave train \
    --config v2_small \
    --db_path $db_path \
    --name $name \
    --val_every 2500 \
    --gpu -1 \
    --channels 1 \
    --n_signal 14400 \
    --workers $workers

Tried with v1, v2_small and v2. acids-rave==2.3.1, running on an M1 Pro.

btw: there is no --sampling_rate argument for training, right...?

ddgg-el commented 2 months ago

The problem seems to be related to the sample rate: changing the --sampling_rate flag to 44100 works, even though the files are all at 48000.
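One possible reason the shorter --num_signal makes the mismatch larger (an assumption, not verified against the RAVE source): 131072 is a power of two, so it divides evenly by any power-of-two compression ratio, while 14400 does not, and the chunk length then gets rounded somewhere inside the network. A quick check:

# 131072 = 2**17, 14400 = 2**5 * 450: only the former survives power-of-two striding unchanged
for ratio in (256, 512, 1024, 2048, 4096):
    print(ratio, 131072 % ratio, 14400 % ratio)
# the remainder is 0 for 131072 in every case, but 64 or 2112 for 14400

If that is the cause, a power-of-two --num_signal (for instance 16384) might be worth trying.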