[BUG REPORT] Training a v2 48k model does not work

frcsdes commented 1 year ago

Describe the bug

When training a v2 model at a 48k sampling rate, PyTorch raises a tensor size mismatch error in the training step.

To Reproduce

Steps to reproduce the behavior:

1. Go to the 'Train' tab.
2. Create a new workspace in 'v2 48k'.
3. In the 'Settings > data' tab, fill in the 'Dataset path', then click 'Resample and split dataset', 'Extract pitches', and 'Create index file'.
4. Go to the 'Settings > train' tab and click 'Train'.
5. The 'Status' column displays an error status shortly afterwards.

Expected behavior

The training begins and the 'Status' column starts to show the loss and current step.

Backtrace

UserWarning: Using a target size (torch.Size([16, 128, 33])) that is different to the input size (torch.Size([16, 128, 28])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  loss_mel = F.l1_loss(y_mel, y_hat_mel) * HParams.c_mel
Traceback (most recent call last):
  File "/app/audio-webui/venv/lib/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/app/audio-webui/venv/lib/site-packages/gradio/blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "/app/audio-webui/venv/lib/site-packages/gradio/blocks.py", line 1093, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/app/audio-webui/venv/lib/site-packages/gradio/utils.py", line 341, in async_iteration
    return await iterator.__anext__()
  File "/app/audio-webui/venv/lib/site-packages/gradio/utils.py", line 334, in __anext__
    return await anyio.to_thread.run_sync(
  File "/app/audio-webui/venv/lib/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/audio-webui/venv/lib/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/app/audio-webui/venv/lib/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/app/audio-webui/venv/lib/site-packages/gradio/utils.py", line 317, in run_sync_iterator_async
    return next(iterator)
  File "/app/audio-webui/webui/ui/tabs/training/training/rvc_workspace.py", line 736, in train_model
    loss_mel = F.l1_loss(y_mel, y_hat_mel) * HParams.c_mel
  File "/app/audio-webui/venv/lib/site-packages/torch/nn/functional.py", line 3263, in l1_loss
    expanded_input, expanded_target = torch.broadcast_tensors(input, target)
  File "/app/audio-webui/venv/lib/site-packages/torch/functional.py", line 74, in broadcast_tensors
    return _VF.broadcast_tensors(tensors)  # type: ignore[attr-defined]
RuntimeError: The size of tensor a (28) must match the size of tensor b (33) at non-singleton dimension 2
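
For readers unfamiliar with the message above: the loss call fails because the two mel tensors cannot be broadcast against each other. The following is a minimal pure-Python sketch of the usual NumPy/PyTorch broadcasting rule, not code from audio-webui. The helper name `broadcast_shape` and the wording of its error message are hypothetical, and the shapes are the ones reported in the warning (28 frames for `y_mel`, 33 for `y_hat_mel`).

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper for illustration only. Shapes are aligned from
    the right; two sizes are compatible when they are equal or when one
    of them is 1.
    """
    # Pad the shorter shape with leading 1s so both have the same rank.
    rank = max(len(a), len(b))
    a = (1,) * (rank - len(a)) + tuple(a)
    b = (1,) * (rank - len(b)) + tuple(b)

    result = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}"
            )
    return tuple(result)


# Shapes taken from the warning in the backtrace: (batch, mel bins, frames).
y_mel_shape = (16, 128, 28)
y_hat_mel_shape = (16, 128, 33)

try:
    broadcast_shape(y_mel_shape, y_hat_mel_shape)
except ValueError as err:
    print(err)  # size 28 must match size 33 at non-singleton dimension 2
```

The batch size (16) and the number of mel bins (128) agree, so only the frame axis (dimension 2) is incompatible: neither 28 nor 33 is 1, so no broadcasting is possible. The sketch shows why the error is raised, not why the two tensors end up with different frame counts.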

Additional context

The bug was reproduced in a Docker container by cloning the repository at commit 3cd1995, installing ffmpeg, and running the startup script, so I do not suspect external causes.

gitmylo commented 1 year ago

This is not a bug: training of 48k models (both v1 and v2) is not yet implemented. Issue https://github.com/gitmylo/audio-webui/issues/46 is still open and will be closed once it is. I'll be closing this issue.

frcsdes commented 1 year ago

Sorry about that, thank you for the quick answer.

[BUG REPORT] Training a v2 48k model does not work #130