gitmylo / audio-webui

A webui for different audio related Neural Networks
MIT License

[BUG REPORT] Cannot run bark voice generation with cloning #64

Closed 2haloes closed 1 year ago

2haloes commented 1 year ago

Describe the bug

When I click the generate button after setting up cloning, an exception is reported in the console, which stops the generation.

To Reproduce

Steps to reproduce the behavior:

  1. Set up the bark model
  2. Set up voice cloning
  3. Click generate

Expected behavior

I expect a TTS generation.

Additional context

I've copied the stack trace below. I updated and then tried to run this fresh 10 minutes ago. It was working previously, but that version was a couple of weeks old.


/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/experimental/initialize.py:43: UserWarning: hydra.experimental.initialize() is no longer experimental. Use hydra.initialize()
  deprecation_warning(message=message)
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/experimental/initialize.py:45: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  self.delegate = real_initialize(
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/experimental/compose.py:25: UserWarning: hydra.experimental.compose() is no longer experimental. Use hydra.compose()
  deprecation_warning(message=message)
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/core/default_element.py:124: UserWarning: In 'config': Usage of deprecated keyword in package header '# @package _group_'.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/changes_to_package_header for more information
  deprecation_warning(
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/fairseq/checkpoint_utils.py:425: UserWarning: 
'config' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  state = load_checkpoint_to_cpu(filename, arg_overrides)
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/compose.py:56: UserWarning: 
The strict flag in the compose API is deprecated.
See https://hydra.cc/docs/1.2/upgrades/0.11_to_1.0/strict_mode_flag_deprecated for more info.

  deprecation_warning(
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/experimental/initialize.py:43: UserWarning: hydra.experimental.initialize() is no longer experimental. Use hydra.initialize()
  deprecation_warning(message=message)
/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/hydra/experimental/initialize.py:45: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  self.delegate = real_initialize(
/home/[USER]/dev/audio-webui/webui/modules/implementations/patches/bark_custom_voices.py:35: UserWarning: 
'config' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
  huberts['hubert'] = CustomHubert(hubert_path)
Loading Custom Tokenizer
Extracting semantics
Tokenizing semantics
Traceback (most recent call last):
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/gradio/routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/gradio/blocks.py", line 1077, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/gradio/helpers.py", line 602, in tracked_fn
    response = fn(*args)
  File "/home/[USER]/dev/audio-webui/webui/ui/tabs/text_to_speech.py", line 81, in _generate
    response, file = loader.get_response(*inputs, progress=progress)
  File "/home/[USER]/dev/audio-webui/webui/modules/implementations/ttsmodels.py", line 210, in get_response
    _speaker = self.create_voice(temp_file.name, clone_model)
  File "/home/[USER]/dev/audio-webui/webui/modules/implementations/ttsmodels.py", line 42, in create_voice
    fine_prompt = generate_fine_from_wav(file)
  File "/home/[USER]/dev/audio-webui/webui/modules/implementations/patches/bark_custom_voices.py", line 88, in generate_fine_from_wav
    encoded_frames = model.encode(wav)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/encodec/model.py", line 144, in encode
    encoded_frames.append(self._encode_frame(frame))
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/encodec/model.py", line 161, in _encode_frame
    emb = self.encoder(x)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/encodec/modules/seanet.py", line 144, in forward
    return self.model(x)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/encodec/modules/conv.py", line 210, in forward
    return self.conv(x)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/encodec/modules/conv.py", line 120, in forward
    x = self.conv(x)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/[USER]/dev/audio-webui/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

gitmylo commented 1 year ago

Oh nvm, I see: it's got the audio on the CPU, but the encodec model is on the GPU.

gitmylo commented 1 year ago

I wrongly assumed that encodec wouldn't be on the GPU during offloading. But I guess encodec doesn't get offloaded because it's so small?
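
For anyone hitting the same `RuntimeError`: the mismatch described above (audio tensor on the CPU, encodec weights on the GPU) can usually be avoided by moving the input to whatever device the model currently lives on before calling `encode`. The following is only a minimal sketch and not necessarily the fix that was applied in the repository. The helper name `to_module_device` is hypothetical, and it relies on just two standard PyTorch calls, `Module.parameters()` and `Tensor.to()`. It is duck-typed, so it does not import `torch` itself.

```python
def to_module_device(tensor, module):
    """Return `tensor` moved to the device of `module`'s first parameter.

    Hypothetical helper (not part of audio-webui, bark, or encodec).
    `module` is expected to behave like a torch.nn.Module and `tensor`
    like a torch.Tensor.
    """
    # Look at the first parameter to find out where the weights live.
    first_param = next(module.parameters(), None)
    if first_param is None:
        # A module without parameters has no device to match against,
        # so the tensor is returned unchanged.
        return tensor
    # Tensor.to(device) returns the same tensor if it is already there.
    return tensor.to(first_param.device)


# Assumed usage at the call that fails in the traceback
# (generate_fine_from_wav in bark_custom_voices.py):
#
#     wav = to_module_device(wav, model)
#     encoded_frames = model.encode(wav)
```

Reading the device from the model at call time, instead of hard-coding `"cuda"` or `"cpu"`, keeps the call working whether or not the model has been offloaded.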