Closed: FeatureSpitter closed this issue 1 year ago
Do you have write access to the folder? Seems like you don't.
Should I even need to?
Anyway, even with chmod 777 it still fails.
This is your error:
PermissionError: [Errno 13] Permission denied: '/root/.local'
I don't have a different explanation than the one above. Sorry.
I used the jfk.zip example from this other post: https://github.com/coqui-ai/TTS/issues/2745
And it worked fine. I think it has to do with the folder structure or the file types. I've tried to match them, but I still get that .local permission error with my wav files.
@FeatureSpitter
Hi, I encountered the same issue yesterday. I could run Bark generation without voice cloning out of the box, but I hit the same error when generating with a cloned voice.
I found out that the HuBERT custom tokenizer download path is not set in the current implementation.
This is model.config.LOCAL_MODEL_PATHS at https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/layers/bark/inference_funcs.py#L134:
{'text': '/Users/<myname>/Library/Application Support/tts/tts_models--multilingual--multi-dataset--bark/text_2.pt', 'coarse': '/Users/<myname>/Library/Application Support/tts/tts_models--multilingual--multi-dataset--bark/coarse_2.pt', 'fine': '/Users/<myname>/Library/Application Support/tts/tts_models--multilingual--multi-dataset--bark/fine_2.pt', 'hubert_tokenizer': '/root/.local/share/tts/suno/bark_v0/tokenizer.pth', 'hubert': '/root/.local/share/tts/suno/bark_v0/hubert.pt'}
I think the other model paths are set at https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/models/bark.py#L270, but the hubert and hubert_tokenizer paths are not, so they still point at /root, which is not writable by a non-root user.
You can fix it either by hard-coding the hubert_tokenizer model path to somewhere other than /root, or by downloading the hubert_tokenizer manually to /root/.local/share/tts/suno/bark_v0/ (this path may differ in your setup).
I fixed this issue by adding the following lines at https://github.com/coqui-ai/TTS/blob/dev/TTS/tts/models/bark.py#L270:
self.config.LOCAL_MODEL_PATHS["text"] = text_model_path
self.config.LOCAL_MODEL_PATHS["coarse"] = coarse_model_path
self.config.LOCAL_MODEL_PATHS["fine"] = fine_model_path
# This is a workaround I found. I know it's not a good solution, but it works for now
self.config.LOCAL_MODEL_PATHS["hubert_tokenizer"] = os.path.join(checkpoint_dir, "hubert_tokenizer.pth")
self.config.LOCAL_MODEL_PATHS["hubert"] = os.path.join(checkpoint_dir, "hubert.pt")
I'm not sure whether it helps in your situation, but I wanted to share my approach.
Any update on this? Just ran into this issue out of the box myself. It seems that it's trying to download something to /root, which doesn't work given that /root is only writable by root, not by a non-superuser/non-sudo user.
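For anyone triaging a similar setup, a quick stdlib check (a sketch, with the path taken from the traceback above) confirms whether the current user can write to the directory TTS is targeting:
import os

# os.access reports whether the current user may write to the path;
# this prints False for any non-root user, which matches the PermissionError.
print(os.access("/root/.local", os.W_OK))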
I encountered this problem too. After digging through the code, I found that the problem arises from the Bark config file config.json. In my case it is located at ~/.local/share/tts/tts_models--multilingual--multi-dataset--bark/config.json:
{
"model": "bark",
"output_path": "output",
"logger_uri": null,
"run_name": "run",
"project_name": null,
"run_description": "\ud83d\udc38Coqui trainer run.",
"print_step": 25,
"plot_step": 100,
"model_param_stats": false,
"wandb_entity": null,
"dashboard_logger": "tensorboard",
"log_model_step": null,
"save_step": 10000,
"save_n_checkpoints": 5,
"save_checkpoints": true,
"save_all_best": false,
"save_best_after": 10000,
"target_loss": null,
"print_eval": false,
"test_delay_epochs": 0,
"run_eval": true,
"run_eval_steps": null,
"distributed_backend": "nccl",
"distributed_url": "tcp://localhost:54321",
"mixed_precision": false,
"epochs": 1000,
"batch_size": 32,
"eval_batch_size": 16,
"grad_clip": 0.0,
"scheduler_after_epoch": true,
"lr": 0.001,
"optimizer": "radam",
"optimizer_params": null,
"lr_scheduler": null,
"lr_scheduler_params": {},
"use_grad_scaler": false,
"cudnn_enable": true,
"cudnn_deterministic": false,
"cudnn_benchmark": false,
"training_seed": 54321,
"num_loader_workers": 0,
"num_eval_loader_workers": 0,
"use_noise_augment": false,
"audio": {
"fft_size": 1024,
"win_length": 1024,
"hop_length": 256,
"frame_shift_ms": null,
"frame_length_ms": null,
"stft_pad_mode": "reflect",
"sample_rate": 22050,
"resample": false,
"preemphasis": 0.0,
"ref_level_db": 20,
"do_sound_norm": false,
"log_func": "np.log10",
"do_trim_silence": true,
"trim_db": 45,
"do_rms_norm": false,
"db_level": null,
"power": 1.5,
"griffin_lim_iters": 60,
"num_mels": 80,
"mel_fmin": 0.0,
"mel_fmax": null,
"spec_gain": 20,
"do_amp_to_db_linear": true,
"do_amp_to_db_mel": true,
"pitch_fmax": 640.0,
"pitch_fmin": 1.0,
"signal_norm": true,
"min_level_db": -100,
"symmetric_norm": true,
"max_norm": 4.0,
"clip_norm": true,
"stats_path": null
},
"use_phonemes": false,
"phonemizer": null,
"phoneme_language": null,
"compute_input_seq_cache": false,
"text_cleaner": null,
"enable_eos_bos_chars": false,
"test_sentences_file": "",
"phoneme_cache_path": null,
"characters": null,
"add_blank": false,
"batch_group_size": 0,
"loss_masking": null,
"min_audio_len": 1,
"max_audio_len": Infinity,
"min_text_len": 1,
"max_text_len": Infinity,
"compute_f0": false,
"compute_energy": false,
"compute_linear_spec": false,
"precompute_num_workers": 0,
"start_by_longest": false,
"shuffle": false,
"drop_last": false,
"datasets": [
{
"formatter": "",
"dataset_name": "",
"path": "",
"meta_file_train": "",
"ignored_speakers": null,
"language": "",
"phonemizer": "",
"meta_file_val": "",
"meta_file_attn_mask": ""
}
],
"test_sentences": [],
"eval_split_max_size": null,
"eval_split_size": 0.01,
"use_speaker_weighted_sampler": false,
"speaker_weighted_sampler_alpha": 1.0,
"use_language_weighted_sampler": false,
"language_weighted_sampler_alpha": 1.0,
"use_length_weighted_sampler": false,
"length_weighted_sampler_alpha": 1.0,
"num_chars": 0,
"semantic_config": {
"block_size": 1024,
"input_vocab_size": 10048,
"output_vocab_size": 10048,
"n_layer": 12,
"n_head": 12,
"n_embd": 768,
"dropout": 0.0,
"bias": true
},
"fine_config": {
"block_size": 1024,
"input_vocab_size": 10048,
"output_vocab_size": 10048,
"n_layer": 12,
"n_head": 12,
"n_embd": 768,
"dropout": 0.0,
"bias": true,
"n_codes_total": 8,
"n_codes_given": 1
},
"coarse_config": {
"block_size": 1024,
"input_vocab_size": 10048,
"output_vocab_size": 10048,
"n_layer": 12,
"n_head": 12,
"n_embd": 768,
"dropout": 0.0,
"bias": true
},
"CONTEXT_WINDOW_SIZE": 1024,
"SEMANTIC_RATE_HZ": 49.9,
"SEMANTIC_VOCAB_SIZE": 10000,
"CODEBOOK_SIZE": 1024,
"N_COARSE_CODEBOOKS": 2,
"N_FINE_CODEBOOKS": 8,
"COARSE_RATE_HZ": 75,
"SAMPLE_RATE": 24000,
"USE_SMALLER_MODELS": false,
"TEXT_ENCODING_OFFSET": 10048,
"SEMANTIC_PAD_TOKEN": 10000,
"TEXT_PAD_TOKEN": 129595,
"SEMANTIC_INFER_TOKEN": 129599,
"COARSE_SEMANTIC_PAD_TOKEN": 12048,
"COARSE_INFER_TOKEN": 12050,
"REMOTE_MODEL_PATHS": {
"text": {
"path": "https://huggingface.co/erogol/bark/tree/main/text_2.pt",
"checksum": "54afa89d65e318d4f5f80e8e8799026a"
},
"coarse": {
"path": "https://huggingface.co/erogol/bark/tree/main/coarse_2.pt",
"checksum": "8a98094e5e3a255a5c9c0ab7efe8fd28"
},
"fine": {
"path": "https://huggingface.co/erogol/bark/tree/main/fine_2.pt",
"checksum": "59d184ed44e3650774a2f0503a48a97b"
}
},
"LOCAL_MODEL_PATHS": {
"text": "/root/.local/share/tts/suno/bark_v0/text_2.pt",
"coarse": "/root/.local/share/tts/suno/bark_v0/coarse_2.pt",
"fine": "/root/.local/share/tts/suno/bark_v0/fine_2.pt",
"hubert_tokenizer": "/root/.local/share/tts/suno/bark_v0/tokenizer.pth",
"hubert": "/root/.local/share/tts/suno/bark_v0/hubert.pt"
},
"SMALL_REMOTE_MODEL_PATHS": {
"text": {
"path": "https://huggingface.co/erogol/bark/tree/main/text.pt"
},
"coarse": {
"path": "https://huggingface.co/erogol/bark/tree/main/coarse.pt"
},
"fine": {
"path": "https://huggingface.co/erogol/bark/tree/main/fine.pt"
}
},
"CACHE_DIR": "/root/.local/share/tts/suno/bark_v0"
}
You can resolve the problem by modifying the LOCAL_MODEL_PATHS and CACHE_DIR entries of this config file so that they point into your home directory instead of /root (the rest of the file stays unchanged):
"LOCAL_MODEL_PATHS": {
"text": "~/.local/share/tts/suno/bark_v0/text_2.pt",
"coarse": "~/.local/share/tts/suno/bark_v0/coarse_2.pt",
"fine": "~/.local/share/tts/suno/bark_v0/fine_2.pt",
"hubert_tokenizer": "~/.local/share/tts/suno/bark_v0/tokenizer.pth",
"hubert": "~/.local/share/tts/suno/bark_v0/hubert.pt"
},
"CACHE_DIR": "~/.local/share/tts/suno/bark_v0"
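If you'd rather script that edit than make it by hand, a minimal stdlib sketch (assuming your config lives at the path shown above; adjust for your platform) could look like this:
import json
import os

# Paths as reported earlier in this thread; adjust for your machine.
config_path = os.path.expanduser(
    "~/.local/share/tts/tts_models--multilingual--multi-dataset--bark/config.json"
)
cache_dir = os.path.expanduser("~/.local/share/tts/suno/bark_v0")

with open(config_path) as f:
    config = json.load(f)

# Repoint every local model path (and the cache dir) at the user's home,
# keeping the original file names.
for key, path in config["LOCAL_MODEL_PATHS"].items():
    config["LOCAL_MODEL_PATHS"][key] = os.path.join(cache_dir, os.path.basename(path))
config["CACHE_DIR"] = cache_dir

with open(config_path, "w") as f:
    json.dump(config, f, indent=4)
Because os.path.expanduser resolves "~" at write time, the rewritten config ends up with absolute paths, which sidesteps the literal-"~" pitfall discussed further down in this thread.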
I have opened a pull request on the Hugging Face model card erogol/bark to resolve this.
Should be fixed by #2894
Same bug encountered as of v0.22.0 with the GitHub version of TTS.
Same bug with 0.22.0
@erogol not fixed, you should reopen this.
@reopio there is a small but important error in the fix you proposed!
so basically what you did was changing this
"LOCAL_MODEL_PATHS": {
"text": "/root/.local/share/tts/suno/bark_v0/text_2.pt",
"coarse": "/root/.local/share/tts/suno/bark_v0/coarse_2.pt",
"fine": "/root/.local/share/tts/suno/bark_v0/fine_2.pt",
"hubert_tokenizer": "/root/.local/share/tts/suno/bark_v0/tokenizer.pth",
"hubert": "/root/.local/share/tts/suno/bark_v0/hubert.pt"
},
"SMALL_REMOTE_MODEL_PATHS": {
"text": {
"path": "https://huggingface.co/erogol/bark/tree/main/text.pt"
},
"coarse": {
"path": "https://huggingface.co/erogol/bark/tree/main/coarse.pt"
},
"fine": {
"path": "https://huggingface.co/erogol/bark/tree/main/fine.pt"
}
},
"CACHE_DIR": "/root/.local/share/tts/suno/bark_v0"
}
into this
"LOCAL_MODEL_PATHS": {
"text": "~/.local/share/tts/suno/bark_v0/text_2.pt",
"coarse": "~/.local/share/tts/suno/bark_v0/coarse_2.pt",
"fine": "~/.local/share/tts/suno/bark_v0/fine_2.pt",
"hubert_tokenizer": "~/.local/share/tts/suno/bark_v0/tokenizer.pth",
"hubert": "~/.local/share/tts/suno/bark_v0/hubert.pt"
},
"SMALL_REMOTE_MODEL_PATHS": {
"text": {
"path": "https://huggingface.co/erogol/bark/tree/main/text.pt"
},
"coarse": {
"path": "https://huggingface.co/erogol/bark/tree/main/coarse.pt"
},
"fine": {
"path": "https://huggingface.co/erogol/bark/tree/main/fine.pt"
}
},
"CACHE_DIR": "~/.local/share/tts/suno/bark_v0"
}
replacing /root with ~.
So you correctly identified the problem, but you didn't consider that Python, unlike bash, does NOT automatically expand ~ into /home/username.
Running "tts" after your change will make the code create a subdirectory literally named ~ in whatever directory the "tts" command is run from, which in turn causes #3567 (I stand corrected, that's unrelated, but it still is something that needs to be addressed, as it has to do with Hugging Face models not being correctly downloaded).
The correct solution would be to use the expanduser function, like this:
import os

my_dir = os.path.expanduser("~/some_dir")
# my_dir => "/home/username/some_dir"
I hope you can add this correction to your erogol/bark PR :)
Ouch, I just forgot that it was a JSON file! 🤦‍♂😅 Well, then the change has to be made in the model-loading functions in /TTS/utils/synthesizer.py,
for example this one: self._load_tts_from_dir(model_dir, use_cuda)
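A minimal sketch of that idea (the helper name and hook point are assumptions, not the repo's actual code; the config attributes follow the snippets shown earlier in this thread):
import os

def expand_local_paths(config):
    # json.load leaves "~" as a literal character, so expand it once,
    # right after the config is loaded and before any download/load step.
    for key, path in config.LOCAL_MODEL_PATHS.items():
        config.LOCAL_MODEL_PATHS[key] = os.path.expanduser(path)
    config.CACHE_DIR = os.path.expanduser(config.CACHE_DIR)
    return config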
Same problem here. What is the recommended fix? I made the changes recommended above (edit ~/.local/share/tts/tts_models--multilingual--multi-dataset--bark/config.json and change /root/ to /home/myuser/), and now I get this error:
╰─(base) ⠠⠵ tts --model_name tts_models/multilingual/multi-dataset/bark --text "Hey look, she's awake! I can't believe she's awake, that's crazy." --out_path /tmp/output.wav --progress_bar True --voice_dir /ram/ --speaker_idx "tommy"
> tts_models/multilingual/multi-dataset/bark is already downloaded.
> Using model: bark
/home/arthur/.anaconda3/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 17.7k/17.7k [00:00<00:00, 76.1MiB/s]
Traceback (most recent call last):
File "/home/arthur/.anaconda3/bin/tts", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/arthur/dev/ai/TTS/TTS/bin/synthesize.py", line 423, in main
synthesizer = Synthesizer(
^^^^^^^^^^^^
File "/home/arthur/dev/ai/TTS/TTS/utils/synthesizer.py", line 109, in __init__
self._load_tts_from_dir(model_dir, use_cuda)
File "/home/arthur/dev/ai/TTS/TTS/utils/synthesizer.py", line 164, in _load_tts_from_dir
self.tts_model.load_checkpoint(config, checkpoint_dir=model_dir, eval=True)
File "/home/arthur/dev/ai/TTS/TTS/tts/models/bark.py", line 281, in load_checkpoint
self.load_bark_models()
File "/home/arthur/dev/ai/TTS/TTS/tts/models/bark.py", line 50, in load_bark_models
self.semantic_model, self.config = load_model(
^^^^^^^^^^^
File "/home/arthur/dev/ai/TTS/TTS/tts/layers/bark/load_model.py", line 121, in load_model
checkpoint = torch.load(ckpt_path, map_location=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arthur/.anaconda3/lib/python3.11/site-packages/torch/serialization.py", line 1040, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/arthur/.anaconda3/lib/python3.11/site-packages/torch/serialization.py", line 1258, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_pickle.UnpicklingError: invalid load key, '<'.
@arthurwolf: redownloading the models may fix it (as advised in #3567), but keep in mind you are likely to encounter other bugs, as this codebase is no longer officially maintained.
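If you want to force a clean re-download, deleting the cached model directory is enough; here is a stdlib sketch, with the cache path reported earlier in this thread:
import os
import shutil

# TTS re-downloads missing model files into its cache on the next run.
shutil.rmtree(
    os.path.expanduser("~/.local/share/tts/tts_models--multilingual--multi-dataset--bark"),
    ignore_errors=True,
)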
If Bark is what you were after, you can install it from its official repository, which features updated code that works out of the box.
On the other hand, if you were looking for XTTS, you could try AllTalk. It's a newer XTTSv2 implementation that comes with an API, DeepSpeed support, and other interesting additional features.
as this codebase is no longer officially maintained
Is this something that should be mentioned in the README.md? Until you said this I was not aware of this fact 🙂
@illtellyoulater
If Bark is what you were after,
I've been trying to get Bark to work for weeks (generating speech, then trying to use other methods to change the voice to match a sample). I came to coqui-ai looking for an alternative (as it seemed able to do both TTS and voice conversion at the same time), and then the coqui-ai docs say "hey, if you want to do that, use our version of bark"...
You might actually know how to do what I'm looking for.
I need to either do text to speech with a custom voice (from a sample), or even just convert existing speech to have a different voice (from a custom sample).
What would you recommend as the best way to get there currently?
I'll look at https://github.com/erew123/alltalk_tts/, thanks a lot for that.
@arthurwolf
I need to either do text to speech with a custom voice (from a sample), or even just convert existing speech to have a different voice (from a custom sample).
Bark is an amazing open-source TTS model from Suno, but the version released by Suno is not incredibly practical, as it will only let you generate 16 words per run. Also, I think that recreating voices with it is a bit more convoluted than with XTTS, at least with the original Suno code, and I haven't researched other third-party implementations enough to be able to suggest one.
However, the good news is that what you are trying to do is exactly what Coqui XTTS excels at! In fact, it only needs a 7-10 second audio file to learn to speak with approximately the same voice. Another great feature of XTTS is that, independently of the original speaker's language, the cloned voice becomes available automagically in ~16 different languages, all sounding good and natural!
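For reference, voice cloning from a short sample through this repo's Python API looks roughly like this (model name as published by Coqui; the audio file paths are placeholders):
from TTS.api import TTS

# Load XTTSv2 (downloaded on first use) and clone the voice from a
# short reference clip; "speaker.wav" and "output.wav" are placeholders.
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Hello, this is my cloned voice.",
    speaker_wav="speaker.wav",
    language="en",
    file_path="output.wav",
)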
There is only one little problem: after Coqui's shutdown and the release of the model as open source, the codebase needed to run it (this repository) became unmaintained, with some parts broken... and this is exactly where third-party re-implementations like AllTalk come into play, essentially providing an updated, refined, and enhanced version of it. By the way, if you need an even easier-to-use alternative to AllTalk, take a look at github.com/daswer123/xtts-webui, as you will be able to run all the steps I described above entirely from the browser UI it comes with!
That's all! I hope this clarifies your doubts and helps you get on track. Keep going, you're almost there! 🗣🎶
PS: for an added bonus I'll just leave this here: dozens of free voices ready to be downloaded and used in XTTS, enjoy!
@isaac-mcfadyen
Is this something that should be mentioned in the README.md? Until you said this I was not aware of this fact 🙂
Not another word on this please! It brings back... mixed memories ;) (https://github.com/coqui-ai/TTS/issues/3569#issue-2128660232)
@illtellyoulater thank you so much for the help. I was stuck for a long time trying to get projects to work that I now realize were completely outdated/abandoned. I got AllTalk running and went from 20% to 90% of the way to what I want, absolutely amazing. Thank you again.
Do you know if there's any way to get it to generate whispering or shouting? Some kind of keyword or prompting trick? Or some other project that'd be able to do that? I searched a lot and didn't have much luck. Bark is able to do it a little bit some of the time, but not with a custom voice...
Describe the bug
I have been following this tutorial: https://tts.readthedocs.io/en/dev/models/bark.html#example-use
To Reproduce
But this is the result I got:
Expected behavior
For it to produce the output.wav with the voice in the bark_voices folder
Logs
No response
Environment
Additional context
No response