152334H / tortoise-tts-fast

Fast TorToiSe inference (5x or your money back!)
GNU Affero General Public License v3.0

Variables in do_tts.py #19

Open rikabi89 opened 1 year ago

rikabi89 commented 1 year ago

Hi

I was using the arguments below with the normal TTS script. Can you add them to do_tts? I get a syntax error when I try to do it myself. I'm a bit of a rookie, so I'm not sure why.

parser = argparse.ArgumentParser()
parser.add_argument('--text', type=str, help='Text to speak.', default="The expressiveness of autoregressive transformers is literally nuts! I absolutely adore them.")
parser.add_argument('--voice', type=str, help='Selects the voice to use for generation. See options in voices/ directory (and add your own!) '
                    'Use the & character to join two voices together. Use a comma to perform inference on multiple voices.', default='random')
parser.add_argument('--preset', type=str, help='Which voice preset to use.', default='fast')
parser.add_argument('--output_path', type=str, help='Where to store outputs.', default='results/')
parser.add_argument('--model_dir', type=str, help='Where to find pretrained model checkpoints. Tortoise automatically downloads these to .models, so this'
                    'should only be specified if you have custom checkpoints.', default=MODELS_DIR)
parser.add_argument('--candidates', type=int, help='How many output candidates to produce per-voice.', default=3)
parser.add_argument('--seed', type=int, help='Random seed which can be used to reproduce results.', default=None)
parser.add_argument('--produce_debug_state', type=bool, help='Whether or not to produce debug_state.pth, which can aid in reproducing problems. Defaults to true.', default=True)
parser.add_argument('--cvvp_amount', type=float, help='How much the CVVP model should influence the output.'
                    'Increasing this can in some cases reduce the likelyhood of multiple speakers. Defaults to 0 (disabled)', default=.0)
parser.add_argument('--top-p', type=float, default=None, help='P value used in nucleus sampling. 0 to 1. Lower values mean the decoder produces more "likely" (aka boring) outputs.')
parser.add_argument('--temperature', type=float, default=None, help='The softmax temperature of the autoregressive model.')
parser.add_argument('--cond-free', type=bool, default=None, help='Whether or not to perform conditioning-free diffusion. Conditioning-free diffusion performs two forward passes for '
                    'each diffusion step: one with the outputs of the autoregressive model and one with no conditioning priors. The output '
                    'of the two is blended according to the cond_free_k value below. Conditioning-free diffusion is the real deal, and '
                    'dramatically improves realism.')
parser.add_argument('--diffusion-iterations', type=int, default=None, help='Number of diffusion steps to perform. More steps means the network has more chances to iteratively'
                    'refine the output, which should theoretically mean a higher quality output. '
                    'Generally a value above 250 is not noticeably better, however.')
parser.add_argument('--diffusion-temperature', type=float, default=None, help='Controls the variance of the noise fed into the diffusion model. [0,1]. Values at 0 '
                    'are the "mean" prediction of the diffusion network and will sound bland and smeared. ')
parser.add_argument('--num-autoregressive-samples', type=int, default=None, help='Number of samples taken from the autoregressive model, all of which are filtered using CLVP.'
                    'As TorToiSe is a probabilistic model, more samples means a higher probability of creating something "great".')

rikabi89 commented 1 year ago

Also amazing job on the speed up

152334H commented 1 year ago

are these from the mrq repo or something

I can add the ones that are missing but 95% of them are there I think

rikabi89 commented 1 year ago

They're in the original repo. All the arguments are found in: scripts\tortoise_tts.py

In the old repo you had to copy the arguments into do_tts.py for most of them to work.

If you can put them into the new GUI, that would be amazing. But I must admit, after using your fork without the arguments, I'm not even convinced these arguments do much at all.

152334H commented 1 year ago

I understand now. I think it might be preferable to switch to that script, or to work on some pruned version of it, given that it seems to be more configurable.

give me a few days

rikabi89 commented 1 year ago

Another thing I want to add: I have no issues with streamlit run app.py, which works fine.

I can't seem to get the command line to work at all, though, which was the way I did it on the original repo.

This is important as I am in the process of finetuning my first model, which I would like to test when it's ready.

The issue is when I try to run, let's say, python tortoise/do_tts.py --preset ultra_fast --text bla bla bla...

I get that it's an unrecognized argument.

(tts-fast) H:\tortoise-tts-fast>python tortoise/do_tts.py --preset ultra_fast --text # ...
usage: do_tts.py [-h] [--voice VOICE] [--preset PRESET] [--output_path OUTPUT_PATH] [--model_dir MODEL_DIR] [--seed SEED]
                 [--produce_debug_state PRODUCE_DEBUG_STATE] [--low_vram] [--half] [--kv_cache] [--no_cache]
                 [--sampler {dpm++2m,p,ddim}] [--steps STEPS] [--cond_free COND_FREE] [--cvvp_amount CVVP_AMOUNT]
                 [--autoregressive_samples AUTOREGRESSIVE_SAMPLES] [--original_tortoise] [--ar-checkpoint AR_CHECKPOINT] [--text TEXT]
                 [--candidates CANDIDATES]
do_tts.py: error: unrecognized arguments: ...

Or sometimes this:


(tts-fast) H:\tortoise-tts-fast>python tortoise/do_tts.py --voice emma --seed 42 --text "$TEXT"
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ H:\tortoise-tts-fast\tortoise/do_tts.py:28 in <module>                                           │
│                                                                                                  │
│   25 │   parser.add_argument('--candidates', type=int, help='How many output candidates to pr    │
│   26 │                                                                                           │
│   27 │   args = parser.parse_args()                                                              │
│ ❱ 28 │   kwargs = nullable_kwargs(args)                                                          │
│   29 │   os.makedirs(args.output_path, exist_ok=True)                                            │
│   30 │                                                                                           │
│   31 │   tts = TextToSpeech(models_dir=args.model_dir, high_vram=args.high_vram, kv_cache=arg    │
│                                                                                                  │
│ H:\tortoise-tts-fast\tortoise\base_argparser.py:30 in nullable_kwargs                            │
│                                                                                                  │
│   27 ap.add_argument('--ar-checkpoint', type=str, help='specific autoregressive model checkpo    │
│   28                                                                                             │
│   29 def nullable_kwargs(args, extras={}):                                                       │
│ ❱ 30 │   mappings = {                                                                            │
│   31 │   │   'sampler': 'sampler',                                                               │
│   32 │   │   'steps': 'diffusion_iterations',                                                    │
│   33 │   │   'cond_free': 'cond_free',                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: unsupported operand type(s) for |: 'dict' and 'dict'

Do I need to edit these in or should it just run?

152334H commented 1 year ago

--text bla bla bla...

This is a difference between the scripts/tortoise_tts.py script and the do_tts.py script: in do_tts.py, --text takes a single value, so the whole sentence has to be passed as one quoted argument, which is of course bad.
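
The behaviour can be reproduced with a small argparse sketch. The parser below is a hypothetical stand-in that declares only two of the flags from the usage message above, and it assumes --text is a plain single-value string option; it is not the actual do_tts.py parser. parse_known_args is used instead of parse_args so the leftover words can be inspected rather than triggering the "unrecognized arguments" exit.

```python
import argparse

# Hypothetical stand-in parser: only two flags, not the real do_tts.py parser.
parser = argparse.ArgumentParser(prog="do_tts.py")
parser.add_argument("--preset", type=str, default="fast")
parser.add_argument("--text", type=str, default=None)

# Unquoted: the shell splits the sentence into separate argv entries,
# so --text consumes only the first word and the rest is left over.
args, leftover = parser.parse_known_args(
    ["--preset", "ultra_fast", "--text", "bla", "bla", "bla"]
)
unquoted_text = args.text
unquoted_leftover = leftover

# Quoted: the shell passes the whole sentence as a single argv entry.
args, leftover = parser.parse_known_args(
    ["--preset", "ultra_fast", "--text", "bla bla bla"]
)
quoted_text = args.text
quoted_leftover = leftover

print(unquoted_text, unquoted_leftover)
print(quoted_text, quoted_leftover)
```

With parse_args, the leftover words in the unquoted case are what get reported as unrecognized arguments, so wrapping the text in double quotes on the command line should avoid the error.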

TypeError: unsupported operand type(s) for |: 'dict' and 'dict'

This is caused by my use of dict unions (the | operator on dicts). I am on Python 3.9, which supports that, but some users are on earlier Python versions. I will edit the code to be compatible with Python 3.8.
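
For anyone hitting the same TypeError, here is a minimal sketch of the incompatibility and one common workaround. The dictionary names and values are made up for illustration, and the actual fix in the repository may use a different approach.

```python
import sys

# Illustrative values only; these are not the real defaults from the repo.
defaults = {"sampler": "ddim", "steps": 30}
overrides = {"steps": 50}

# Works on Python 3.5+: unpack both dicts into a new one; later keys win.
merged_compat = {**defaults, **overrides}

# The | operator for dicts only exists on Python 3.9+ (PEP 584).
# On 3.8 and earlier it raises:
#   TypeError: unsupported operand type(s) for |: 'dict' and 'dict'
if sys.version_info >= (3, 9):
    merged_union = defaults | overrides
    assert merged_union == merged_compat

print(merged_compat)
```

Both forms build a new dictionary and leave the inputs untouched, so the unpacking form can replace the union operator without changing behaviour.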

152334H commented 1 year ago

The dict union bug is addressed in https://github.com/152334H/tortoise-tts-fast/commit/8ccdc790f097541fb943ea0a3597dc2b9c0d7497

for the broader cli compatibility i will need to actually work on the full refactor

rikabi89 commented 1 year ago

Thanks for the response.

Are there any plans to add the option to choose a fine-tuned .pth in the streamlit app? In the meantime I can update to 3.9 if that helps resolve the issue.

152334H commented 1 year ago

I just added the feature lol. It should also be 3.8 compat now.

rikabi89 commented 1 year ago

Can confirm works now. Great work!