aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

revert custom .wav file changes #180

Closed rejuce closed 9 months ago

rejuce commented 9 months ago

I got it working with the latest version, so that I could load the speech files to refine the voice.

I did not like the result though. Now I want to continue converting my books with the default model, set by --engine xtts --speaker ...

But somehow it still tries to load the speaker latents from before, when I supplied the three .wav files. I tried deleting the model completely, but nothing changed.

How do I get back the original behaviour I had when calling python3 /home/jk/epub2tts/epub2tts.py ~/epubconvert/....epub --engine xtts --speaker "Badr Odhiambo", before I called it with the .wav files as input?

aedocw commented 9 months ago

I'm not really sure what could be happening here. Between runs there's nothing that caches the previously supplied .wav files, as far as I am aware.

Can you share output showing where it's loading anything from the previous speaker so we can get a better look at what might be going on?

rejuce commented 9 months ago

The first part stayed the same: it lists the chapters and loads the model.

Part: 27
Then, once we proceeded halfway to the demon's camp, the black cube, Mao, sent a number of demons forward too. Is that large man with a doglike face a kobold? There was a woman who looked like a vampire in armor, and a heavily armored lizardman too. B
Length: 5599
Part: 28
Extra Story, The Prickly Girl Wants to Be Spoiled Too It was a little after Souma and his group returned from the Demon Lord's Domain to the Kingdom. Around this time, the two countries were sorting out how they were going to announce the complete l
Number of chapters to read: 28
Saving to How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-badr-odhiambo.m4b
Total characters: 390646
Using GPU VRAM: 4294443008
Loading model: /home/jk/.local/share/tts/tts_models--multilingual--multi-dataset--xtts_v2

tts_models/multilingual/multi-dataset/xttsv2 is already downloaded. Using model: xtts

Before, at this point it listed all the previously converted chapters, showed the next batch of sentences to TTS, started processing, processed them, showed the real-time factor, etc. That's all gone now.

But since I used the .wav files once (even in a different directory), the already finished chapters of this book are ignored, and the output looks the same as when I called it with the .wav files:

[2024-01-08 21:11:47,792] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-08 21:11:48,254] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-01-08 21:11:48,255] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter replace_method is deprecated. This parameter is no longer needed, please remove from your call to DeepSpeed-inference
[2024-01-08 21:11:48,255] [WARNING] [config_utils.py:69:_process_deprecated_field] Config parameter mp_size is deprecated use tensor_parallel.tp_size instead
[2024-01-08 21:11:48,256] [INFO] [logging.py:96:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
Using /home/jk/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /home/jk/.cache/torch_extensions/py310_cu121/transformer_inference/build.ninja...
Building extension module transformer_inference...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module transformer_inference...
Time to load transformer_inference op: 0.06924605369567871 seconds
[2024-01-08 21:11:49,068] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed-Inference config: {'layer_id': 0, 'hidden_size': 1024, 'intermediate_size': 4096, 'heads': 16, 'num_hidden_layers': -1, 'dtype': torch.float32, 'pre_layer_norm': True, 'norm_type': <NormType.LayerNorm: 1>, 'local_rank': -1, 'stochastic_mode': False, 'epsilon': 1e-05, 'mp_size': 1, 'scale_attention': True, 'triangular_masking': True, 'local_attention': False, 'window_size': 1, 'rotary_dim': -1, 'rotate_half': False, 'rotate_every_two': True, 'return_tuple': True, 'mlp_after_attn': True, 'mlp_act_func_type': <ActivationFuncType.GELU: 1>, 'specialized_mode': False, 'training_mp_size': 1, 'bigscience_bloom': False, 'max_out_tokens': 1024, 'min_out_tokens': 1, 'scale_attn_by_inverse_layer_idx': False, 'enable_qkv_quantization': False, 'use_mup': False, 'return_single_tuple': False, 'set_empty_params': False, 'transposed_mode': False, 'use_triton': False, 'triton_autotune': False, 'num_kv': -1, 'rope_theta': 10000}
VRAM: 4294443008
Computing speaker latents...
Reading from 1 to 28
How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-1.wav exists, skipping to next chapter
0.20% spoken so far. Elapsed: 0 minutes, ETA: 0 minutes
How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-2.wav exists, skipping to next chapter
0.26% spoken so far. Elapsed: 0 minutes, ETA: 1 minutes
0%| | 0/2 [00:00<?, ?it/s]
------------------------------------------------------
Free memory : 0.474219 (GigaBytes)
Total memory: 3.999512 (GigaBytes)
Requested memory: 0.335938 (GigaBytes)
Setting maximum total tokens (input + output) to 1024
WorkSpace: 0xbb4a00000

0%| | 0/2 [00:13<?, ?it/s]

Since that one call with the .wav files, the intermediate files it creates are now .wav, and it does detect existing ones, but it now looks only for .wav intermediate files. The intermediate files created before were .flac, and it was looking for .flac before:

'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-1.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-1.wav'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-10.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-11.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-12.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-13.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-14.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-15.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-16.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-2.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-2.wav'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-3.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-4.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-5.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-6.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-7.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-8.flac'
'How a Realist Hero Rebuilt the Kingdom - LN 17 Premium-9.flac'

It is just surprising that calling it once with the input .wav option also changes the behaviour of calling it without. Or did this change come from recent commits?
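
For anyone stuck mid-book by this extension mismatch, here is a hedged workaround sketch: transcode the leftover .flac chapter files to .wav so the new resume check finds them. This assumes ffmpeg is installed and on the PATH, and the book title and file pattern are taken from the listing above; whether the resulting .wav files exactly match what the current version would produce (sample rate, channels) is untested.

```python
import subprocess
from pathlib import Path

# Book title taken from the file listing above.
book = "How a Realist Hero Rebuilt the Kingdom - LN 17 Premium"

for flac in Path(".").glob(f"{book}-*.flac"):
    wav = flac.with_suffix(".wav")
    if wav.exists():
        continue  # this chapter was already redone in the new format
    # -n tells ffmpeg to never overwrite an existing output file
    subprocess.run(["ffmpeg", "-n", "-i", str(flac), str(wav)], check=True)
    print(f"converted {flac.name} -> {wav.name}")
```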

rejuce commented 9 months ago

Ah, just saw that the last merge (ffmpeg) changed the format of the intermediate files.

Then it would still be nice to have the real-time factor display back as well.

aedocw commented 9 months ago

AH! I understand what happened. If epub2tts is interrupted before completing, it leaves behind the intermediate files it completed. When I first started working on this, it would occasionally crash, sometimes after having already done a lot of the work. Each chapter is made up of a bunch of small files that get concatenated into one file per chapter, so if it crashes and you leave those temp files sitting there, epub2tts will pick up where it left off. BUT if you want to start fresh after a crash, you need to delete all the temp-N.wav files (or .flac if you were using an earlier version). The same goes for each chapter file: for instance, if the epub was "mybook.epub" and you had --start 5, and it crashed while working on the third part, you would have mybook-5.wav and mybook-6.wav sitting in the directory.

SO - if you remove all those temp files and start again, it will start fresh from the beginning. If you leave the temp files, it will try to pick up where it left off.
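
To make the resume mechanics concrete, here is a minimal sketch of the idea (not the actual epub2tts source; synthesize_chapter is a hypothetical stand-in for the TTS call):

```python
from pathlib import Path

def synthesize_book(chapters, book_name, ext=".wav"):
    # One output file per chapter; a chapter whose file already exists is
    # skipped, so leftovers from an interrupted run are treated as done.
    for i, text in enumerate(chapters, start=1):
        out = Path(f"{book_name}-{i}{ext}")
        if out.exists():
            print(f"{out} exists, skipping to next chapter")
            continue
        synthesize_chapter(text, out)

def synthesize_chapter(text, out):
    # Placeholder: this is where the TTS engine would render the chapter's
    # sentences and write the audio to `out`.
    raise NotImplementedError
```

Note how a change in ext (the .flac to .wav switch discussed above) makes the exists() check miss the old chapter files, which matches the behaviour reported in this issue.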

Hope that explanation is clear enough and helps, please feel free to ask about anything else that comes up!

rejuce commented 8 months ago

Yeah, it's a very useful feature. I had just started the book, then updated and expected it to continue. It would have worked, but the commit changed the intermediate format. All good.