Closed yarodevuci closed 3 months ago
turnvoice .\1234.mp4 -e coqui
Welcome to TurnVoice!
input parameters:
checking if url is youtube url...
extracting from local file...
Files '.\1234.wav' and '.\1234_muted.mp4' already exist, skipping extraction
file '1234.wav' already exists, skipping renaming...
[0.3s] splitting audio...
Checking if vocals and accompaniment files exist: downloads\htdemucs_ft\1234/vocals.wav and downloads\htdemucs_ft\1234/no_vocals.wav
Vocals and accompaniment files exist, skipping separation and returning paths: downloads\htdemucs_ft\1234/vocals.wav and downloads\htdemucs_ft\1234/no_vocals.wav
[0.3s] splitting finished, vocal path is downloads\htdemucs_ft\1234/vocals.wav...
[0.3s] early start synthesis engine (grab vram)...
No engine specified for voice 1. Using first/default engine coqui
Switching engine to coqui, voice male.wav
Using model: xtts
Predicted silence(s) with VAD. cting silences(s) with VAD...
Transcribe: 0%| | 0/13.49 [00:00<?, ?sec/s]
No matter what I pick, it gets stuck.
@KoljaB any ideas? Thanks
Input #0, wav, from '1234.wav':
  Metadata:
    encoder : Lavf58.29.100
  Duration: 00:00:13.49, bitrate: 1411 kb/s
  Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, 2 channels, s16, 1411 kb/s
Please use -b:a or -b:v, -b is ambiguous
[out#0/mp3 @ 000002b8fd386500] Codec AVOption b (set bitrate (in bits/s)) has not been used for any stream. The most likely reason is either wrong type (e.g. a video option with no video streams) or that it is a private option of some encoder which was not actually used for any stream.
Stream mapping:
  Stream #0:0 -> #0:0 (pcm_s16le (native) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp3, to 'downloads\1234.mp3':
  Metadata:
    TSSE : Lavf60.4.100
  Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16p
    Metadata:
      encoder : Lavc60.7.100 libmp3lame
size= 212kB time=00:00:13.48 bitrate= 128.6kbits/s speed= 138x
video:0kB audio:211kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.116396%
Splitting audio downloads\1234.mp3 into accompaniment and vocals (downloads\htdemucs_ft\1234/vocals.wav) using demucs.
Selected model is a bag of 4 models. You will see that many progress bars per track.
Separated tracks will be stored in G:\Local_API\TelegramBots\InstaAnalytics\downloads\htdemucs_ft
Separating track downloads\1234.mp3
100%|██████████| 17.549999999999997/17.549999999999997 [00:04<00:00, 3.95seconds/s]
100%|██████████| 17.549999999999997/17.549999999999997 [00:03<00:00, 4.70seconds/s]
100%|██████████| 17.549999999999997/17.549999999999997 [00:03<00:00, 4.75seconds/s]
100%|██████████| 17.549999999999997/17.549999999999997 [00:03<00:00, 4.68seconds/s]
Vocals and accompaniment files exist, returning paths: downloads\htdemucs_ft\1234/vocals.wav and downloads\htdemucs_ft\1234/no_vocals.wav
[20.6s] splitting finished, vocal path is downloads\htdemucs_ft\1234/vocals.wav...
[20.6s] early start synthesis engine (grab vram)...
No engine specified for voice 1. Using first/default engine coqui
Switching engine to coqui, voice daisy
100%|██████████| 4.36k/4.36k [00:00<?, ?iB/s]
100%|██████████| 1.86G/1.86G [00:18<00:00, 99.8MiB/s]
100%|██████████| 335k/335k [00:00<00:00, 2.35MiB/s]
100%|██████████| 7.75M/7.75M [00:00<00:00, 22.5MiB/s]
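
Side note on the ffmpeg warning in the log ("Please use -b:a or -b:v, -b is ambiguous"): the conversion to mp3 still completes, so this is probably not what causes the hang, but the bare `-b` option is reported as unused. Below is a minimal sketch of building the same WAV-to-MP3 command with an explicit audio bitrate. The helper name `build_mp3_command` and the `128k` default are assumptions for illustration, not TurnVoice's actual code; only the `-y`, `-i` and `-b:a` ffmpeg options are taken as given.

```python
def build_mp3_command(wav_path, mp3_path, audio_bitrate="128k"):
    """Return an ffmpeg argument list that converts a WAV file to MP3.

    Hypothetical helper: the bitrate is passed with the stream-specific
    option -b:a instead of the ambiguous bare -b flagged in the log.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(wav_path),     # input file
        "-b:a", audio_bitrate,   # audio bitrate (assumed value)
        str(mp3_path),           # output file; format inferred from extension
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_mp3_command("1234.wav", "downloads/1234.mp3")))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is on the PATH.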