Open hermify opened 12 months ago
Detecting silence is not the right approach here; for that we would need at least VAD (voice activity detection). A simple workaround: extract a 30-second portion from somewhere else in the file and use that for language detection. That step costs almost no time. E.g. using ffmpeg (start at 60 seconds): ffmpeg -ss 60 -i YOURFILE -t 30 DETECTIONFILE.wav
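If you want to script that extraction step, a minimal Python sketch could look like this. The function names and the defaults are only illustrative, and the -y flag (overwrite an existing clip) is an addition that is not part of the command above.

```python
import subprocess


def build_clip_command(source, clip, start_s=60, duration_s=30):
    """Build the ffmpeg argument list that cuts a short clip for language detection."""
    return [
        "ffmpeg",
        "-y",                     # assumption: overwrite the clip on reruns
        "-ss", str(start_s),      # seek to the start offset before reading the input
        "-i", source,
        "-t", str(duration_s),    # keep only this many seconds
        clip,
    ]


def extract_clip(source, clip, start_s=60, duration_s=30):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_clip_command(source, clip, start_s, duration_s), check=True)
```

Calling extract_clip("YOURFILE", "DETECTIONFILE.wav") should run the same extraction as the one-liner, assuming ffmpeg is on the PATH.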
A simple and interesting workaround! So I will run the model on the detection file with language = auto and grep the result from the output, e.g. "language: it (p = 0.9)".
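For the grep step, a small parser is probably more robust than plain grep. This is only a sketch: it assumes the output contains a fragment shaped like the one quoted above, "language: it (p = 0.9)", and the exact log format of your build may differ.

```python
import re

# Matches e.g. "language: it (p = 0.9)"; the shape is taken from the line quoted in this thread.
LANG_RE = re.compile(r"language:\s*([a-z]{2,3})\s*\(p\s*=\s*([0-9]*\.?[0-9]+)\)")


def parse_detected_language(output):
    """Return (language_code, probability), or None if no such line is found."""
    match = LANG_RE.search(output)
    if match is None:
        return None
    return match.group(1), float(match.group(2))
```

Returning None instead of raising lets the caller decide what to do when the line is missing, for example fall back to language = auto.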
I am wondering: would a smaller model be enough and make the process faster? Using "large" takes about 1x real time. Would the medium or small model be enough for language detection?
That's a fantastic idea! I assume even the tiny model would serve this purpose very well.
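Put together, the idea is a two-pass flow: detect the language on the short clip with a small model, then transcribe the full, untrimmed file with the language fixed, so the subtitle timings are not affected. Here is a rough sketch. The detect and transcribe callables are hypothetical placeholders for however you invoke the model, and the min_p threshold is an assumption, not something the tool provides.

```python
def transcribe_with_detected_language(audio, clip, detect, transcribe, min_p=0.5):
    """Two-pass flow: cheap detection on a clip, then full transcription.

    detect(clip) -> (language_code, probability) or None   (hypothetical callable)
    transcribe(audio, language) -> result                  (hypothetical callable)
    """
    detected = detect(clip)
    if detected is None or detected[1] < min_p:
        # Detection failed or was not confident: let the big model decide itself.
        language = "auto"
    else:
        language = detected[0]
    # The full file is passed through unchanged, so timestamps stay valid.
    return transcribe(audio, language)
```

Here detect would wrap a run of the tiny or small model on the clip, and transcribe a run of the large model on the original file.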
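I would add a highpass and a lowpass filter and trim the silence when creating the detection file.
Something like this (using silenceremove, because silencedetect only logs the silent ranges without cutting them, and anullsrc is a source that cannot be used inside an -af chain): ffmpeg -i file.wav -af "silenceremove=start_periods=1:start_threshold=-50dB,highpass=f=200,lowpass=f=4500" DETECTIONFILE.wav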
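silencedetect can still be useful for picking the start offset of the detection clip. Running ffmpeg -i file.wav -af silencedetect=n=-50dB:d=2 -f null - prints silence_start and silence_end lines to stderr. The sketch below parses that log and returns where the leading silence ends. The function name and the tolerance_s parameter are only illustrative.

```python
import re

SILENCE_START_RE = re.compile(r"silence_start:\s*(-?[0-9.]+)")
SILENCE_END_RE = re.compile(r"silence_end:\s*([0-9.]+)")


def leading_silence_end(log, tolerance_s=0.5):
    """Return the time (seconds) at which leading silence ends, or 0.0 if there is none.

    log is the stderr text of an ffmpeg run with the silencedetect filter.
    """
    start = SILENCE_START_RE.search(log)
    # Only treat it as leading silence if the first silent range begins at the file start.
    if start is None or float(start.group(1)) > tolerance_s:
        return 0.0
    end = SILENCE_END_RE.search(log, start.end())
    return float(end.group(1)) if end else 0.0
```

The returned value can be used as the -ss offset when cutting the 30-second detection clip, which leaves the original file and its timings untouched.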
Hi there,
when the language is set to "auto" and the first 40 seconds of a file are silent, it detects the language as "ca" (the ISO 639-1 code for Catalan; Chinese would be "zh"). It would be great if the language detector trimmed silence from the audio before doing language detection.
Because I need to create VTT subtitles, I can't remove the silence myself: that would shift the timings.