Zulko / moviepy

Video editing with Python
https://zulko.github.io/moviepy/
MIT License

AudioArrayClip processing sample rate wrongly #2086

Open arturstopa opened 6 months ago

arturstopa commented 6 months ago

When a WAV signal represented as a numpy.ndarray is passed to moviepy.audio.AudioClip.AudioArrayClip, the sound comes out distorted and twice as long. Saving the same sound to a file with scipy and then loading it with moviepy.audio.io.AudioFileClip.AudioFileClip gives correct audio. Playing back the original audio is also fine.

In the following snippet, tts_out.wav and test-audio-indirect-i16.wav are 5-second clips without distortion or artifacts, while test-audio-direct-i16.wav is a 10-second, highly distorted clip. When passing fps = 2*TTS_OUTPUT_SAMPLERATE, the audio has the correct length and the words are recognizable, but it is still highly distorted.

wav = tts.synthesize(TEST_TEXT).reshape(-1,1)
scipy.io.wavfile.write("tts_out.wav", TTS_OUTPUT_SAMPLERATE, wav)

audio = AudioArrayClip(wav, fps = TTS_OUTPUT_SAMPLERATE)
audio.write_audiofile("test-audio-direct-i16.wav")

audio = AudioFileClip("tts_out.wav", fps = TTS_OUTPUT_SAMPLERATE)
audio.write_audiofile("test-audio-indirect-i16.wav")
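Before comparing the two write paths, it can help to check what the array itself claims about its type, shape, and length. The following is a minimal sketch, not part of the original report: it uses a synthetic int16 sine wave as a stand-in for the TTS output, and the helper name describe_pcm and the 22050 Hz rate are illustrative assumptions (the report does not state the actual rate or dtype; the "i16" in the file names only suggests int16).

```python
import numpy as np


def describe_pcm(samples: np.ndarray, sample_rate: int) -> dict:
    """Summarize an audio array: dtype, shape, peak magnitude and duration."""
    samples = np.asarray(samples)
    n_frames = samples.shape[0]
    # Cast to float before abs() so that int16 -32768 does not overflow.
    peak = float(np.abs(samples.astype(np.float64)).max()) if samples.size else 0.0
    return {
        "dtype": str(samples.dtype),
        "shape": samples.shape,
        "peak": peak,
        "duration_s": n_frames / sample_rate,
    }


# Hypothetical stand-in for tts.synthesize(...): 5 s of a 440 Hz tone as int16.
SAMPLE_RATE = 22050  # assumed value, not taken from the report
t = np.arange(5 * SAMPLE_RATE) / SAMPLE_RATE
wav = (0.5 * 32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16).reshape(-1, 1)

info = describe_pcm(wav, SAMPLE_RATE)
print(info)
```

If the reported duration here matches the expected 5 seconds, the array and sample rate are consistent with each other, and the doubling would have to come from a later stage.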


SohamTilekar commented 6 months ago

This is not a bug; your code is wrong. Close the issue. Problematic code: you pass TTS_OUTPUT_SAMPLERATE when creating the AudioArrayClip instead of TTS_INPUT_SAMPLERATE.

audio = AudioArrayClip(wav, fps = TTS_OUTPUT_SAMPLERATE)
audio.write_audiofile("test-audio-direct-i16.wav")

arturstopa commented 4 months ago

Where do I get TTS_INPUT_SAMPLERATE from? tts.synthesize() generates audio with a sample rate equal to TTS_OUTPUT_SAMPLERATE; there are no other sample rates that I'm aware of. Maybe I should have pointed out that TTS stands for Text To Speech. Also note that loading the same audio clip with the same sample rate through the AudioFileClip class works correctly.
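
One possible explanation that this thread does not confirm: the array may hold integer PCM samples (the "i16" in the file names hints at int16), whereas AudioArrayClip, as far as its documentation indicates, works with float samples in roughly the range [-1, 1] and shape (n_frames, n_channels). The sketch below shows the conversion under that assumption. The helper name pcm_to_float is hypothetical, and it only handles signed integer PCM.

```python
import numpy as np


def pcm_to_float(samples: np.ndarray) -> np.ndarray:
    """Return float64 samples in [-1, 1] with shape (n_frames, n_channels).

    Assumes signed integer PCM (e.g. int16); float input is passed through
    unchanged on the assumption that it is already scaled to [-1, 1].
    """
    samples = np.asarray(samples)
    if samples.ndim == 1:
        samples = samples.reshape(-1, 1)  # mono as a single column
    if np.issubdtype(samples.dtype, np.integer):
        scale = float(np.iinfo(samples.dtype).max) + 1.0  # 32768.0 for int16
        return samples.astype(np.float64) / scale
    return samples.astype(np.float64)


# Small check with the int16 extremes and zero.
pcm = np.array([-32768, 0, 32767], dtype=np.int16)
converted = pcm_to_float(pcm)
print(converted.ravel())
```

The converted array could then be tried as AudioArrayClip(pcm_to_float(wav), fps = TTS_OUTPUT_SAMPLERATE). Whether this also removes the doubled duration is not established here.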