facebookresearch / encodec

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
MIT License

Low quality of the reconstructed audio? #66

Closed: wotulong closed this issue 1 year ago

wotulong commented 1 year ago

❓ Questions

I ran the code below, but the quality of the reconstructed audio is much lower than the ground truth.

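import numpy as np
import scipy.io.wavfile
import torch
from datasets import Audio, load_dataset
from transformers import AutoProcessor, EncodecModel

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")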

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))

audio_sample = librispeech_dummy[0]["audio"]["array"]

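def save_audio(audio, save_path, sr=24000):
    # Peak-normalize a copy to the int16 range, so the caller's array is not modified in place.
    audio = audio * (32767 / max(0.01, np.max(np.abs(audio))))
    scipy.io.wavfile.write(save_path, sr, audio.astype(np.int16))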

inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")

encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])

audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]

save_audio(audio_sample, "./audio_gt.wav")
save_audio(torch.squeeze(audio_values, dim=1)[0].data.cpu().float().numpy(), "./audio_reconstruct.wav")

Here are the spectrograms of the audio: [image]

Is there an error in my code? Does anyone know why? Thanks.

wotulong commented 1 year ago

I solved it myself. We need to set the bandwidth parameter and use torchaudio to save the audio.

fakerybakery commented 11 months ago

Hi @wotulong, would you mind providing an example of how you solved your issue? Thank you!
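
For anyone landing here: below is a minimal sketch of what the fix described above might look like. It is not wotulong's actual code, which was never posted in this thread. It assumes that `EncodecModel.encode` in transformers accepts a `bandwidth` argument in kbps and uses the lowest target bandwidth when none is given, and that the checkpoint lists its valid values in `model.config.target_bandwidths`. The helper names `check_bandwidth` and `reconstruct` are hypothetical, introduced only for illustration.

```python
def check_bandwidth(bandwidth, supported):
    """Return `bandwidth` as a float if it is one of the supported values (kbps)."""
    if bandwidth not in supported:
        raise ValueError(f"bandwidth {bandwidth} not in supported values {list(supported)}")
    return float(bandwidth)


def reconstruct(audio_sample, out_path, bandwidth=24.0):
    """Encode and decode a mono waveform (1-D float array), then save the result."""
    # Heavy imports live inside the function so check_bandwidth works without them.
    import torch
    import torchaudio
    from transformers import AutoProcessor, EncodecModel

    model = EncodecModel.from_pretrained("facebook/encodec_24khz")
    processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

    # Assumption: the checkpoint's config lists its valid bandwidths in kbps.
    bandwidth = check_bandwidth(bandwidth, model.config.target_bandwidths)

    inputs = processor(
        raw_audio=audio_sample,
        sampling_rate=processor.sampling_rate,
        return_tensors="pt",
    )

    with torch.no_grad():
        # Pass the bandwidth explicitly instead of relying on the default.
        encoded = model.encode(
            inputs["input_values"], inputs["padding_mask"], bandwidth=bandwidth
        )
        audio_values = model.decode(
            encoded.audio_codes, encoded.audio_scales, inputs["padding_mask"]
        )[0]

    # audio_values has shape (batch, channels, time); torchaudio.save expects
    # (channels, time). The float tensor is saved without manual int16 rescaling.
    torchaudio.save(out_path, audio_values[0].cpu(), processor.sampling_rate)
```

With `audio_sample` loaded as in the original post, calling `reconstruct(audio_sample, "./audio_reconstruct.wav", bandwidth=24.0)` should write the higher-bitrate reconstruction. Lower values such as 6.0 trade quality for a smaller bitstream.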