Low quality of the reconstructed audio?

wotulong commented 1 year ago

❓ Questions

I ran the code below, and the quality of the reconstructed audio is much lower than the ground truth.

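import numpy as np
import scipy.io.wavfile
import torch
from datasets import Audio, load_dataset
from transformers import AutoProcessor, EncodecModel

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")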

model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))

audio_sample = librispeech_dummy[0]["audio"]["array"]

def save_audio(audio, save_path, sr=24000):
    # Peak-normalize into a new array so the caller's data is not modified in place.
    scaled = audio * (32767 / max(0.01, np.max(np.abs(audio))))
    scipy.io.wavfile.write(save_path, sr, scaled.astype(np.int16))

inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")

encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])

audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]

save_audio(audio_sample, "./audio_gt.wav")
save_audio(torch.squeeze(audio_values, dim=1)[0].data.cpu().float().numpy(), "./audio_reconstruct.wav")

And here is the spectrogram of the audio: [image not shown]

Is there an error in my code? Does anyone know why this happens? Thanks.

wotulong commented 1 year ago

I solved it myself. You need to set the bandwidth parameter and use torchaudio to save the audio.

fakerybakery commented 11 months ago

Hi @wotulong, would you mind providing an example of how you solved your issue? Thank you!
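
A minimal sketch of what the fix described above might look like, assuming the same Hugging Face `transformers` API used in the question. This is not wotulong's actual code: `pick_bandwidth` and `reconstruct` are hypothetical helper names, and 24.0 kbps is only an illustrative choice. As far as I can tell, when `bandwidth` is left unset, `encode` falls back to the first entry of `model.config.target_bandwidths`, which is the lowest bitrate, and that would explain the poor reconstruction.

```python
def pick_bandwidth(requested, supported):
    """Return `requested` (in kbps) as a float if it is supported, else raise ValueError.

    Hypothetical helper, used only to give a clear error message.
    """
    if requested not in supported:
        raise ValueError(
            f"bandwidth {requested} kbps is not one of {list(supported)}"
        )
    return float(requested)


def reconstruct(audio_sample, bandwidth=24.0, save_path="./audio_reconstruct.wav"):
    """Encode and decode `audio_sample` (1-D float array at 24 kHz), then save it.

    Hypothetical helper; the default bandwidth is an assumption, not a recommendation.
    """
    # Heavy dependencies are imported here so pick_bandwidth stays dependency-free.
    import torch
    import torchaudio
    from transformers import AutoProcessor, EncodecModel

    model = EncodecModel.from_pretrained("facebook/encodec_24khz")
    processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")
    bandwidth = pick_bandwidth(bandwidth, model.config.target_bandwidths)

    inputs = processor(
        raw_audio=audio_sample,
        sampling_rate=processor.sampling_rate,
        return_tensors="pt",
    )
    with torch.no_grad():
        # Pass the bandwidth explicitly instead of relying on the default.
        encoder_outputs = model.encode(
            inputs["input_values"], inputs["padding_mask"], bandwidth=bandwidth
        )
        audio_values = model.decode(
            encoder_outputs.audio_codes,
            encoder_outputs.audio_scales,
            inputs["padding_mask"],
        )[0]

    # audio_values is (batch, channels, samples); torchaudio.save wants (channels, samples).
    torchaudio.save(save_path, audio_values[0].detach().cpu(), processor.sampling_rate)
    return save_path
```

With the `audio_sample` from the question, `reconstruct(audio_sample, bandwidth=24.0)` would write the float waveform with torchaudio, skipping the manual int16 scaling in `save_audio`.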

facebookresearch / encodec, issue #66