Closed: wotulong closed this issue 1 year ago.
I implemented the code below, but the quality of the reconstructed audio is much lower than the ground truth.
```python
import numpy as np
import scipy.io.wavfile
import torch
from datasets import Audio, load_dataset
from transformers import AutoProcessor, EncodecModel

librispeech_dummy = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# Resample the dataset to the model's sampling rate (24 kHz)
librispeech_dummy = librispeech_dummy.cast_column("audio", Audio(sampling_rate=processor.sampling_rate))
audio_sample = librispeech_dummy[0]["audio"]["array"]


def save_audio(audio, save_path, sr=24000):
    # Peak-normalize to the int16 range (without mutating the caller's array) and write a WAV file
    audio = audio * (32767 / max(0.01, np.max(np.abs(audio))))
    scipy.io.wavfile.write(save_path, sr, audio.astype(np.int16))


inputs = processor(raw_audio=audio_sample, sampling_rate=processor.sampling_rate, return_tensors="pt")
with torch.no_grad():
    encoder_outputs = model.encode(inputs["input_values"], inputs["padding_mask"])
    audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, inputs["padding_mask"])[0]

save_audio(audio_sample, "./audio_gt.wav")
# audio_values has shape (batch, channels, samples); take the first mono channel
save_audio(audio_values[0, 0].cpu().numpy(), "./audio_reconstruct.wav")
```
[Spectrograms of the ground-truth and reconstructed audio]
Is there an error in my code? Does anyone know why? Thanks.
I solved it myself: you need to set the bandwidth parameter and use torchaudio to save the audio.
Hi @wotulong, would you mind providing an example of how you solved your issue? Thank you!