Janne-byti closed this issue 3 weeks ago.
Hi, this is expected, since the input's frequency content stops below 24 kHz (the upper limit for a 48 kHz sampling rate). Our model was trained only on high-quality 48 kHz audio, so it tends to fill the entire 0 to 24 kHz band.
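To check whether an input file actually uses the full band up to the 24 kHz Nyquist limit, or whether it was recorded or resampled from a lower rate, you can estimate its effective bandwidth. The following is a minimal sketch, not part of FRN: `effective_bandwidth` is a hypothetical helper, and the 99% energy threshold is an arbitrary assumption you may want to tune.

```python
import numpy as np


def effective_bandwidth(signal, sample_rate, energy_fraction=0.99):
    """Return the frequency (Hz) below which `energy_fraction` of the signal's
    spectral energy lies. Hypothetical helper; the threshold is an assumption."""
    power = np.abs(np.fft.rfft(signal)) ** 2              # power spectrum of a real signal
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    cumulative = np.cumsum(power) / np.sum(power)          # non-decreasing, ends near 1.0
    # First frequency bin at which the cumulative energy reaches the target fraction
    idx = int(np.searchsorted(cumulative, energy_fraction))
    return float(freqs[min(idx, len(freqs) - 1)])
```

A value far below 24 kHz for a 48 kHz file suggests band-limited content, which the model will likely try to "fill in" at high frequencies.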
@vietanh125 Well, I used 48 kHz data for verification, but the results were not very good. The only problem is that the missing (concealed) parts sound like electrical current noise, which is unpleasant to hear. Is this the expected behavior of the model?
Hi, I am also getting poor-quality audio with a string-like noise in the concealed segments, and I can see the same kind of issue even in the uploaded FRN audio samples. Could you please provide an update?
Hi, one of the limitations of our model is that it's not very robust to noisy environments, since the training dataset (VCTK) is noise-free. You can work around this by applying a noise reduction model before FRN.
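The workaround amounts to chaining two stages: denoise first, then conceal. The sketch below only illustrates that ordering. As a stand-in for a real noise reduction model it uses classic spectral subtraction, which is a much simpler technique than a neural denoiser; `spectral_subtraction`, `restore`, and the `conceal` callable (where you would plug in FRN inference) are hypothetical names, not part of the FRN code base. It also assumes the first few frames of the recording contain noise only.

```python
import numpy as np


def spectral_subtraction(noisy, frame_len=1024, hop=512, noise_frames=10, floor=0.05):
    """Simple spectral-subtraction denoiser, used only as a stand-in for a real
    noise reduction model. Assumes the first `noise_frames` frames are noise only."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Average noise magnitude spectrum estimated from the leading noise-only frames
    noise_mag = mag[:noise_frames].mean(axis=0)
    # Subtract the noise estimate, keeping a small spectral floor
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    # Weighted overlap-add resynthesis, normalised by the summed squared window
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean_frames[i] * window
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)


def restore(lossy_audio, denoise, conceal):
    """Denoise first, then run packet loss concealment.
    `denoise` and `conceal` are hypothetical callables supplied by the user."""
    return conceal(denoise(lossy_audio))
```

Usage would look like `restore(audio, spectral_subtraction, my_frn_inference)`, where `my_frn_inference` is whatever function wraps your FRN model. In practice a trained denoiser will give far better results than spectral subtraction, which tends to leave "musical noise" artifacts.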
I have made a demo on a Hugging Face Space. Could you try your audio files on it and see whether the output has any artifacts?
No, my test audio is also clean speech. The problem is that the restored audio at the packet-loss positions sounds like electrical current noise, which is jarring. Looking at the spectrum, the restored audio at those positions seems to simply repeat the previous frame, rather than predicting or synthesizing speech.
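One way to make the "it just repeats the previous frame" observation measurable is to compare the magnitude spectrum of each concealed frame with that of the frame immediately before it. The sketch below computes a cosine similarity between the two (1.0 means identical spectra). The function names and the choice of a Hann window are assumptions for illustration, and note that stationary speech such as a sustained vowel also gives high scores, so a high value alone is suggestive rather than conclusive.

```python
import numpy as np


def frame_spectrum(x, start, frame_len):
    """Magnitude spectrum of one Hann-windowed frame starting at `start`."""
    seg = x[start:start + frame_len] * np.hanning(frame_len)
    return np.abs(np.fft.rfft(seg))


def repetition_score(restored, lost_start, frame_len):
    """Cosine similarity between the magnitude spectrum of a concealed frame and
    the frame right before it. Hypothetical diagnostic, not part of FRN."""
    prev = frame_spectrum(restored, lost_start - frame_len, frame_len)
    cur = frame_spectrum(restored, lost_start, frame_len)
    return float(np.dot(prev, cur) / (np.linalg.norm(prev) * np.linalg.norm(cur) + 1e-12))
```

Because it compares magnitudes only, the score also catches repetitions that are shifted in phase.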
I can see some noise in the high-frequency bands. Since we didn't include this kind of data in training, the model may perform poorly on it. You could try including this kind of data in training to see whether it improves the results. Also, PLC is meant to mitigate the effects of packet loss; I don't think it is capable of truly predicting the missing speech or synthesizing new speech, at least not a small model like FRN.
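If you want to add your own recordings (for example, ones with similar high-frequency noise) to the training data, you need lossy/clean pairs. A minimal sketch, assuming independent random losses per packet and a packet length of 960 samples (20 ms at 48 kHz); both are assumptions, not necessarily the packet size or loss model FRN was trained with, and `simulate_packet_loss` is a hypothetical helper.

```python
import numpy as np


def simulate_packet_loss(clean, packet_len=960, loss_rate=0.1, seed=None):
    """Zero out randomly chosen packets of `clean` to build a training input.
    Returns (lossy, lost) where `lost[i]` is True if packet i was dropped.
    packet_len=960 (20 ms at 48 kHz) and Bernoulli losses are assumptions."""
    rng = np.random.default_rng(seed)
    n_packets = len(clean) // packet_len
    lost = rng.random(n_packets) < loss_rate  # independent loss decision per packet
    mask = np.ones(len(clean), dtype=bool)
    for i in np.flatnonzero(lost):
        mask[i * packet_len:(i + 1) * packet_len] = False
    lossy = np.where(mask, clean, 0.0)        # dropped packets become silence
    return lossy, lost
```

The pair `(lossy, clean)` then serves as (input, target). Real network losses tend to be bursty, so a burst model such as Gilbert-Elliott may be a better fit than independent losses.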