Crystalsound / FRN

I tried the model in the code and found a lot of current noise. Is there any solution? #2

Closed Janne-byti closed 3 weeks ago

Janne-byti commented 1 year ago

[image]

vietanh125 commented 1 year ago

Hi, this is expected because the frequency content of your input stops below 24 kHz (the upper limit for a 48 kHz sampling rate). Our model was trained only on high-quality 48 kHz audio, so it tends to fill the entire band up to 24 kHz.
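One way to check whether a 48 kHz file really carries energy up to 24 kHz is to estimate its effective bandwidth from a short excerpt. The following is only a rough sketch, not part of FRN: the function names, the Hann window, and the -60 dB floor are assumptions made for illustration, and the small FFT is written out so that the example needs nothing beyond the standard library.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def effective_bandwidth_hz(samples, sample_rate, floor_db=-60.0):
    """Highest frequency whose magnitude is within floor_db of the spectral peak.

    floor_db is an illustrative threshold, not a value taken from FRN.
    """
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("excerpt length must be a power of two")
    # Hann window to limit spectral leakage from the excerpt edges.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(samples)]
    mags = [abs(c) for c in fft(windowed)[: n // 2 + 1]]
    peak = max(mags)
    if peak == 0:
        return 0.0
    threshold = peak * 10 ** (floor_db / 20)
    top_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return top_bin * sample_rate / n
```

If the value returned for a 48 kHz recording is far below 24000, the file was probably upsampled from a lower rate, which is the situation described above.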

Janne-byti commented 1 year ago

@vietanh125 Well, I used 48 kHz data for verification, but the results were not very good. The lost segments were filled with nothing but an electrical buzzing sound, which is unpleasant to listen to. Is this the normal behavior of the model? [image]

Niteshkumarchaudhary commented 1 year ago

Hi, I am also getting poor-quality audio, with a string-like noise in the concealment. I can see the same kind of issue with FRN even in the uploaded audio samples. Could you please update?

vietanh125 commented 1 year ago

Hi, one of the limitations of our model is that it is not very robust to noisy environments, since the training dataset (VCTK) is noise-free. You can work around this by applying a noise reduction model before FRN.

vietanh125 commented 1 year ago

> Hi, I am also getting poor-quality audio, with a string-like noise in the concealment. I can see the same kind of issue with FRN even in the uploaded audio samples. Could you please update?

I have made a demo on Hugging Face Spaces. Could you try your audio files on it and see whether the output has any artifacts?

Janne-byti commented 1 year ago

> Hi, one of the limitations of our model is that it is not very robust to noisy environments, since the training dataset (VCTK) is noise-free. You can work around this by applying a noise reduction model before FRN.

No, my test audio is also clean speech. The problem is that the audio restored at the packet loss locations sounds like electrical hum, which is abrupt. I looked at the spectrum and found that the restored audio at the packet loss locations seems to repeat the previous frame of data rather than being predicted or synthesized.
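The frame-repetition observation can be checked numerically instead of by eye. Below is a minimal sketch that flags lost frames whose waveform is almost identical to the frame just before them. The 960-sample frame length (20 ms at 48 kHz), the 0.95 threshold, and the function names are assumptions for illustration and are not taken from the FRN code.

```python
import math


def frame_similarity(a, b):
    """Cosine similarity between two equal-length frames (1.0 means same shape)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def repeated_frames(samples, lost_frames, frame_len=960, threshold=0.95):
    """Return the lost-frame indices whose content nearly duplicates the previous frame.

    frame_len and threshold are illustrative defaults (assumed, not from FRN).
    """
    flagged = []
    for idx in lost_frames:
        if idx == 0:
            continue  # no previous frame to compare against
        prev = samples[(idx - 1) * frame_len : idx * frame_len]
        cur = samples[idx * frame_len : (idx + 1) * frame_len]
        if len(cur) == frame_len and frame_similarity(prev, cur) >= threshold:
            flagged.append(idx)
    return flagged
```

A high similarity is not proof of copying on its own: a steady tone whose period divides the frame length also scores close to 1.0, so it helps to run the same comparison on frames that were not lost as a baseline.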

vietanh125 commented 1 year ago

I can see there is some noise in the high-frequency bands. Since we didn't include this kind of data in training, the model may perform poorly on it. You can try including such data to see if it improves the result. Also, PLC is meant to mitigate the effects of packet loss; I don't think it is capable of predicting missing audio or synthesizing speech, at least for a small model like FRN.
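For anyone who wants to try adding such data to training, one common approach (an assumption here, not something the FRN repository is known to provide) is to mix a recording of the high-frequency noise into the clean speech at a chosen signal-to-noise ratio. A minimal sketch with a hypothetical helper name:

```python
import math


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mix has the requested SNR in dB.

    Both inputs are equal-length lists of float samples. This is a hypothetical
    helper for illustration, not an FRN function.
    """
    if not clean or len(clean) != len(noise):
        raise ValueError("clean and noise must be non-empty and equal in length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        return list(clean)  # silent noise: nothing to add
    # Scale the noise so that p_clean / (gain^2 * p_noise) equals 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]
```

Which SNR range is appropriate depends on how strong the noise is in the target recordings, so it would need to be tuned against real examples.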