Closed JohnHerry closed 9 months ago
The results should be reasonable unless something is unusual about your data. The model is trained to predict the quality of speech transmitted through voice calls, so it might not work well on an ASR dataset. The model can predict the quality of 16 kHz audio; such files will only be rated slightly lower than 48 kHz ones. In general, the overall MOS is more accurate than the per-dimension predictions, so you could try relying on the overall MOS only.
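As a rough illustration of relying on the overall MOS only, here is a minimal sketch that filters a results table by a single overall score. The column name `mos_pred`, the threshold of 3.5, and the sample rows are all assumptions made up for this example; adjust them to whatever your prediction output actually contains.

```python
import csv
import io


def filter_by_overall_mos(rows, threshold=3.5, mos_column="mos_pred"):
    """Keep rows whose overall MOS prediction is at or above the threshold.

    `mos_column` is a hypothetical column name; the threshold is arbitrary.
    """
    kept = []
    for row in rows:
        try:
            score = float(row[mos_column])
        except (KeyError, ValueError):
            continue  # skip rows with a missing or non-numeric score
        if score >= threshold:
            kept.append(row)
    return kept


# Made-up rows for illustration only, not real model output.
sample_csv = """file,mos_pred,noise_pred
a.wav,4.1,3.9
b.wav,2.3,4.2
c.wav,3.6,2.8
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
kept = filter_by_overall_mos(rows, threshold=3.5)
print([r["file"] for r in kept])
```

The per-dimension columns are left in the rows but not used for the decision, which matches the suggestion above; a sensible threshold would have to be chosen by listening to samples near the cut-off.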
Thanks for the help. But my results are not good: a sample with a lower noise score may sound cleaner than one with a higher score. Is it because I am testing on a Mandarin dataset?
It's hard to say without having the data. Is it a public set? There were Mandarin samples in the training set - not that many, but that should not be the issue.
Hi, thanks for the work. I am looking for a tool to filter bad audio out of an ASR corpus to build a TTS dataset. I tried this one, and what I care about is noise_pred and discontinuity_pred; I am using the tool on 16 kHz audio, so I ignored col_pred. The test results are frustrating. I checked some samples and their scores, and they seem no better than random. Is the model trained on 48 kHz samples? Should we train a 16 kHz version?
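One way to put a number on "no better than random" is to hand-rate a small set of files and compute the rank correlation between those ratings and the predicted scores. The sketch below is a plain-Python Spearman correlation under that assumption; the function names are hypothetical helpers and are not part of the tool discussed here.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(predicted, human):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(human) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(human))
```

A value near 0 on a reasonably sized hand-rated sample would support the impression that the scores are uninformative for this data, while a clearly positive value would suggest the disagreement is limited to a few samples.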