LXP-Never / TCNN

TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain

Overfitting Problem #1

Closed HardeyPandya closed 2 years ago

HardeyPandya commented 2 years ago

Hello. Thanks for this model architecture code. I am a beginner in this area and not very familiar with standard deep learning procedures.

I converted the code from https://github.com/haoxiangsnr/IRM-based-Speech-Enhancement-using-LSTM to work in the time domain and then used your model architecture code with it. I am running everything on the Google Colab platform.

I used the TIMIT dataset, which has 8732 utterances, and randomly mixed each utterance with UrbanSound8K noise at an SNR of -5 dB, -4 dB, -3 dB, -2 dB, -1 dB, 0 dB, or 1 dB, so I have 8732 noisy utterances. I then split them into overlapping frames and pass the frames through the network to produce the output.
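The mixing step described above can be sketched roughly as follows. This is only an illustration, not the code used in this issue: `mix_at_snr`, `power`, and `measured_snr_db` are hypothetical helper names, the sine wave and Gaussian noise stand in for a TIMIT utterance and an UrbanSound8K clip, and short noise clips are assumed to be tiled to the speech length.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it.

    Returns the noisy mixture and the scaled noise that was added.
    """
    # Assumption: tile the noise if it is shorter than the speech, then trim.
    reps = len(clean) // len(noise) + 1
    noise = (list(noise) * reps)[:len(clean)]

    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise clip is silent; cannot set an SNR")

    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)

    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measured_snr_db(clean, noise):
    """SNR in dB between a clean signal and the noise added to it."""
    return 10 * math.log10(power(clean) / power(noise))


rng = random.Random(0)
# Stand-ins for real audio: 1 s of a 440 Hz tone at 16 kHz, 0.25 s of noise.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

snr_db = rng.choice([-5, -4, -3, -2, -1, 0, 1])
noisy, scaled_noise = mix_at_snr(clean, noise, snr_db)
```

Checking `measured_snr_db(clean, scaled_noise)` against the requested value is a cheap way to confirm that the mixing stage does what is intended before looking at the model.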

I am not sure whether this is overfitting. After 600 epochs of training, the average validation PESQ score is 2.13. The average PESQ between the clean speech and the unprocessed noisy speech (UrbanSound8K noise + TIMIT clean speech) is around 1.8. (Each epoch trains on 900 noisy utterances; the utterances are then shuffled and the next epoch trains on another 900.)

For testing, I use the NOIZEUS database, which the TCNN network has never seen. After loading the checkpoint I get a very low PESQ score of 1.42. Also, when I run the same inference script on the same model checkpoint, I get a different PESQ score each time, such as 1.3, 1.4, or 1.5.

Any suggestions as to why this is the case? They would be very helpful.

Thanks.

LXP-Never commented 2 years ago

There are many reasons for overfitting:

1. The model complexity is too high and there are too many parameters.
2. The training data set is relatively small.

Solutions:

1. Reduce model complexity.
2. Use more diverse data or data augmentation.
3. Dropout.
4. Regularization.
5. Early stopping.
6. Batch normalization (BN).
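Of the remedies listed, early stopping is the easiest to bolt onto an existing training loop. A minimal sketch follows, assuming validation loss is the monitored quantity; the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the loss values are all hypothetical and not part of the TCNN repository.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as improvement
        self.best_loss = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # this is the checkpoint worth keeping
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses: improving at first, then drifting upward.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.63, 0.66, 0.70]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In a real loop the checkpoint saved at `stopper.best_epoch` would be the one to load for inference, rather than the last one written.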

HardeyPandya commented 2 years ago

OK, thanks. But during training (let's say my data set is too small, only 8732 utterances), the average validation PESQ score comes out below 1 in the initial epochs. Is it possible to have a PESQ score below 1.4? I heard it should be between 1.4 and 4.5, so I wonder where the problem is.

I will certainly try training on a different dataset.
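On the question of range: as far as I understand ITU-T P.862, the raw PESQ score lies between -0.5 and 4.5, and P.862.1 maps it onto a MOS-LQO scale that bottoms out just above 1. Which of the two a given PESQ package reports depends on the implementation, so a value below 1 would suggest a raw score. The sketch below evaluates the narrowband mapping as I recall it; the constants should be checked against the recommendation, and `raw_pesq_to_mos_lqo` is a hypothetical name.

```python
import math

# Raw PESQ range as I understand ITU-T P.862 (assumption, verify against the spec).
RAW_PESQ_MIN = -0.5
RAW_PESQ_MAX = 4.5


def raw_pesq_to_mos_lqo(raw):
    """Narrowband raw-PESQ to MOS-LQO mapping (constants recalled from P.862.1)."""
    return 0.999 + 4.0 / (1.0 + math.exp(-1.4945 * raw + 4.6607))


# End points of the mapped scale.
low = raw_pesq_to_mos_lqo(RAW_PESQ_MIN)
high = raw_pesq_to_mos_lqo(RAW_PESQ_MAX)
print(f"MOS-LQO range: {low:.2f} to {high:.2f}")
```

Under this mapping a mapped score can never fall below roughly 1, whereas a raw score can, so knowing which scale the metric library returns matters when comparing against published numbers.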

HardeyPandya commented 2 years ago

I will also try reducing the number of encoder and decoder layers.

There is one more thing I am not sure about. Every time I run the inference script, loading the same checkpoint and using the same noisy speech, I get different PESQ and STOI scores between the clean speech predicted by the trained model and the reference clean speech. The difference is not small. For example, the first run gives 1.49, the second run gives 1.31, and sometimes it is 1.1, which is even lower than the PESQ score between the noisy and clean speech. Why would this happen?
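A quick way to narrow this down is to feed the same input through the model twice and compare the outputs sample by sample. If they differ, something stochastic is still active at inference time; in PyTorch the usual suspects are a missing `model.eval()` call (dropout stays on) or randomness in the data pipeline, though nothing in this thread confirms the actual cause. The sketch below uses a toy stand-in model; `run_twice_and_compare` and `make_toy_model` are hypothetical names.

```python
import random


def run_twice_and_compare(infer, noisy, tol=1e-9):
    """Call `infer` twice on the same input and report the largest difference."""
    first = infer(noisy)
    second = infer(noisy)
    max_diff = max(abs(a - b) for a, b in zip(first, second))
    return max_diff <= tol, max_diff


def make_toy_model(training, drop_prob=0.5, seed=0):
    """Toy stand-in for an enhancement model: halves the amplitude.

    With training=True it also applies dropout, which is what a real
    network does when it is left in training mode during inference.
    """
    rng = random.Random(seed)

    def infer(samples):
        out = []
        for s in samples:
            if training and rng.random() < drop_prob:
                out.append(0.0)                      # unit dropped
            elif training:
                out.append(0.5 * s / (1 - drop_prob))  # inverted-dropout rescale
            else:
                out.append(0.5 * s)                  # deterministic path
        return out

    return infer


noisy = [(i + 1) / 100 for i in range(200)]

eval_ok, eval_diff = run_twice_and_compare(make_toy_model(training=False), noisy)
train_ok, train_diff = run_twice_and_compare(make_toy_model(training=True), noisy)
print("eval mode deterministic:", eval_ok, "| training mode deterministic:", train_ok)
```

If the model output is identical across runs but the scores still change, the variation is more likely to come from the evaluation side, for example which test files or noise segments are selected on each run.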

LXP-Never commented 2 years ago

The PESQ is really too low. My guess is that you should check your training pipeline for problems.

HardeyPandya commented 2 years ago

Yes, I have checked the pipeline. Converting the speech resampled to 16 kHz into overlapping frames, calculating the loss function, and converting the frames back to the original audio signal using overlap-add all work properly. It is possible that I am missing something. I will put the Colab code on GitHub soon.
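One check that isolates the framing stage from the model is a round trip: frame a signal, overlap-add the untouched frames, and confirm the result matches the input. The sketch below assumes a rectangular window with averaging over the overlapped samples, which may differ from the windowing used in the actual code; `frame_signal` and `overlap_add` are hypothetical names, and the 320/160 sample sizes (20 ms frames, 10 ms hop at 16 kHz) are illustrative.

```python
import math


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames, zero-padding the tail."""
    if len(signal) <= frame_len:
        n_frames = 1
    else:
        n_frames = 1 + math.ceil((len(signal) - frame_len) / hop)
    padded_len = frame_len + (n_frames - 1) * hop
    padded = list(signal) + [0.0] * (padded_len - len(signal))
    return [padded[i * hop:i * hop + frame_len] for i in range(n_frames)]


def overlap_add(frames, hop, out_len):
    """Rebuild a signal from frames by summing overlaps and averaging."""
    frame_len = len(frames[0])
    if hop > frame_len:
        raise ValueError("hop larger than frame length leaves gaps")
    total = frame_len + (len(frames) - 1) * hop
    acc = [0.0] * total
    count = [0] * total
    for i, frame in enumerate(frames):
        start = i * hop
        for j, value in enumerate(frame):
            acc[start + j] += value
            count[start + j] += 1
    # Divide by how many frames covered each sample, then drop the padding.
    return [a / c for a, c in zip(acc, count)][:out_len]


signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1000)]
frames = frame_signal(signal, frame_len=320, hop=160)
rebuilt = overlap_add(frames, hop=160, out_len=len(signal))
max_error = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

If `max_error` is not close to zero for unmodified frames, the reconstruction itself is distorting the audio and will lower PESQ regardless of how well the network performs.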

To be honest, it looks to me like an overfitting problem caused by using too complex a model for a small data set. But I do not know the common errors and standard procedures, which is why I am asking this question. A very low PESQ score is one thing, but is it possible for the PESQ and STOI scores to fluctuate this much on every run of the inference script, or is that a mistake somewhere?

Thank you for your suggestions and comments.

HardeyPandya commented 2 years ago

This is the code I prepared for TCNN by gathering some code segments from other sources as mentioned.

https://github.com/HardeyPandya/Temporal-Convolutional-Neural-Network-Single-Channel-Speech-Enhancement

LXP-Never commented 2 years ago

👍