breizhn / DTLN

TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-Lite, ONNX and real-time audio processing support.
MIT License

Results of retraining are not as good as the provided pretrained models. #29

Open · liziru opened 3 years ago

liziru commented 3 years ago

Thanks for your wonderful work, @breizhn.
I used this project to retrain on the recently updated DNS-Challenge dataset. The denoised results of the retrained model are a little worse than those of the models provided in this project (both the 40h and 500h models). I just set norm_stft=True. Any advice on improving the retrained model's performance?

Looking forward to your reply.

guojunmin commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

liziru commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?
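For context: the loss values in this thread are negative dB because the model trains on an SNR-based cost, so a val_loss of -21.3 corresponds to roughly 21.3 dB SNR on the validation set. Below is a minimal sketch of that kind of loss and of the early-stopping setup described above, assuming standard TensorFlow/Keras APIs; the exact code and patience value in this repo (DTLN_model.py) may differ.

```python
import tensorflow as tf

def negative_snr_loss(s_true, s_pred):
    """Negative SNR in dB; more negative = better enhancement.

    Sketch of the kind of cost DTLN trains with, not a verbatim
    copy of the repo's implementation.
    """
    eps = 1e-10  # avoid division by zero and log(0)
    signal_power = tf.reduce_mean(tf.square(s_true), axis=-1)
    noise_power = tf.reduce_mean(tf.square(s_true - s_pred), axis=-1)
    snr_db = (10.0 * tf.math.log((signal_power + eps) / (noise_power + eps))
              / tf.math.log(10.0))
    return -tf.reduce_mean(snr_db)

# "Training won't stop until the early-stopping patience is exhausted":
# a standard Keras callback; the patience value here is illustrative.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  patience=10)
```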

breizhn commented 3 years ago

Thanks for your wonderful work, @breizhn. I used this project to retrain on the recently updated DNS-Challenge dataset. The denoised results of the retrained model are a little worse than those of the models provided in this project (both the 40h and 500h models). I just set norm_stft=True. Any advice on improving the retrained model's performance?

Looking forward to your reply.

Yes, I also noticed that. Some of the additional data from the ICASSP DNS-Challenge is very noisy (especially the German data). The English speech from the first challenge has much better quality. You could clean up the data by using the baseline model or one of the models provided here and discarding all "clean" speech files with a bad SNR: process the clean files with a speech enhancement model (make sure the model does not introduce any latency), then subtract the enhanced speech from the input file. This gives you the residual noise. Compare the power of the residual noise with the power of the enhanced speech. If that SNR is only around 5 dB, the file is probably not really a clean speech file.
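A minimal sketch of that filtering idea, assuming mono files and a time-aligned (zero added latency) enhancement function; enhance_fn, clean_files and the exact 5 dB threshold are illustrative, not part of the repo:

```python
import numpy as np
import soundfile as sf

def residual_snr_db(clean_path, enhance_fn):
    """Estimate how clean a nominally 'clean' file really is.

    enhance_fn: callable (samples, sample_rate) -> enhanced samples,
    assumed time-aligned with its input (no added latency), so the
    subtraction below isolates the residual noise.
    """
    x, fs = sf.read(clean_path)
    s_hat = enhance_fn(x, fs)            # enhanced (estimated clean) speech
    n_res = x - s_hat                    # residual noise
    p_speech = np.mean(s_hat ** 2)
    p_noise = np.mean(n_res ** 2) + 1e-10
    return 10.0 * np.log10(p_speech / p_noise)

# Keep only files that are plausibly clean speech
# (clean_files is assumed to be your list of candidate paths).
good_files = [f for f in clean_files if residual_snr_db(f, enhance_fn) > 5.0]
```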

guojunmin commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?

@liziru What do you mean by training loss? I only pay attention to the val_loss displayed during training. What are your environment settings? Created with train_env.yml? I was surprised by your final loss of around -21.3, because I have tried a lot of parameter combinations and my best result is only close to -16.9. By the way, my training data is from the repo forked by the owner. Have you tried this data before?
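(For reference, the training environment mentioned here would normally be created from the repo's YAML file with conda:)

```bash
conda env create -f train_env.yml
```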

LeeGyuHa commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?

@liziru How did you get a loss of -21.3? My current parameters are:

audio_length: 30
silence_length: 0.0
total_hours: 40
snr_lower: -10
snr_upper: 10
total_snrlevels: 5
Training data : validation data = 7:3
Noise types: 9

The DTLN parameters are left at the defaults in the git code.

Total result: loss -11.7281, validation loss -11.0988.

Any advice would be appreciated.

liziru commented 3 years ago

Thanks for your wonderful work, @breizhn. I used this project to retrain on the recently updated DNS-Challenge dataset. The denoised results of the retrained model are a little worse than those of the models provided in this project (both the 40h and 500h models). I just set norm_stft=True. Any advice on improving the retrained model's performance? Looking forward to your reply.

Yes, I also noticed that. Some of the additional data from the ICASSP DNS-Challenge is very noisy (especially the German data). The English speech from the first challenge has much better quality. You could clean up the data by using the baseline model or one of the models provided here and discarding all "clean" speech files with a bad SNR: process the clean files with a speech enhancement model (make sure the model does not introduce any latency), then subtract the enhanced speech from the input file. This gives you the residual noise. Compare the power of the residual noise with the power of the enhanced speech. If that SNR is only around 5 dB, the file is probably not really a clean speech file.

Thanks a lot. I will give it a try.

liziru commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?

@liziru What do you mean by training loss? I only pay attention to the val_loss displayed during training. What are your environment settings? Created with train_env.yml? I was surprised by your final loss of around -21.3, because I have tried a lot of parameter combinations and my best result is only close to -16.9. By the way, my training data is from the repo forked by the owner. Have you tried this data before?

I just followed the settings from the repo and the paper for training, and I found that more data helps a lot. I did not use the training data referred to in the repo. Have you read the paper?

liziru commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?

@liziru How did you get a loss of -21.3? My current parameters are:

audio_length: 30
silence_length: 0.0
total_hours: 40
snr_lower: -10
snr_upper: 10
total_snrlevels: 5
Training data : validation data = 7:3
Noise types: 9

The DTLN parameters are left at the defaults in the git code.

Total result: loss -11.7281, validation loss -11.0988.

Any advice would be appreciated.

'total_snrlevels' should be 30, and you should review the paper for the detailed training data settings.
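For reference, these knobs are set in the DNS-Challenge's noisyspeech_synthesizer.cfg. An illustrative excerpt with the corrected settings; the -5 to 25 dB range matches the paper and a later comment in this thread, and exact key names may vary between DNS-Challenge versions:

```
# noisyspeech_synthesizer.cfg (excerpt; illustrative values)
snr_lower: -5          # paper-style range of -5 to 25 dB
snr_upper: 25
total_snrlevels: 30    # instead of 5
```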

LeeGyuHa commented 3 years ago

@liziru Have you finished all 200 training epochs, and what's the final val_loss?

Thanks for your reply. The training won't stop until the early-stopping callback's patience is exhausted, and the final loss is close to -21.3. What about your training loss?

@liziru How did you get a loss of -21.3? My current parameters are: audio_length: 30, silence_length: 0.0, total_hours: 40, snr_lower: -10, snr_upper: 10, total_snrlevels: 5, training data : validation data = 7:3, 9 noise types. The DTLN parameters are left at the defaults in the git code. Total result: loss -11.7281, validation loss -11.0988. Any advice would be appreciated.

'total_snrlevels' should be 30, and you should review the paper for the detailed training data settings.

Thank you. Thanks to this, I got a validation loss of -16. However, the performance is still worse than model.h5 (the model provided in the repo).

ghost commented 3 years ago

@liziru Hi, roughly how many epochs did your own retraining run before it stopped? My training always stops after 80-odd epochs, and the trained model ends up with 3,990,352 parameters, which doesn't match the pretrained models' parameter counts (norm: 4,003,624 and 3,989,312).

liziru commented 3 years ago

@liziru Hi, roughly how many epochs did your own retraining run before it stopped? My training always stops after 80-odd epochs, and the trained model ends up with 3,990,352 parameters, which doesn't match the pretrained models' parameter counts (norm: 4,003,624 and 3,989,312).

You could compare the denoising results.

ghost commented 3 years ago

@liziru Judging from the waveforms, the denoising result is a bit worse than the pretrained model's. The parameter count is simply the size of the model saved at the end of training, and the training data was also generated with SNRs from -5 to 25 dB.

liziru commented 3 years ago

@liziru Judging from the waveforms, the denoising result is a bit worse than the pretrained model's. The parameter count is simply the size of the model saved at the end of training, and the training data was also generated with SNRs from -5 to 25 dB.

The training data is not exactly the same; my results are also a little worse.

jeungmin717 commented 1 year ago

@liziru Hi, how did you get a val_loss around -21? In my case I get train_loss 0.0011 and val_loss 46. I didn't change anything in this repository or in the 500h data configuration (same as the provided breizhn/dns-challenge). Can you give me a hint?