fjiang9 / NKF-AEC

Acoustic Echo Cancellation with Neural Kalman Filtering

How to solve 'nan' loss in training? #4

Closed Chen1399 closed 2 years ago

Chen1399 commented 2 years ago

'w' diverges during training, producing 'inf' values, and the loss becomes 'nan' or a very large number. How can this be solved? I tried adding a 'Tanh' after W, which helps.
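For anyone hitting the same issue, a minimal sketch of the tanh-bounding idea (the function, variable names, and bound are illustrative, not the repo's actual code):

```python
import torch

def bounded_update(w, kg, e, w_max=5.0):
    """Kalman-style weight update followed by a tanh soft clip.

    w: current filter-weight estimate (real tensor for simplicity)
    kg: Kalman gain predicted by the network
    e: error signal for the current frame
    w_max: assumed bound; tanh keeps the result in (-w_max, w_max)
    """
    w_new = w + kg * e                         # gain-weighted update
    return w_max * torch.tanh(w_new / w_max)   # soft clip avoids inf/nan

# toy usage
w = torch.zeros(257)
w = bounded_update(w, torch.full((257,), 0.1), torch.randn(257))
```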

fjiang9 commented 2 years ago

You can use shorter audio segments at the beginning of training. Other 'warm-up' training tricks may also help with the 'nan' issue.
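One possible way to implement this warm-up is a segment-length schedule; a minimal sketch, assuming crops are taken in samples (all names and numbers below are illustrative):

```python
import random

def crop_length(epoch, start_len=4000, full_len=160000, warmup_epochs=10):
    """Linearly grow the training crop from start_len to full_len samples."""
    if epoch >= warmup_epochs:
        return full_len
    frac = epoch / warmup_epochs
    return int(start_len + frac * (full_len - start_len))

def random_crop(wav, length):
    """Random crop of `length` samples from a 1-D waveform (list/array)."""
    if len(wav) <= length:
        return wav
    start = random.randint(0, len(wav) - length)
    return wav[start:start + length]

# toy usage: 0.25 s crops at 16 kHz in epoch 0, full length after warm-up
print(crop_length(0), crop_length(10))
```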

Chen1399 commented 2 years ago

Thanks. I solved the problem.

fjiang9 commented 2 years ago

Glad to hear that. Thanks for your interest!

meadow163 commented 2 years ago

> Thanks. I solved the problem.

Could you tell me how you solved this issue, please?

Chen1399 commented 2 years ago

Adjust the learning rate and the length of the training files. The initial weights are also important.
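For reference, a minimal PyTorch sketch of a reduced learning rate combined with gradient clipping (the model and hyper-parameters below are stand-ins, not the repo's defaults):

```python
import torch
import torch.nn as nn

model = nn.GRU(input_size=18, hidden_size=18, batch_first=True)  # stand-in for the KG net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)        # reduced learning rate
loss_fn = nn.MSELoss()

x = torch.randn(8, 100, 18)          # dummy batch: (batch, frames, features)
target = torch.randn(8, 100, 18)

optimizer.zero_grad()
y, _ = model(x)
loss = loss_fn(y, target)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # cap the gradient norm
optimizer.step()
```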

TeaPoly commented 1 year ago

> Adjust the learning rate and the length of the training files. The initial weights are also important.

Is there any recommended way to initialize the weights?

Chen1399 commented 1 year ago

> Is there any recommended way to initialize the weights?

Initialize the weights following the 'autodsp' approach. Training on very short wavs is also a way.
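A minimal sketch of an explicit initialization pass (the exact autodsp scheme is not reproduced here; orthogonal/Xavier initialization is shown only as a common stand-in):

```python
import torch.nn as nn

def init_weights(module):
    """Orthogonal init for recurrent weights, Xavier for the rest, zero biases."""
    if isinstance(module, nn.GRU):
        for name, param in module.named_parameters():
            if 'weight_ih' in name:
                nn.init.xavier_uniform_(param)
            elif 'weight_hh' in name:
                nn.init.orthogonal_(param)
            elif 'bias' in name:
                nn.init.zeros_(param)
    elif isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# usage: model.apply(init_weights) before training starts
```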

TeaPoly commented 1 year ago

> Is there any recommended way to initialize the weights?
>
> Initialize the weights following the 'autodsp' approach. Training on very short wavs is also a way.

Thanks for your reply.

shenbuguanni commented 1 year ago

I find that echo_hat clips very easily, and then the loss becomes nan. Does "very short wav" mean 1 s, or something else?

shenbuguanni commented 1 year ago

May I ask whether 'self.kg_net.init_hidden' needs to be used in the training stage? And when you say 'initialize the weights', do you mean the GRU or the Dense layers?

Chen1399 commented 1 year ago

> I find that echo_hat clips very easily, and then the loss becomes nan. Does "very short wav" mean 1 s, or something else?

Maybe even shorter, so that the loss does not become nan.
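Another possible guard against the occasional clipped segment is to skip non-finite losses; a minimal sketch, with illustrative names only:

```python
import torch

def safe_step(loss, optimizer, model, max_norm=5.0):
    """Apply an optimizer step only when the loss is finite; otherwise skip the batch."""
    if not torch.isfinite(loss):
        optimizer.zero_grad()        # drop this batch entirely
        return False
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    optimizer.zero_grad()
    return True
```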

Chen1399 commented 1 year ago

> May I ask whether 'self.kg_net.init_hidden' needs to be used in the training stage? And when you say 'initialize the weights', do you mean the GRU or the Dense layers?

'self.kg_net.init_hidden' is the hidden state of the GRU. Training on shorter wavs is a way to obtain better initial weights.
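A minimal sketch of how an `init_hidden`-style reset is typically used during training (shapes and names are assumptions, not the repo's code):

```python
import torch
import torch.nn as nn

class KGNet(nn.Module):
    def __init__(self, feat_dim=18, hidden_dim=18):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)

    def init_hidden(self, batch_size, device='cpu'):
        # one zero state per layer: (num_layers, batch, hidden)
        return torch.zeros(1, batch_size, self.gru.hidden_size, device=device)

    def forward(self, x, h):
        return self.gru(x, h)

net = KGNet()
h = net.init_hidden(batch_size=4)        # reset at the start of each utterance
y, h = net(torch.randn(4, 100, 18), h)   # then carry h across frames/blocks
```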

shenbuguanni commented 1 year ago

I don't understand why shorter audio (maybe 0.5 s) avoids nan. What is your batch_size?

BruceWeiii commented 1 year ago

> Thanks. I solved the problem.

Hello, have you reproduced the results of the paper? Could you describe your training process? I modified the test code for training and found that the results were poor. Thank you!