breizhn / DTLN

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.
MIT License

ValueError: trying to load 500h file h5 with 10 layers but model architecture has 9 layers #55

Open nikokks opened 2 years ago

nikokks commented 2 years ago

Hi to all,

I think there is a problem in the class definition of the model, for two reasons. First, I can load the classical model.h5 but not DTLN_norm_500h.h5. When I try to use the 500h model, either of these commands fails with the error in the traceback below:

python3.8 DTLN/run_evaluation_file.py -i input_folder -o output_folder -m DTLN/pretrained_model/DTLN_norm_40h.h5

python3.8 DTLN/run_evaluation_file.py -i input_folder -o output_folder -m DTLN/pretrained_model/DTLN_norm_500h.h5

The classical model loads without any problem:

python3.8 DTLN/run_evaluation_file.py -i input_folder -o output_folder -m DTLN/pretrained_model/model.h5

Traceback (most recent call last):
  File "DTLN/run_evaluation_file.py", line 78, in <module>
    modelClass.model.load_weights(args.model)
  File "/home/nikkokks/.local/lib/python3.8/site-packages/keras/engine/training.py", line 2361, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/home/nikkokks/.local/lib/python3.8/site-packages/keras/saving/hdf5_format.py", line 688, in load_weights_from_hdf5_group
    raise ValueError('You are trying to load a weight file '
ValueError: You are trying to load a weight file containing 10 layers into a model with 9 layers.

Secondly, when I use the classical model, it runs perfectly, but the denoised soundfile is truncated.

I think the definition of the last layer of the model may be missing from the class DTLN_model in the file DTLN_model.py, but I do not know where.
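One way to narrow down a layer-count mismatch like the one in the traceback is to list the weighted layers stored in the .h5 file and compare them with the weighted layers of the model that was built. The following is only a diagnostic sketch, not part of the DTLN repository: the helper names `read_layer_names` and `compare_layers` are hypothetical, it assumes h5py is installed, and it assumes the file follows the usual Keras HDF5 layout (a `layer_names` attribute, plus a `weight_names` attribute on each layer group). Layer names can differ between runs because of automatic numbering, so treat the name comparison as a hint and look at the counts first.

```python
def read_layer_names(weights_path):
    """Return the names of layers that carry weights in a Keras HDF5 file."""
    import h5py  # imported lazily; assumes h5py is installed

    with h5py.File(weights_path, "r") as f:
        # Full-model files keep weights under "model_weights";
        # weight-only files store them at the root.
        group = f["model_weights"] if "model_weights" in f else f
        kept = []
        for raw in group.attrs["layer_names"]:
            name = raw.decode("utf8") if isinstance(raw, bytes) else str(raw)
            # Keras only counts layers that actually have weights.
            if len(group[name].attrs["weight_names"]) > 0:
                kept.append(name)
        return kept


def compare_layers(file_layers, model_layers):
    """Return (names only in the file, names only in the model)."""
    only_in_file = [n for n in file_layers if n not in model_layers]
    only_in_model = [n for n in model_layers if n not in file_layers]
    return only_in_file, only_in_model


# Hypothetical usage, with modelClass.model built as in the evaluation script:
# file_layers = read_layer_names("DTLN/pretrained_model/DTLN_norm_500h.h5")
# model_layers = [l.name for l in modelClass.model.layers if l.weights]
# print(len(file_layers), len(model_layers))
# print(compare_layers(file_layers, model_layers))
```

If the file lists one more weighted layer than the model, the extra name indicates which layer the model was built without.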

Can you help me?

I listened to the denoised soundfile and it seems to be better than many models in this field, so congrats!

StuartIanNaylor commented 2 years ago

I just ran

python run_evaluation.py -i /media/stuart/843583c3-c84c-4ea6-974d-6acd38788ef6/home/stuart/DNS-Challenge/training_set/val/noisy -o ./processed -m ./pretrained_model/DTLN_norm_500h.h5

Thought I would give it a go, as I am sat here scratching my head over how to create a stateful model for a custom real-time tflite.

What's run_evaluation_file.py?

None
2022-04-25 19:40:43.523463: I tensorflow/stream_executor/cuda/cuda_dnn.cc:366] Loaded cuDNN version 8400
2022-04-25 19:40:43.922703: I tensorflow/stream_executor/cuda/cuda_blas.cc:1774] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
fileid_861.wav processed successfully!
fileid_1399.wav processed successfully!
fileid_4514.wav processed successfully!
fileid_335.wav processed successfully!
fileid_1592.wav processed successfully!
fileid_2779.wav processed successfully!
fileid_4027.wav processed successfully!
fileid_2084.wav processed successfully!
fileid_2941.wav processed successfully!
fileid_2342.wav processed successfully!
fileid_2260.wav processed successfully!

TensorFlow 2.7.1

nikokks commented 2 years ago

Oh thanks, it works for me now. I modified something in the code.

The second problem appears when I evaluate the model with the STOI metric: the processed soundfile and the reference soundfile do not have the same length. I can compute STOI for other models, but not for the DTLN model. Is it normal for the data to have a different length after processing with the model?

breizhn commented 2 years ago

Hi Nicolas, the length difference results from the windowing. I will add a fix for that. The processed files are probably a bit shorter. A quick fix for you would be to shorten the reference file to the length of the processed file.
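The quick fix could look roughly like the sketch below. It is only an illustration: `trim_to_common_length` is a hypothetical helper, not something from the DTLN code, and it assumes both signals are already loaded as 1-D sequences (lists or NumPy arrays) at the same sample rate. It cuts both signals to the shorter length, which also covers the case where the processed file is the longer one.

```python
def trim_to_common_length(reference, processed):
    """Cut both signals to the length of the shorter one.

    Works with anything that supports len() and slicing,
    for example Python lists or 1-D NumPy arrays.
    """
    n = min(len(reference), len(processed))
    return reference[:n], processed[:n]


# Toy example: the processed signal is two samples shorter.
reference = [0.1, 0.2, 0.3, 0.4, 0.5]
processed = [0.1, 0.2, 0.3]
ref_trimmed, proc_trimmed = trim_to_common_length(reference, processed)
print(len(ref_trimmed), len(proc_trimmed))
```

The trimmed pair can then be passed to whichever STOI implementation is in use, since both inputs now have the same number of samples.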

I will also look into the other error.

breizhn commented 2 years ago

The length issue should be fixed now.

nikokks commented 2 years ago

Yes, it is! Thanks a lot!