felixfuyihui / Uformer

Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation

Input error when running uformer.py #3

Open jvel07 opened 2 years ago

jvel07 commented 2 years ago

The paper is one of the best out there, congrats! I am trying to run uformer.py but I get the following error:

RuntimeError: Given normalized_shape=[12], expected input with shape [*, 12], but got input of size[10, 64, 2, 749, 5]

Am I missing something here?

felixfuyihui commented 2 years ago

Thank you so much for your attention. Would you please kindly tell me which line of code outputs this error? And would you please copy the full error report?

jvel07 commented 2 years ago

Sure, here is the full output:

Traceback (most recent call last):
  File "/home/jvel/jupyterNotebooks/samsung/Uformer/uformer.py", line 451, in <module>
    outputs = net(inputs,inputs)
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jvel/jupyterNotebooks/samsung/Uformer/uformer.py", line 336, in forward
    out, mag = self.conformer1(out, mag)
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jvel/jupyterNotebooks/samsung/Uformer/dilated_dualpath_conformer.py", line 118, in forward
    cplx = self.dsconv_cplx[idx](cplx)
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jvel/jupyterNotebooks/samsung/Uformer/dsconv2d_cplx.py", line 48, in forward
    y = self.layernorm_conv1(x.transpose(2,4)).transpose(2,4)
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/modules/normalization.py", line 170, in forward
    return F.layer_norm(
  File "/home/jvel/anaconda3/envs/asteroid/lib/python3.8/site-packages/torch/nn/functional.py", line 2202, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: Given normalized_shape=[12], expected input with shape [*, 12], but got input of size[10, 64, 2, 749, 5]

Process finished with exit code 1
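
For readers following the traceback: torch.nn.LayerNorm normalizes over the last len(normalized_shape) dimensions of its input, and those trailing dimensions must equal normalized_shape. Here the tensor passed to the layer (after x.transpose(2,4)) has a last dimension of 5, while the layer was built with normalized_shape=[12]. The sketch below only mirrors that shape rule in plain Python. The helper name layer_norm_shape_ok is hypothetical and does not call PyTorch.

```python
def layer_norm_shape_ok(input_shape, normalized_shape):
    """Return True if the trailing dims of input_shape equal normalized_shape.

    Hypothetical helper that mirrors the shape rule LayerNorm enforces;
    it does not perform any normalization.
    """
    n = len(normalized_shape)
    if not 0 < n <= len(input_shape):
        return False
    return tuple(input_shape[-n:]) == tuple(normalized_shape)


# Shape reported in the RuntimeError above.
failing_shape = (10, 64, 2, 749, 5)

print(layer_norm_shape_ok(failing_shape, (12,)))  # False: last dim is 5, not 12
print(layer_norm_shape_ok(failing_shape, (5,)))   # True: last dim matches
```

A mismatch like this usually means the layer was constructed for a different feature size than the one the preceding layers actually produce.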

jvel07 commented 2 years ago

By the way, I am just executing uformer.py as is (I just want to try it first). The input I am using has not changed from the original code:

net = Uformer()
inputs = torch.randn([10,1,64000*3])
outputs = net(inputs,inputs)

Is there something wrong with the input perhaps?

felixfuyihui commented 2 years ago

Yes, please wait a moment while I check. Your finding is really valuable.

felixfuyihui commented 2 years ago

Thank you so much for your significant findings. I found some version control problems in my code. I have made major modifications, so please download it again. Since I rewrote parts of the code in limited time, do not hesitate to contact me if you find any errors or anything confusing. BTW, I changed some ideas in the code so that they differ slightly from the paper, since I found these changes may work better.

jvel07 commented 2 years ago

Thanks so much for your answer. You are one of a kind; thanks a lot for your time and your work!

I cloned the repo, and it seems these are missing: linear_real and linear_cplx, which are called from f_att_cplx.py, and ff_real and ff_cplx, which are called from dilated_dualpath_conformer.py.

BTW, I see your commit message "modify from 48k model to 16k model". Is this parameter set somewhere specific? If I need 12k, where do I set it?

felixfuyihui commented 2 years ago

Thank you. I've added the missing code. Please let me know if you find any other errors. As for the sampling rate, I had at one point changed the model from the paper's 16k version to a 48k model. Since more sub-bands need to be processed at 48k, I added more sub-band processing modules and also changed some model parameters (that's why you could not run my code at first). For 12k audio, I personally think there is no big gap compared with the 16k model, so you can probably use the parameters from the paper. But don't forget to change win_len and win_inc in uformer.py if you still want a 25 ms / 10 ms STFT.
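
As a rough illustration of the win_len / win_inc advice above, the window and hop sizes in samples follow from the sampling rate and the 25 ms / 10 ms durations. This is a minimal sketch: the function name stft_frame_params is hypothetical, and the thread does not say whether other parameters in uformer.py (for example an FFT size) also need to change.

```python
def stft_frame_params(sample_rate_hz, win_ms=25.0, hop_ms=10.0):
    """Convert window/hop durations in milliseconds to sample counts.

    Hypothetical helper; the returned values correspond to what the
    thread calls win_len and win_inc.
    """
    win_len = round(sample_rate_hz * win_ms / 1000.0)
    win_inc = round(sample_rate_hz * hop_ms / 1000.0)
    return win_len, win_inc


for sample_rate in (12000, 16000, 48000):
    win_len, win_inc = stft_frame_params(sample_rate)
    print(sample_rate, win_len, win_inc)
# 12000 300 120
# 16000 400 160
# 48000 1200 480
```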

jvel07 commented 2 years ago

Thank you, it seems to be working now! I will keep exploring the repo and let you know of any findings!

BTW, I couldn't spot a main.py or a train.py in order to reproduce your paper as is (although I will probably use a different corpus). Is there any script to run and use for training?

felixfuyihui commented 2 years ago

Good news! Thank you again for the kind reminder. As for training scripts, I figured everyone has their own way of generating data and running training, so I didn't upload any. You could take https://github.com/kaituoxu/Conv-TasNet as an example for training. BTW, do you have an SE data generation function? I strongly recommend https://github.com/microsoft/DNS-Challenge/blob/master/audiolib.py for data generation.
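
For anyone unfamiliar with what an SE data generation function typically does, a common core step is mixing clean speech with noise at a chosen signal-to-noise ratio. The following is a generic plain-Python sketch of that step, not the audiolib.py implementation. The names rms and mix_at_snr are hypothetical, and a real pipeline would work on audio arrays and usually also handle clipping, level normalization, and reverberation.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Scale noise so that clean/noise RMS ratio equals snr_db, then add.

    Returns (noisy, scaled_noise). Hypothetical helper for illustration.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent; cannot scale to a target SNR")
    # SNR(dB) = 20 * log10(rms(clean) / rms(noise))
    target_noise_rms = rms(clean) / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    scaled_noise = [n * gain for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


# Toy signals: a sine wave as "speech" and uniform random noise.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=5.0)
achieved_snr_db = 20 * math.log10(rms(clean) / rms(scaled_noise))
print(round(achieved_snr_db, 6))
```

The training pair is then (noisy, clean): the model sees the noisy mixture as input and the clean signal as the target.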

jvel07 commented 2 years ago

Thanks for your hints, I will definitely take a look at them! Thanks again for your time and kind responses, much appreciated. I will let you know if I have any observations. Keep up the good work.