echocatzh / MTFAA-Net

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
MIT License

Did you reproduce the performance of the original paper? #2

Closed hbwu-ntu closed 2 years ago

FragrantRookie commented 1 year ago

Already discussed.😃

Thanks for your great work. How about your training results?

I did the test and it worked great!

Hi, could you please tell me what loss function you use? Thanks very much! :)

Just SI-SNR for AEC and denoising. Maybe other loss functions would be better; I will try some when I have time.

Thank you for your reply! You mentioned that SI-SNR was used. Can you give some details about it? As shown in the code: https://github.com/echocatzh/MTFAA-Net/blob/eb3b1f33d7c5178f238076938c99acaec9e2e904/mtfaa.py#L144

the outputs of the model include the magnitude spectrogram, the complex spectrum, and the time-domain waveform of the near-end voice. Which of these was used in the SI-SNR loss function?

Time domain: self.stft.inverse(real, imag)
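
For reference, a standard SI-SNR loss on that time-domain output could look like the sketch below. This is a generic implementation, not the author's code; `est` would be the waveform from self.stft.inverse(real, imag) and `ref` the clean near-end target, both of shape [B, N].

```python
# Generic SI-SNR loss sketch (not the author's exact code).
import torch

def si_snr_loss(est: torch.Tensor, ref: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative scale-invariant SNR, averaged over the batch. est/ref: [B, N]."""
    # Remove DC offset so the measure is invariant to a constant bias.
    est = est - est.mean(dim=-1, keepdim=True)
    ref = ref - ref.mean(dim=-1, keepdim=True)
    # Project the estimate onto the reference to get the target component.
    dot = torch.sum(est * ref, dim=-1, keepdim=True)
    ref_energy = torch.sum(ref ** 2, dim=-1, keepdim=True) + eps
    target = dot / ref_energy * ref
    noise = est - target
    si_snr = 10 * torch.log10(
        torch.sum(target ** 2, dim=-1) / (torch.sum(noise ** 2, dim=-1) + eps) + eps
    )
    return -si_snr.mean()
```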

Thanks again! Can you give more info about the training, such as the training data scale, the training machine (GPU or CPU), and how long it takes to train the model? Did you do any data preprocessing before training, for example preprocessing the data and saving intermediate features, then loading those features during training to speed things up? Or do you just read the audio data directly from disk? Thank you.

https://github.com/echocatzh/MTFAA-Net/issues/2#issuecomment-1321776195 https://github.com/echocatzh/MTFAA-Net/issues/2#issuecomment-1318307363

shenbuguanni commented 10 months ago

Hi, I ran some tests but could not reproduce the result. Would you mind sharing some code (such as the loss function) with me? Thanks very much. My email is: cao_yangang@163.com

It is inconvenient for me to share code because of my company. You can supplement the training code from the open-source community; the author has published the core code, and you only need to add a little code to make it work.

OK, I understand. Would you mind processing a noisy audio clip with your pre-trained model? Thanks! Link: https://pan.baidu.com/s/1y7WiZMiGROGF29WtIB9gMQ?pwd=qmjk Extraction code: qmjk

I am using this code for AEC, which needs two channels; your wav file only has one channel. I can show you the AEC effect in the following link. I have reduced the network to a small size so that it can be deployed on ARM Cortex-A: https://pan.baidu.com/s/1w7q5HLZeNlrZBtsqBCucfA Extraction code: bu23

OK, I have listened to your result; it's pretty good! I also have a 10-meter far-field double-talk scene. Could you please help me process it? Thanks! Link: https://pan.baidu.com/s/1jlEDSSim55zJcpBIC4FLZg?pwd=hqec Extraction code: hqec

Link: https://pan.baidu.com/s/1GOEtltjIikNyWeHPZ0DxRw Extraction code: ztpg This is the result processed by my model (AEC only, no denoising); take a look.
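
As a side note on the earlier point that only a little glue code is needed around the published core: a minimal training loop might look like the following sketch. The dataset/loader, optimizer handling, and output unpacking are assumptions, not the author's actual setup.

```python
# Hypothetical training skeleton for this repo; the loader and output order
# are assumptions and should be checked against mtfaa.py.
import torch

def train_one_epoch(model, loss_fn, train_loader, optim, device="cuda"):
    """model: the network from mtfaa.py; loss_fn: e.g. the SI-SNR sketch above;
    train_loader is assumed to yield (mic, ref, near) waveforms of shape [B, N]."""
    model.train()
    for mic, ref, near in train_loader:
        mic, ref, near = mic.to(device), ref.to(device), near.to(device)
        # The forward pass takes a list of [B, N] signals (mic + far-end ref for AEC)
        # and returns magnitude, time-domain and complex-spectrum outputs.
        outputs = model([mic, ref])
        wav = outputs[1]  # time-domain estimate; verify the output order against mtfaa.py#L144
        loss = loss_fn(wav, near)
        optim.zero_grad()
        loss.backward()
        optim.step()
```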

shenbuguanni commented 10 months ago

Hi, the three audio clips in that file should be the original mic signal, the linear AEC output, and the NN output, right? Could you also provide the original ref signal, so that I can run my own model and compare the results?

FragrantRookie commented 10 months ago

Sorry, it has been quite a while and I can't find the files anymore.

leizhu1989 commented 10 months ago

Hi, may I ask which LAEC method you used? The one described in the paper doesn't seem to be open source. Would WebRTC's AEC work? Any advice would be appreciated.

zzzzzzxm commented 10 months ago

Sorry to bother you, I'm new to the audio-processing domain and have a question. When I use torchaudio.load to load a .wav file from the DNS dataset or the AEC Challenge dataset, the shape of the audio is something like torch.Size([1, 159999]) (i.e., channels, audio_length). But in the code, when testing the net, the author creates a sample with inp = th.randn(3, 48000) (with the annotation "sigs: list [B N] of len(sigs)"). You mentioned that you used two channels. I don't quite understand how these different input formats relate; can you give me some suggestions? Anyway, thanks for your answers so far. : )
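
For what it's worth, the shapes line up once each loaded file is treated as one signal of shape [B, N] and the mic and far-end reference are passed as separate list entries. A rough sketch, with placeholder file paths and the forward signature assumed from the "sigs: list [B N]" annotation quoted above:

```python
# Rough illustration of the shape handling; file paths are placeholders.
import torchaudio

mic, sr = torchaudio.load("mic.wav")  # torchaudio returns [channels, samples], e.g. [1, 159999]
ref, _ = torchaudio.load("ref.wav")   # far-end reference needed for AEC, same layout

# torchaudio's leading dimension is the channel count, while the test snippet in
# mtfaa.py treats each signal as [B, N]. For a mono file, the [1, N] tensor can
# simply be reinterpreted as a batch of one utterance.
mic = mic[:1]  # keep the first channel -> [1, N], i.e. B = 1
ref = ref[:1]

# "Two channels" in the discussion above means two separate signals (mic + ref),
# each of shape [B, N], passed together as a list, not a single two-channel tensor.
sigs = [mic, ref]
# out = model(sigs)  # forward signature assumed from the annotation in mtfaa.py
```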