echocatzh / MTFAA-Net

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
MIT License
187 stars · 55 forks

Did you reproduce the performance of the original paper? #2

Closed. hbwu-ntu closed this issue 2 years ago.

echocatzh commented 2 years ago

Already discussed.😃

YangangCao commented 2 years ago

Thanks for your great work! How were your training results?

FragrantRookie commented 1 year ago

I did the test and it worked great!

YangangCao commented 1 year ago

Thanks!

FragrantRookie commented 1 year ago

Hi, I did some tests but could not reproduce the result. Would you mind sharing some code (such as the loss function) with me? Thanks very much. My email is cao_yangang@163.com.

It is inconvenient for me to provide code because I am working at a company. You can supplement the training code from the open-source community. The author has published the core code; only a little extra code is needed to make it work.

YangangCao commented 1 year ago

OK, I understand. Would you mind processing a noisy audio clip with your pre-trained model? Thanks! Link: https://pan.baidu.com/s/1y7WiZMiGROGF29WtIB9gMQ?pwd=qmjk (extraction code: qmjk)

FragrantRookie commented 1 year ago

I am using this code for AEC, which needs two channels, and your wav file only has one channel. I can show you the AEC effect at the following link. I have reduced the network to a small size so that it can be deployed on an ARM Cortex-A: https://pan.baidu.com/s/1w7q5HLZeNlrZBtsqBCucfA (extraction code: bu23)

YangangCao commented 1 year ago

OK, I have listened to your result; it's pretty good! I also have a 10-meter far-field double-talk scene. Could you please help me process it? Thanks! Link: https://pan.baidu.com/s/1jlEDSSim55zJcpBIC4FLZg?pwd=hqec (extraction code: hqec)

FragrantRookie commented 1 year ago

OK, I'll run it when it's convenient; I'm still busy these days. However, I haven't done a 10 m test. I only considered distances within 5 m when designing.

xushaojun1975 commented 1 year ago

Great job! Where can I find the code the author published in the open-source community? Thanks.

FragrantRookie commented 1 year ago

The address of this repository is the address of the code.

cxwang822 commented 1 year ago

Hi, could you please tell me what loss function you use? Thanks very much! :)

FragrantRookie commented 1 year ago

Just SI-SNR, for both AEC and denoising. Maybe another loss function would be better; I will try other loss functions when I have time.
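For reference, SI-SNR as commonly used in speech enhancement can be sketched as follows. This is a minimal NumPy version, not necessarily identical to the commenter's implementation; in training it is negated so that minimizing the loss maximizes SI-SNR:

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a reference."""
    est = est - est.mean()              # zero-mean both signals
    ref = ref - ref.mean()
    # project the estimate onto the reference (the scale-invariant target)
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))
```

A perfect or merely rescaled estimate scores very high, since the metric ignores gain; training typically minimizes `-si_snr(est, ref)`.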

YangangCao commented 1 year ago

When I use MTFAA for AEC and denoising (no dereverb, three-channel inputs as in the paper), the SI-SNR doesn't decrease (it just drifts); if I only denoise (one-channel input), it does decrease. I don't know why... Can you please tell me how to make SI-SNR decrease for AEC and denoising? Thanks very much!

FragrantRookie commented 1 year ago

I am not sure exactly what the reason is. For more information, consult the author of the code. If we can donate R&D expenses, I believe the author will have more energy to provide great open-source code.

YangangCao commented 1 year ago

I have found the reason: it's the all-zeros far-end single-talk target. It makes the SI-SNR drift.
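The drift follows from the SI-SNR definition: the loss projects the estimate onto the target, dividing by the target's energy, so an all-zeros near-end target (far-end single talk) makes the projection degenerate. One possible guard, sketched here as an assumption rather than what either commenter actually used, is to detect silent targets and fall back to a plain log-energy penalty on the estimate:

```python
import numpy as np

def neg_si_snr_loss(est, ref, eps=1e-8):
    ref_energy = float(np.dot(ref, ref))
    if ref_energy < eps:
        # silent target (far-end single talk): the projection onto ref
        # is undefined, so just drive the estimate's energy toward zero
        return 10.0 * np.log10(np.dot(est, est) + eps)
    est = est - est.mean()
    ref = ref - ref.mean()
    s = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e = est - s
    return -10.0 * np.log10(np.dot(s, s) / (np.dot(e, e) + eps))
```

With this guard the loss stays finite on far-end single-talk clips instead of producing undefined gradients.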

FragrantRookie commented 1 year ago

You're close to success.

YangangCao commented 1 year ago

MTFAA (with both L1 loss and SI-SNR loss) works well on a simulated test set but badly on a real-world test set; maybe there is quite a difference between the two datasets. Can you please tell me which RIR generator you use? I use gpuRIR and FAST-RIR. Thanks very much.

FragrantRookie commented 1 year ago

May I ask whether you do research at a school or build products at a company? If you are doing research, simulated RIRs may be sufficient. But to make products, you need to use real recordings; write some code to capture them.

YangangCao commented 1 year ago

Thanks for the tips; I work at a company. Following your advice, I want to place a loudspeaker playing clean audio in the room and record it with a microphone. What do you think of this idea? Is it necessary to replace the loudspeaker with a real person?

FragrantRookie commented 1 year ago

Record the clean audio played by the loudspeaker.

FragrantRookie commented 1 year ago

Only the far-end signal needs to be recorded; the near-end signal is not needed. Use the recorded far-end signals and a standard corpus to synthesize a double-talk corpus.
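That recipe can be sketched as follows. The SER/SNR scaling and the helper names are illustrative assumptions, not the commenter's actual pipeline; the idea is simply to mix the recorded far-end echo and some noise against a clean near-end utterance, feed the synthetic microphone signal (plus the far-end reference) to the model, and use the clean near-end speech as the training target:

```python
import numpy as np

def rms(x, eps=1e-12):
    return np.sqrt(np.mean(x ** 2) + eps)

def make_double_talk(near, echo, noise, ser_db=0.0, snr_db=10.0):
    # Scale the recorded far-end echo to the chosen signal-to-echo ratio
    # and the noise to the chosen SNR, both relative to the clean
    # near-end utterance, then sum to form the microphone signal.
    echo = echo * (rms(near) / rms(echo)) / (10 ** (ser_db / 20))
    noise = noise * (rms(near) / rms(noise)) / (10 ** (snr_db / 20))
    mic = near + echo + noise
    return mic, near                    # (model input, training target)
```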

YangangCao commented 1 year ago

But near-end denoising should be considered too, so I think I should record the near-end signal as well. What do you think?

FragrantRookie commented 1 year ago

The current discussion is beyond the scope of this code, so it is not appropriate to continue it here. In short, there is no problem with this code; the code is very good.

cxwang822 commented 1 year ago

Hi, I would like to know how long it takes you to train one epoch. It takes me three hours per epoch with a 300-hour training set, which is very slow.

FragrantRookie commented 1 year ago

Mine is the same as yours.

taucontrib commented 1 year ago

Hello, any chance of sharing the audio examples on a different website? A Chinese mobile phone number is required for a Baidu login, which I sadly don't have. I would be very curious to listen. Thank you!

YangangCao commented 1 year ago

OK, you can leave your email and I will send them.

taucontrib commented 1 year ago

derserafin (at) gmail (dot) com

Thanks a lot!

FragrantRookie commented 1 year ago

https://drive.google.com/drive/folders/1Zlp7krFOvrEmQ_QvKd7a0CLpxs1Sp1Q7?usp=sharing

KarmaYan commented 1 year ago

Hi, I have recently been working on a reproduction of this project as well, but my test results show some degree of enhancement, just not as good as the contest results. So I would like to ask you some related questions, if it is convenient. Did you use the official 48 kHz DNS data in your reproduction, and what processing was done to the data? Was your experiment three-channel or one-channel input? I found that the ERB module may make the signal amplitude significantly weaker, resulting in unsatisfactory enhancement; have you encountered similar problems?

FragrantRookie commented 1 year ago

I just use 16 kHz DNS data. If your target scene is far-field, you need to set the volume of the near-end voice a little lower. I use three-channel inputs as described in the paper. ERB will make the signal amplitude weaker.

KarmaYan commented 1 year ago

Thank you for your reply. I would like to ask again: what processing is required to feed 16 kHz data into that 48 kHz sample-rate network? This is my first time reproducing a 48 kHz network and I am not too familiar with it.

FragrantRookie commented 1 year ago

Just change the sample rate in the init of mtfaa.py to 16000.
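A single constructor change suffices because an STFT front end is usually parameterized in samples derived from the sample rate; keeping the window and hop durations fixed in milliseconds gives the network the same time resolution at 16 kHz as at 48 kHz. A small sketch of that relationship (the 32 ms / 8 ms durations here are illustrative assumptions; check the repository's model file for the real values):

```python
def stft_params(sr, win_ms=32, hop_ms=8):
    # convert fixed durations (ms) into sample counts for a given rate
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    return win, hop

print(stft_params(48000))  # (1536, 384)
print(stft_params(16000))  # (512, 128)
```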

KarmaYan commented 1 year ago

OK, I see. I am also training a 16 kHz version of MTFAA, with the ERB domain removed (to avoid the amplitude reduction and information loss). The PESQ score is around 3.1+ on my own test set, while FRCRN reaches 3.5+ on the same dataset, so I am not satisfied with my reproduced MTFAA performance. Finally, have you tested MTFAA on objective metrics? What dataset did you use, and what scores were you able to achieve?

FragrantRookie commented 1 year ago

I didn't compare the two networks. Because I work at a company, my goal is to put algorithms into products, so the dataset I use is built for the product's use scenario. If you find out why the PESQ of MTFAA is not as good as that of FRCRN, please tell me.

KarmaYan commented 1 year ago

OK, I will do my best to optimize my reproduction project to achieve better MTFAA results. Thanks for your patient answers!

FragrantRookie commented 1 year ago

Because the problem I deal with is echo cancellation plus noise reduction, I use three channels. For noise reduction only, one channel is enough.

jet-yangqs commented 1 year ago

The model has 2M+ parameters and may have efficiency problems. Would you mind explaining how to use it in a real project? Is it necessary to rewrite the network in C++ and load the pretrained model for real-time applications?

FragrantRookie commented 1 year ago

My method is to convert the PyTorch model to TensorFlow, and then deploy with TensorFlow.

jet-yangqs commented 1 year ago

Do you mean you rewrite and train the model in the TensorFlow framework, or convert and deploy the trained model after training with PyTorch?

FragrantRookie commented 1 year ago

Rewrite the model in the TensorFlow framework; train with PyTorch, then convert and deploy the trained model by loading the weights from PyTorch into TensorFlow.
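For anyone attempting the same port: the main pitfall when copying weights is kernel layout. PyTorch `Conv2d` stores weights as `(out_ch, in_ch, kH, kW)`, while TensorFlow `Conv2D` expects `(kH, kW, in_ch, out_ch)`. A minimal NumPy sketch of the transpose (the function name is illustrative, not from this repo):

```python
import numpy as np

def torch_conv2d_to_tf(weight: np.ndarray) -> np.ndarray:
    """Convert a PyTorch Conv2d weight (out, in, kH, kW)
    to a TensorFlow Conv2D kernel (kH, kW, in, out)."""
    return np.transpose(weight, (2, 3, 1, 0))
```

Dense/Linear layers need a similar transpose of their 2-D weight matrices, while bias and normalization parameters usually copy over directly.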

jet-yangqs commented 1 year ago

Hi, could you please tell me what loss function you use? Thanks very much! :)

Just SI-SNR for AEC and denoising. Maybe another loss function would be better; I will try other loss functions when I have time.

Thank you for your reply! You mentioned that SI-SNR was used. Can you give some details about it? As shown in the code: https://github.com/echocatzh/MTFAA-Net/blob/eb3b1f33d7c5178f238076938c99acaec9e2e904/mtfaa.py#L144 , the model outputs include the magnitude spectrogram, the complex spectrum, and the time-domain near-end voice; which of these was used in the SI-SNR loss function?

FragrantRookie commented 1 year ago

Time domain: self.stft.inverse(real, imag)
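For anyone reconstructing the loss: SI-SNR is computed on the time-domain waveform, i.e. the output of `self.stft.inverse(real, imag)`, against the clean near-end reference. A minimal NumPy sketch of the standard definition (not the author's code):

```python
import numpy as np

def si_snr(est: np.ndarray, ref: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to get the target component.
    s_target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    e_noise = est - s_target
    return 10.0 * np.log10((np.dot(s_target, s_target) + eps)
                           / (np.dot(e_noise, e_noise) + eps))
```

Training then minimizes `-si_snr(enhanced, clean)`; the same math written with `torch` ops is differentiable, so gradients flow through the iSTFT.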

KarmaYan commented 1 year ago

Hi, I have recently been working on a replication of this project as well, but the test results show that it has some degree of enhancement but not as good as the contest results. So I would like to ask you some related questions and I wonder if it is convenient for you to tell me. Did you use the official 48khz DNS data in the reproduction process, and what kind of processing was done to the data? Was your experiment a three-channel or one-channel input? I found that the ERB module may make the signal amplitude significantly weaker during the reproduction, resulting in unsatisfactory enhancement results, and I wonder if you have encountered similar problems.

I just use 16 kHz DNS data. If your target scene is far-field, you need to set the volume of the near-end voice a little lower. I use three-channel inputs as described in the paper. ERB will make the signal amplitude weaker.

Sorry to bother you again. I did a comparison experiment with the same training process and network structure, once with ERB domain conversion and inverse conversion, and once without ERB module, doing computation only on STFT domain. Is it normal that the model obtained from the previous training session will have significant speech distortion? I see that many papers that do multi-stage learning now also mention that domain conversion may cause information loss.

FragrantRookie commented 1 year ago

My result after using ERB is the same as yours. I can't answer whether the signal should be processed in the time domain or the frequency domain. You can compare the effects of DTLN and encoder-decoder architectures; DTLN is what you call a two-stage approach.

shawnyxf commented 1 year ago

I am using this code for AEC, which needs two channels, but your wav file only has one channel. I can show you the AEC effect at the following link. I have reduced the network to a small size so that it can be deployed on ARM Cortex-A: https://pan.baidu.com/s/1w7q5HLZeNlrZBtsqBCucfA (access code: bu23)

Hi, may I know the RTF of your small model deployed on Cortex-A? Thanks in advance.

FragrantRookie commented 1 year ago

Sorry, I don't understand what RTF is.

jet-yangqs commented 1 year ago

Thanks again! Can you give more info about the training, such as the training data scale, the training machine (GPU or CPU), and how long it takes to train the model? Did you do any data preprocessing before training, for example preprocessing the data and saving intermediate features, then loading those features during training to speed up the procedure? Or do you just read the audio data directly from disk? Thank you.

FragrantRookie commented 1 year ago

0.038, with a 4.2 GHz CPU.
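For context on the figure above: RTF (real-time factor) is wall-clock processing time divided by audio duration, so 0.038 means the model runs roughly 26x faster than real time. A minimal measurement sketch (the `process` callable and sample rate are placeholders):

```python
import time

def real_time_factor(process, audio, sample_rate: int) -> float:
    """RTF = processing time / audio duration.
    Values below 1.0 mean faster than real time."""
    start = time.perf_counter()
    process(audio)
    elapsed = time.perf_counter() - start
    return elapsed / (len(audio) / sample_rate)
```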