Closed: xiaoqi91 closed this issue 11 months ago
Thanks for the question.
In the original work, we used the raw microphone signals so that the training was unsupervised, and did not require oracle signals (like clean speech). If you're interested in code and models that use the clean speech for training, check out the other branch here.
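To make the distinction concrete, here is a minimal sketch, not the code from the repository. It assumes a toy linear echo path and synthetic white-noise signals, and all names (`echo_estimate`, `unsupervised_loss`, `supervised_loss`, `h`, `w`) are hypothetical. It only illustrates that a loss built on the mic signal d needs nothing beyond the far-end and mic signals, while a supervised loss needs the clean near-end speech s.

```python
import numpy as np

def echo_estimate(u, w):
    """Filter the far-end signal u with FIR weights w, truncated to len(u)."""
    return np.convolve(u, w)[: len(u)]

def unsupervised_loss(d, u, w):
    """Energy of the residual e = d - y. Needs only the mic and far-end signals."""
    e = d - echo_estimate(u, w)
    return float(np.mean(e ** 2))

def supervised_loss(d, u, w, s):
    """Error between the residual and the clean near-end speech s (an oracle signal)."""
    e = d - echo_estimate(u, w)
    return float(np.mean((e - s) ** 2))

# Synthetic data (assumption: white noise stands in for speech).
rng = np.random.default_rng(0)
n = 4000
u = rng.standard_normal(n)          # far-end signal
h = np.array([0.6, -0.3, 0.1])      # hypothetical true echo path
s = 0.5 * rng.standard_normal(n)    # near-end signal (double talk)
d = echo_estimate(u, h) + s         # mic signal = echo + near-end

w_zero = np.zeros_like(h)           # filter that cancels nothing
loss_unsup_zero = unsupervised_loss(d, u, w_zero)
loss_unsup_true = unsupervised_loss(d, u, h)
loss_sup_true = supervised_loss(d, u, h, s)
```

In this toy setup, the mic-signal loss is lower for the true echo path than for the all-zero filter even though near-end speech is present the whole time, and the residual at the true path is just the near-end signal. That holds here because the near-end and far-end signals are uncorrelated, which is an assumption of the sketch.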
Thanks for your quick reply. Looking forward to your new paper!
Thanks for sharing your excellent work! Could you please explain why you did not use the clean or noisy near-end speech as the network target? And why does using the mic signal d as the target cover double-talk scenarios? Thank you.