Why not use near-end speech s as the target?

adobe-research / MetaAF

Control adaptive filters with neural networks.

https://jmcasebeer.github.io/projects/metaaf

Why not use near-end speech s as the target? #18

Closed xiaoqi91 closed 11 months ago

xiaoqi91 commented 11 months ago

Thanks for sharing your excellent work! Could you please explain why the clean or noisy near-end speech is not used as the network target? And why does using the mic signal d as the target cover double-talk scenarios? Thank you.

jmcasebeer commented 11 months ago

Thanks for the question.

In the original work, we used the raw microphone signals so that the training was unsupervised, and did not require oracle signals (like clean speech). If you're interested in code and models that use the clean speech for training, check out the other branch here.
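To make the "no oracle signals" point concrete, here is a minimal sketch of an echo canceller whose loss is computed only from the far-end signal u and the microphone signal d. It is not the MetaAF code: a plain NLMS filter stands in for the learned optimizer, and the names `nlms_aec` and `unsupervised_loss`, the 16-tap random echo path, and the single-talk setup (no near-end speech in d) are all assumptions made for illustration.

```python
import numpy as np


def nlms_aec(u, d, num_taps=16, step=0.5, eps=1e-8):
    """Time-domain NLMS echo canceller (a stand-in, not MetaAF itself).

    u: far-end reference signal, d: microphone signal.
    Returns the error signal e = d - y and the final filter taps w.
    """
    w = np.zeros(num_taps)
    e = np.zeros(len(d))
    for n in range(num_taps - 1, len(d)):
        # Most recent far-end sample first, so w[k] multiplies u[n - k].
        u_win = u[n - num_taps + 1 : n + 1][::-1]
        y = w @ u_win                      # echo estimate
        e[n] = d[n] - y                    # residual after cancellation
        w += step * e[n] * u_win / (u_win @ u_win + eps)
    return e, w


def unsupervised_loss(e):
    """Mean squared error signal; needs only d and the filter output."""
    return float(np.mean(e ** 2))


rng = np.random.default_rng(0)
u = rng.standard_normal(4000)              # hypothetical far-end signal
h = 0.5 * rng.standard_normal(16)          # hypothetical echo path
d = np.convolve(u, h)[: len(u)]            # mic signal (single-talk, no noise)

e, w = nlms_aec(u, d)
loss_before = unsupervised_loss(d)         # energy with no cancellation
loss_after = unsupervised_loss(e[2000:])   # energy after the filter adapts
```

The clean near-end speech s never appears in the loss: the only training signal is the residual e, which is derived from d. A supervised variant would instead compare e against s, which requires the oracle signal.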

xiaoqi91 commented 11 months ago

Thanks for your quick reply. Looking forward to your new paper!