echocatzh / MTFAA-Net

Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement
MIT License
194 stars 56 forks

question about the network #3

Closed wendongj closed 2 years ago

wendongj commented 2 years ago

Thanks for your code. One thing still confuses me: the input to the U-Net structure is the magnitude after the phase encoder, but the U-Net output is used for a two-stage mask. The first stage is a magnitude mask, and the second is a phase mask plus a magnitude mask. Since no phase information is fed into the U-Net, how can it estimate the correct phase mask? Or does the output of the phase encoder, although it is a magnitude, still contain phase information?

echocatzh commented 2 years ago

In my opinion, there is no problem with this approach in theory. Everyone understands the paper differently, and it is normal to have different views on this part. In addition, you can compare the two stages to see whether the loss of stage 2 actually works.
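
On the question of whether a magnitude taken after the phase encoder can still carry phase information, here is a small NumPy sketch. It is not the repository's code. It assumes, as the question describes, that the phase encoder amounts to a complex convolution followed by a magnitude operation; `toy_phase_encoder`, the kernel, and all shapes are hypothetical and chosen only for illustration. Two spectrograms with identical magnitude but different phase are passed through the same toy encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy complex spectrograms of shape (freq_bins, frames); sizes are arbitrary.
mag = rng.uniform(0.5, 1.5, size=(8, 6))
phase_a = rng.uniform(-np.pi, np.pi, size=(8, 6))
phase_b = rng.uniform(-np.pi, np.pi, size=(8, 6))

spec_a = mag * np.exp(1j * phase_a)
spec_b = mag * np.exp(1j * phase_b)   # same magnitude as spec_a, different phase
spec_c = spec_a * np.exp(1j * 0.7)    # spec_a with a constant global phase offset

# Hypothetical complex kernel spanning 3 time frames (an assumption, not the real model).
kernel = rng.normal(size=3) + 1j * rng.normal(size=3)


def toy_phase_encoder(spec, kernel):
    """Complex convolution along the time axis, followed by a magnitude."""
    out = np.stack([np.convolve(row, kernel, mode="valid") for row in spec])
    return np.abs(out)


enc_a = toy_phase_encoder(spec_a, kernel)
enc_b = toy_phase_encoder(spec_b, kernel)
enc_c = toy_phase_encoder(spec_c, kernel)

# The inputs are indistinguishable by magnitude alone ...
same_input_magnitude = np.allclose(np.abs(spec_a), np.abs(spec_b))
# ... but the encoded magnitudes differ, because the complex sum over
# neighbouring frames depends on their relative phases.
same_encoded_magnitude = np.allclose(enc_a, enc_b)
# A constant global phase offset, by contrast, is lost after the magnitude.
global_offset_invisible = np.allclose(enc_a, enc_c)

print(same_input_magnitude, same_encoded_magnitude, global_offset_invisible)
```

Under these assumptions, the magnitude after a complex convolution depends on the relative phase between neighbouring time-frequency bins, so the features fed to the U-Net are not phase-blind, even though a global phase offset is discarded. Whether that is enough for the stage 2 phase mask in practice is an empirical question, which is what comparing the two stage losses would show.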