AppleHolic / source_separation

Deep learning based speech source separation using Pytorch
Apache License 2.0

logstft vs. linear stft #9

Closed faroit closed 4 years ago

faroit commented 4 years ago

Hi, this is a great implementation of the complex U-Net. Congrats. I wonder why you chose to use the log STFT instead of the linear STFT as done here. Did you observe better performance?

Just a small note: you could have used MUSDB18 instead of DSD100 for singing voice. It's a bit larger. By the way, did you evaluate your results using museval?

Cheers Fabian

AppleHolic commented 4 years ago

@faroit

  1. The linear STFT produces values that are too large to train the model stably. When I built my first model, I ran into gradient explosion with the linear STFT, so I simply moved to log space to solve it.

  2. Not yet, and thanks for the reminder. I will train a new model on the MUSDB18 dataset and evaluate it with museval soon.

I will follow up on the second point and report the progress in this issue.

Thanks
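To make point 1 concrete, here is a minimal sketch of how log compression shrinks the dynamic range of a magnitude spectrum. It is plain Python so that it runs without dependencies (in practice one would use something like `torch.stft`). The thread does not state the exact log transform used in the repository, so `log1p` is an assumption, and `dft_magnitudes`, `log_compress`, and the test frame are hypothetical names and data chosen only for illustration.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one real frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(frame)
        )
        mags.append(abs(acc))
    return mags


def log_compress(mags):
    """log(1 + |X|): maps 0 to 0 and squeezes large peaks (assumed transform)."""
    return [math.log1p(m) for m in mags]


# Hypothetical test frame: a unit-amplitude sinusoid centred on bin 8.
n = 256
frame = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]

linear = dft_magnitudes(frame)
logged = log_compress(linear)

# The linear peak grows with the frame length (about n / 2 here),
# while the log-compressed peak stays in the single digits.
print("linear peak:", max(linear))
print("log peak:   ", max(logged))
```

The linear peak scales with the window size, which is one plausible reason raw STFT values can destabilize training, whereas the compressed values stay small regardless of frame length.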

faroit commented 4 years ago

> The linear STFT produces values that are too large to train the model stably. When I built my first model, I ran into gradient explosion with the linear STFT, so I simply moved to log space to solve it.

I see. Have you considered using mean/std normalization instead/additionally?
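For readers unfamiliar with the suggestion, the following is a rough sketch of mean/std normalization applied per frequency bin. It assumes the statistics are computed across frames for each bin, which is one common choice and not something the thread specifies. The function names and the toy spectrogram are hypothetical.

```python
import statistics


def fit_bin_stats(spectrogram):
    """Per-bin mean and standard deviation across frames.

    `spectrogram` is a list of frames, each a list of magnitudes per bin.
    """
    bins = list(zip(*spectrogram))  # transpose to bin-major order
    means = [statistics.fmean(b) for b in bins]
    stds = [statistics.pstdev(b) for b in bins]
    return means, stds


def normalize(spectrogram, means, stds, eps=1e-8):
    """Shift each bin to zero mean and scale it to (roughly) unit variance."""
    return [
        [(v - m) / (s + eps) for v, m, s in zip(frame, means, stds)]
        for frame in spectrogram
    ]


# Toy data: two bins whose magnitudes differ by a factor of 100.
spec = [[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]]
means, stds = fit_bin_stats(spec)
normed = normalize(spec, means, stds)

for frame in normed:
    print(frame)
```

After normalization both bins end up on the same scale even though their raw magnitudes differ by two orders of magnitude. In a real pipeline the statistics would typically be estimated on the training set and reused at test time.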

AppleHolic commented 4 years ago

No, I didn't consider using mean/std normalization. It seems like it could also improve the result. I will try it as well in my next experiment.

AppleHolic commented 4 years ago

A brief comment from testing:

I got better results on the test data with AudioSet.

AppleHolic commented 4 years ago

This continues in #16 and other issues, so I am closing this one.