aleXiehta / PhoneFortifiedPerceptualLoss

Improving Perceptual Quality by Phone-Fortified Perceptual Loss using Wasserstein Distance for Speech Enhancement
MIT License
73 stars, 17 forks

about model input #2

Open: enhancer12 opened this issue 3 years ago

enhancer12 commented 3 years ago

Hello Tsun-An, I have read your paper "IMPROVING PERCEPTUAL QUALITY BY PHONE-FORTIFIED PERCEPTUAL LOSS FOR SPEECH ENHANCEMENT" and found it very well written. I have a question about your code in 'dataset.py', L41 & L42: why do you use the constant 16384 to constrain the input length? Does this value have any special meaning? Thank you~

https://github.com/aleXiehta/PhoneFortifiedPerceptualLoss/blob/d763760262ff4092b959fb8f9357f10812bbddac/dataset.py#L41
https://github.com/aleXiehta/PhoneFortifiedPerceptualLoss/blob/d763760262ff4092b959fb8f9357f10812bbddac/dataset.py#L42
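For context, constraining the input length in a speech-enhancement data loader typically means cropping (or padding) each noisy/clean pair to a fixed number of samples, using the same offset for both signals so they stay aligned. The sketch below is a hypothetical illustration, not the code at the linked lines of 'dataset.py': the helper name `crop_pair`, the random-crop strategy, and zero-padding of short clips are all assumptions.

```python
import numpy as np

SEGMENT_LEN = 16384  # 2**14 samples, the constant discussed in this thread


def crop_pair(noisy, clean, length=SEGMENT_LEN, rng=None):
    """Crop or zero-pad a 1-D noisy/clean waveform pair to exactly `length` samples.

    Hypothetical helper for illustration only; not the repository's dataset.py code.
    """
    if noisy.shape != clean.shape:
        raise ValueError("noisy and clean waveforms must have the same shape")
    rng = np.random.default_rng() if rng is None else rng
    n = noisy.shape[0]
    if n >= length:
        # Pick one random offset and apply it to both signals to keep them aligned.
        start = int(rng.integers(0, n - length + 1))
        return noisy[start:start + length], clean[start:start + length]
    # Clip is shorter than the segment: zero-pad both signals at the end.
    pad = length - n
    return np.pad(noisy, (0, pad)), np.pad(clean, (0, pad))
```

Using a shared offset matters because the loss compares the enhanced output against the clean target sample by sample; cropping the two signals independently would misalign them.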

aleXiehta commented 3 years ago

Hi, we are glad to hear that you are interested in our work! The input is truncated because of the VRAM limit, since DCU-Net20 is quite large. To generate an output with the same length as its input, the input length needs to be a power of two (2^n), so we chose a length of 16384 (2^14) samples, which is about 1 second long.
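The length argument in the reply can be checked with a little arithmetic. This sketch uses a simplified 1-D stand-in for an encoder-decoder: `depth` stride-2 convolutions (kernel 4, padding 1) followed by `depth` mirrored transposed convolutions, with output lengths computed from the formulas given in the PyTorch `Conv1d`/`ConvTranspose1d` documentation. The kernel, padding, and depth values and the function names are hypothetical. DCU-Net20 actually operates on complex spectrograms with its own strides, so this only illustrates why the length must be divisible by the total downsampling factor. The 16 kHz sampling rate in the duration check is also an assumption.

```python
def conv_out_len(n, kernel=4, stride=2, padding=1, dilation=1):
    """Output length of a 1-D strided convolution (PyTorch Conv1d formula)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out_len(n, kernel=4, stride=2, padding=1, dilation=1, output_padding=0):
    """Output length of a 1-D transposed convolution (PyTorch ConvTranspose1d formula)."""
    return (n - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def round_trip_len(n, depth):
    """Length after `depth` downsampling layers and `depth` mirrored upsampling layers.

    With kernel=4, stride=2, padding=1 each encoder layer maps n -> floor(n / 2)
    and each decoder layer maps n -> 2 * n, so the length survives only when n
    is divisible by 2**depth.
    """
    for _ in range(depth):
        n = conv_out_len(n)
    for _ in range(depth):
        n = deconv_out_len(n)
    return n


SAMPLE_RATE = 16_000  # assumed sampling rate, not stated in the thread
segment_seconds = 16384 / SAMPLE_RATE  # 1.024 s under that assumption

print(round_trip_len(16384, 10))  # 16384: 2**14 is divisible by 2**10
print(round_trip_len(16000, 10))  # 15360: 16000 is not divisible by 2**10, samples are lost
```

Under this simplified model, any multiple of 2^depth would survive the round trip; a power of two such as 2^14 has the convenience of satisfying every depth up to 14 at once.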