andabi / music-source-separation

Deep neural networks for separating singing voice from music, written in TensorFlow

Loss Function Question #17

Closed: leimao closed this issue 6 years ago

leimao commented 6 years ago

Hello andabi,

In the project, `y_tilde_src1` and `y_tilde_src2` are used as the predictions, where:

```python
y_tilde_src1 = y_hat_src1 / (y_hat_src1 + y_hat_src2 + np.finfo(float).eps) * self.x_mixed
y_tilde_src2 = y_hat_src2 / (y_hat_src1 + y_hat_src2 + np.finfo(float).eps) * self.x_mixed
```

and the loss is:

```python
loss = tf.reduce_mean(tf.square(self.y_src1 - pred_y_src1) + tf.square(self.y_src2 - pred_y_src2))
```
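As a sanity check, here is a minimal NumPy sketch (with made-up spectrogram values, not the project's actual data pipeline) showing that the two masks sum to one, so the two predictions always sum back to `self.x_mixed`:

```python
import numpy as np

eps = np.finfo(float).eps
rng = np.random.default_rng(0)

# Made-up magnitude spectrograms, shape (time, freq).
x_mixed = rng.random((4, 8))      # mixture
y_hat_src1 = rng.random((4, 8))   # raw network output for source 1
y_hat_src2 = rng.random((4, 8))   # raw network output for source 2

# Time-frequency masking as in the project: the two masks sum to ~1,
# so the masked predictions always reconstruct the mixture exactly.
denom = y_hat_src1 + y_hat_src2 + eps
y_tilde_src1 = y_hat_src1 / denom * x_mixed
y_tilde_src2 = y_hat_src2 / denom * x_mixed

print(np.allclose(y_tilde_src1 + y_tilde_src2, x_mixed))  # True
```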

However, I found that after the audio preprocessing, self.x_mixed != self.y_src1 + self.y_src2 (the two sides usually differ substantially). Since the two masks sum to one, the predictions always sum to self.x_mixed, so they can never equal self.y_src1 and self.y_src2 at the same time, which means that even a perfect model will not reach a loss of zero. This is probably why my training loss never drops below 3 or 4 with this mask.
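To make the problem concrete, here is a small self-contained sketch (again with made-up values) in which the network outputs the true sources exactly, yet the loss stays strictly positive because the mixture does not equal the sum of the sources:

```python
import numpy as np

eps = np.finfo(float).eps
rng = np.random.default_rng(0)

# Made-up ground-truth spectrograms whose sum deliberately does NOT
# equal the mixture, mimicking what I observe after preprocessing.
y_src1 = rng.random((4, 8))
y_src2 = rng.random((4, 8))
x_mixed = y_src1 + y_src2 + 0.5   # deliberate mismatch

# Best case: the network predicts the true sources exactly.
denom = y_src1 + y_src2 + eps
pred_y_src1 = y_src1 / denom * x_mixed
pred_y_src2 = y_src2 / denom * x_mixed

# The same squared-error loss as in the project, written in NumPy.
loss = np.mean(np.square(y_src1 - pred_y_src1) + np.square(y_src2 - pred_y_src2))
print(loss > 0)  # True: the predictions sum to x_mixed, so they cannot
                 # match y_src1 and y_src2 simultaneously
```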

Could you please explain why this works in your case? Thank you very much.

Best regards,

Lei