fix_length_mode="trim" can lead to confusing loss/metrics

asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

https://asteroid-team.github.io/

MIT License


fix_length_mode="trim" can lead to confusing loss/metrics #402

Open · jonashaag opened this issue 3 years ago

jonashaag commented 3 years ago

When using fix_length_mode="trim" in DCUNet, the input is right-trimmed to the nearest smaller length the model can process, and the output is then zero-padded on the right to restore the original length. A loss or metric that does not know the length the model actually processed compares this zeroed tail against the full reference, so its values can be significantly off: the trimmed chunk of the expected signal is effectively zero in the estimate. (I don't think it hurts training, though.)
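To make the effect concrete, here is a minimal pure-Python sketch. It assumes a hypothetical rule that valid lengths are multiples of a fixed `multiple` (DCUNet's real constraint depends on its STFT and encoder strides), and `trim_then_pad` is an illustrative stand-in, not Asteroid code. Even an otherwise perfect estimate gets a non-zero error against the full-length reference, purely because of the zeroed tail.

```python
def trim_then_pad(signal, multiple):
    """Hypothetical stand-in for fix_length_mode="trim" plus zero-padding.

    Right-trims `signal` to the largest length that is a multiple of
    `multiple` (an assumed rule), then zero-pads on the right back to
    the original length.
    """
    valid = (len(signal) // multiple) * multiple
    return signal[:valid] + [0.0] * (len(signal) - valid)


def mse(estimate, reference):
    """Mean squared error over the full length of both sequences."""
    assert len(estimate) == len(reference)
    return sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)


# A "perfect" model: the estimate equals the input except for the trim/pad step.
reference = [1.0] * 10
estimate = trim_then_pad(reference, multiple=4)  # keeps 8 samples, zeroes 2

print(estimate)                  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(mse(estimate, reference))  # 0.2, caused only by the zeroed tail
```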

What to do? Maybe just add a note to the docs?

mpariente commented 3 years ago

A note in the docs seems good.

Maybe also expose a function that computes the valid (trimmed) length, so that the metric can be computed on the non-zero part of the signal?
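A sketch of what such a helper and its use in a metric could look like, again assuming the hypothetical multiple-of-`multiple` length rule. `valid_length` and `mse_on_valid_part` are illustrative names, not existing Asteroid functions.

```python
def valid_length(n_samples, multiple):
    """Hypothetical helper: number of samples kept after right-trimming."""
    return (n_samples // multiple) * multiple


def mse_on_valid_part(estimate, reference, length):
    """MSE computed only on the first `length` samples (the non-zero part)."""
    if length <= 0:
        raise ValueError("length must be positive")
    pairs = zip(estimate[:length], reference[:length])
    return sum((e - r) ** 2 for e, r in pairs) / length


reference = [0.5, -0.25, 1.0, 0.0, 0.75, -1.0, 0.25]
n_keep = valid_length(len(reference), multiple=4)
# Simulate a perfect model whose output was trimmed and then zero-padded.
estimate = reference[:n_keep] + [0.0] * (len(reference) - n_keep)

print(n_keep)                                          # 4
print(mse_on_valid_part(estimate, reference, n_keep))  # 0.0
```

Restricting the metric to the first `n_keep` samples scores only what the model actually produced, instead of penalizing the padded tail.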

jonashaag commented 3 years ago

I wrote a patch that adds the doc hint and the function. I realised the function may be useful more generally, even for non-STFT models: it returns the length the model actually reconstructs, before the pad_x_to_y step in BaseEncoderMaskerDecoder.forward() restores the input length. Do you think we should move it to BaseEncoderMaskerDecoder, and move the doc hint there as well?
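For a non-STFT model the same idea applies to a learned filterbank. Below is a minimal sketch, assuming a Conv1d-style encoder and ConvTranspose1d-style decoder with no padding (actual Asteroid filterbank settings may differ). `reconstructed_length` and `pad_to_length` are hypothetical names; the latter is only a pure-Python stand-in for the role pad_x_to_y plays in BaseEncoderMaskerDecoder.forward().

```python
def reconstructed_length(n_samples, kernel_size, stride):
    """Output length of an unpadded strided conv followed by its transposed conv.

    Encoder frames:  floor((n_samples - kernel_size) / stride) + 1
    Decoder samples: (frames - 1) * stride + kernel_size
    """
    if n_samples < kernel_size:
        raise ValueError("input shorter than one kernel")
    frames = (n_samples - kernel_size) // stride + 1
    return (frames - 1) * stride + kernel_size


def pad_to_length(signal, n_samples):
    """Zero-pad (or cut) `signal` on the right so it has exactly `n_samples`."""
    return signal[:n_samples] + [0.0] * max(0, n_samples - len(signal))


n_in = 100
n_rec = reconstructed_length(n_in, kernel_size=16, stride=8)
print(n_rec)  # 96: the last 4 samples are never reconstructed

decoded = [1.0] * n_rec
padded = pad_to_length(decoded, n_in)
print(len(padded), padded[-5:])  # 100 [1.0, 0.0, 0.0, 0.0, 0.0]
```

A loss or metric would then use only the first `n_rec` samples of both estimate and reference.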

mpariente commented 3 years ago

Can you open a PR with this patch, please? We can move the discussion there.