Defaults for Multiscale STFT loss

csteinmetz1 / auraloss

Collection of audio-focused loss functions in PyTorch

Apache License 2.0


Defaults for Multiscale STFT loss #38

Open turian opened 1 year ago

turian commented 1 year ago

        fft_sizes=[1024, 2048, 512],
        hop_sizes=[120, 240, 50],
        win_lengths=[600, 1200, 240],

These are the defaults provided. What sample rate are they intended for?

(Just curious: how did you choose them? But the intended sample rate matters more to me.)

csteinmetz1 commented 1 year ago

This is a good question and likely should be added to the docstring.

These are the values from the paper we based the implementation on, https://arxiv.org/abs/1910.11480. According to the paper, they are meant for audio at 24 kHz. I generally do not use these defaults in most of my own setups, which run at higher sample rates. DDSP opted for a larger number of window and frame sizes, which perhaps somewhat mitigates the variability across sample rates.
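Because the defaults are given in samples, their meaning in time depends on the sample rate. At 24 kHz the windows of 600, 1200 and 240 samples span 25, 50 and 10 ms. The sketch below converts them to milliseconds and rescales them for another rate so they cover the same durations. The rescaling rule (scale hop and window by the rate ratio, then round the FFT size up to a power of two) is an assumption for illustration. It does not come from the paper or from auraloss, and `rescale` and `next_pow2` are hypothetical helper names.

```python
# Default multi-resolution STFT settings quoted in this issue (in samples).
FFT_SIZES = [1024, 2048, 512]
HOP_SIZES = [120, 240, 50]
WIN_LENGTHS = [600, 1200, 240]
REFERENCE_SR = 24_000  # rate the defaults target, per the paper cited above


def samples_to_ms(n_samples, sample_rate):
    """Duration of n_samples at sample_rate, in milliseconds."""
    return 1000.0 * n_samples / sample_rate


def next_pow2(n):
    """Smallest power of two that is >= n (hypothetical helper)."""
    p = 1
    while p < n:
        p *= 2
    return p


def rescale(target_sr, ref_sr=REFERENCE_SR):
    """Assumed heuristic: keep hop/window durations fixed across sample rates.

    Hop and window lengths are scaled by target_sr / ref_sr. The FFT size is
    scaled the same way and then rounded up to a power of two that is at
    least as long as the new window.
    """
    ratio = target_sr / ref_sr
    configs = []
    for fft, hop, win in zip(FFT_SIZES, HOP_SIZES, WIN_LENGTHS):
        new_hop = round(hop * ratio)
        new_win = round(win * ratio)
        new_fft = next_pow2(max(new_win, round(fft * ratio)))
        configs.append((new_fft, new_hop, new_win))
    return configs


if __name__ == "__main__":
    for win in WIN_LENGTHS:
        print(f"win={win} samples -> {samples_to_ms(win, REFERENCE_SR):.1f} ms at 24 kHz")
    print("48 kHz (fft, hop, win):", rescale(48_000))
```

Scaling the FFT size by the same ratio as the sample rate also keeps the bin spacing in Hz (sample_rate / fft_size) roughly constant, so each resolution keeps approximately the same time and frequency trade-off it had at 24 kHz.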

turian commented 1 year ago

Yeah. I guess I take a more hardcore view here: I believe NO defaults should be provided, and the docstring should instead give a few examples (with their associated sample rates) and their citations. As it is now, it's a bit too easy to footgun yourself, I think?
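One way to express this "no defaults" suggestion is a configuration object with no default values, where each documented preset carries its sample rate and a citation and is checked against the audio's rate. The sketch below is hypothetical: `MultiResolutionSTFTConfig`, `PRESET_24K` and `check_sample_rate` are illustrative names, not part of auraloss's API. The preset values and the 24 kHz rate are the ones discussed in this thread.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MultiResolutionSTFTConfig:
    """Hypothetical config with no defaults: every field must be supplied."""
    sample_rate: int
    fft_sizes: Tuple[int, ...]
    hop_sizes: Tuple[int, ...]
    win_lengths: Tuple[int, ...]
    source: str  # citation for where the values came from

    def __post_init__(self):
        # All three lists describe the same set of resolutions.
        if not (len(self.fft_sizes) == len(self.hop_sizes) == len(self.win_lengths)):
            raise ValueError("fft_sizes, hop_sizes and win_lengths must have equal length")
        # A window longer than its FFT cannot be transformed without truncation.
        for fft, win in zip(self.fft_sizes, self.win_lengths):
            if win > fft:
                raise ValueError(f"win_length {win} exceeds fft_size {fft}")


# Documented example preset: the values quoted in this issue, meant for 24 kHz.
PRESET_24K = MultiResolutionSTFTConfig(
    sample_rate=24_000,
    fft_sizes=(1024, 2048, 512),
    hop_sizes=(120, 240, 50),
    win_lengths=(600, 1200, 240),
    source="https://arxiv.org/abs/1910.11480",
)


def check_sample_rate(config, audio_sample_rate):
    """Fail loudly instead of silently using settings tuned for another rate."""
    if config.sample_rate != audio_sample_rate:
        raise ValueError(
            f"config targets {config.sample_rate} Hz but audio is {audio_sample_rate} Hz"
        )
    return config
```

The settings would then be passed explicitly to whatever loss constructor is in use, so a mismatch between preset and audio shows up as an error rather than as a silently degraded loss.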