BirdVox / birdvoxdetect

A pre-trained deep learning system for detecting bird flight calls in continuous recordings
MIT License
Type instability of melspectrogram (librosa #825) #3

Closed by lostanlen 5 years ago

lostanlen commented 5 years ago

The fix for https://github.com/librosa/librosa/issues/825 (closed by https://github.com/librosa/librosa/pull/832) has made STFTs type-stable with respect to float32 vs. float64 precision. Bumping our requirement to librosa 0.6.3 would spare us the final conversion to single precision in compute_pcen:

    # Gather frequency bins according to the Mel scale.
    # NB: as of librosa v0.6.2, melspectrogram is type-unstable and thus
    # returns 64-bit output even with a 32-bit input. Therefore, we need
    # to convert PCEN to single precision eventually. This might not be
    # necessary in the future, if the whole PCEN pipeline is kept type-stable.
    melspec = librosa.feature.melspectrogram(
        y=None,
        S=abs2_stft,
        sr=pcen_settings["sr"],
        n_fft=pcen_settings["n_fft"],
        n_mels=pcen_settings["n_mels"],
        htk=True,
        fmin=pcen_settings["fmin"],
        fmax=pcen_settings["fmax"])

    # Compute PCEN.
    pcen = librosa.pcen(
        melspec,
        sr=pcen_settings["sr"],
        hop_length=pcen_settings["hop_length"],
        gain=pcen_settings["pcen_norm_exponent"],
        bias=pcen_settings["pcen_delta"],
        power=pcen_settings["pcen_power"],
        time_constant=pcen_settings["pcen_time_constant"])

    # Convert to single floating-point precision.
    pcen = pcen.astype('float32')
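
For illustration, here is a minimal NumPy-only sketch of how a 32-bit input can come back as 64-bit output. It assumes the promotion arises when a float64 filterbank multiplies a float32 spectrogram, which is one plausible mechanism and is not confirmed in this thread. The array shapes and the names `mel_basis` and `to_float32` are hypothetical stand-ins, not librosa or birdvoxdetect code.

```python
import numpy as np


def to_float32(x):
    """Return x as a float32 array (hypothetical helper for this sketch)."""
    return np.asarray(x, dtype=np.float32)


# Stand-in for a 32-bit power spectrogram: 513 frequency bins, 20 frames.
abs2_stft = np.ones((513, 20), dtype=np.float32)

# Stand-in for a mel filterbank; np.ones defaults to float64.
mel_basis = np.ones((40, 513))

# Mixing float64 and float32 operands promotes the result to float64.
melspec64 = mel_basis @ abs2_stft

# Keeping both operands in float32 keeps the result in float32.
melspec32 = mel_basis.astype(np.float32) @ abs2_stft

# The explicit cast used as a workaround at the end of compute_pcen.
melspec_cast = to_float32(melspec64)

print(melspec64.dtype, melspec32.dtype, melspec_cast.dtype)
```

If every stage preserves the input dtype, the trailing `astype('float32')` becomes a no-op and can be dropped.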

justinsalamon commented 5 years ago

Whatever we do, we should pin the requirement to a specific version: the exact version used to compute the features on which the models we'll be distributing were trained.
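
As a rough sketch of what such a pin could look like at runtime, the snippet below compares the installed version of a package with an exact pinned version. The helper names `matches_pin` and `check_pin` are hypothetical, and `0.6.3` is only the version proposed earlier in this thread, not necessarily the one used for training.

```python
from importlib import metadata


def matches_pin(installed, pinned):
    """Return True only when the installed version string equals the pin exactly."""
    return installed == pinned


def check_pin(package, pinned):
    """Return the installed version of `package`, or raise RuntimeError if it
    is missing or differs from `pinned` (hypothetical helper for this sketch)."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        raise RuntimeError("{} is not installed".format(package))
    if not matches_pin(installed, pinned):
        raise RuntimeError(
            "{} {} is installed, but {} is required".format(
                package, installed, pinned))
    return installed


# Example call (assumed version, see the note above):
# check_pin("librosa", "0.6.3")
```

The same constraint would normally also be declared in the package requirements with an exact `==` specifier, so that installation and the runtime check agree.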

lostanlen commented 5 years ago

Solved in 38f6350bf20bb23908375fb614265c8e245f7f0e