DBD-research-group / BirdSet

A benchmark dataset collection for bird sound classification
https://huggingface.co/datasets/DBD-research-group/BirdSet
BSD 3-Clause "New" or "Revised" License

Error with background noise augmentation during evaluation #248

Closed. ilyassmoummad closed this issue 1 month ago.

ilyassmoummad commented 1 month ago

Hi, first of all congratulations on the awesome work!

I'm trying to run the following (after downloading the background noise files):

python3 birdset/eval.py experiment="birdset_neurips24/HSN/LT/efficientnet.yaml"

I have an error with this line:

datamodule = hydra.utils.instantiate(cfg.datamodule)

It crashes because of the background noise augmentation:

raise EmptyPathException("There are no supported audio files found.")
torch_audiomentations.core.transforms_interface.EmptyPathException: There are no supported audio files found.

I have made sure that the dcase18 dataset is downloaded and that, in the configuration file, both the background_paths setting under cfg.datamodule.transforms.waveform_augmentations and cfg.paths.background_paths point to the correct location of the background files.

Additionally, I'm wondering why augmentation is needed for evaluation. Thank you very much in advance for your assistance!
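One way to narrow down an EmptyPathException like the one above is to check independently whether the configured background directory contains any audio files at all. The snippet below is a minimal sketch, not part of BirdSet or torch_audiomentations: the helper name find_audio_files and the extension set are assumptions made for illustration, and the formats actually accepted depend on the installed torch_audiomentations version.

```python
from pathlib import Path

# Assumption: extensions to look for. Check which formats your installed
# torch_audiomentations version actually accepts and adjust this set.
AUDIO_EXTENSIONS = {".wav"}


def find_audio_files(root, extensions=AUDIO_EXTENSIONS):
    """Recursively collect files under `root` whose suffix is in `extensions`.

    Returns an empty list if `root` is not an existing directory.
    """
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in extensions
    )


if __name__ == "__main__":
    import sys

    # Pass the directory that background_paths resolves to on your machine.
    for directory in sys.argv[1:]:
        files = find_audio_files(directory)
        print(f"{directory}: {len(files)} audio file(s) found")
```

If this reports zero files for the directory the config resolves to, the problem is probably the path or the download rather than the augmentation itself.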

lurauch commented 1 month ago

Hi Ilyass,

Thanks for the first (external) issue! :) I'll investigate further and report back with my findings.

Regarding augmentations for evaluation: If we're using a pretrained model and evaluating it on LT or MT, background augmentations should not be necessary. However, for DT, these augmentations are required since we cannot validate on the POW validation dataset (the model does not know the respective classes). As a result, we validate on Xeno-Canto focal recordings, which need augmentations to better resemble a soundscape file. In your case, though, these augmentations should not be needed.

lurauch commented 1 month ago

For a quick fix: can you please set the augmentations to none? This should work:

python3 birdset/eval.py experiment=birdset_neurips24/HSN/LT/efficientnet.yaml datamodule.transforms.waveform_augmentations=none.yaml datamodule.transforms.spectrogram_augmentations=none.yaml

We have not tested this yet, but you can try it. You can also add an override statement to the experiment file if you don't want to put everything on the command line.

ilyassmoummad commented 1 month ago

Hi Lukas, thanks a lot for your quick reply!

I tried the quick fix, but I got this:

Error executing job with overrides: ['experiment=birdset_neurips24/HSN/LT/efficientnet.yaml', 'datamodule.transforms.waveform_augmentations=none.yaml']
Error in call to target 'birdset.datamodule.components.transforms.BirdSetTransformsWrapper':
AttributeError("'str' object has no attribute 'get'")
full_key: datamodule.transforms

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I'm sorry, I don't have experience managing complex projects with Hydra/YAML, so it's probably a newbie error.

lurauch commented 1 month ago

Sorry, my bad. Try:

python eval.py experiment=birdset_neurips24/HSN/LT/efficientnet.yaml datamodule/transforms/spectrogram_augmentations=none.yaml datamodule/transforms/waveform_augmentations=none.yaml

This should work. If not, just adjust the complete birdset_neurips24/HSN/LT/efficientnet.yaml experiment file:


# @package _global_
defaults:
  - override /datamodule: HSN.yaml
  - override /module: multilabel.yaml
  - override /module/network: efficientnet.yaml
  - override /callbacks: default.yaml
  - override /trainer: single_gpu.yaml
  - override /datamodule/transforms: bird_default_multilabel.yaml
  - override /paths: default.yaml
  - override /hydra: default.yaml
  - override /datamodule/transforms/spectrogram_augmentations: none.yaml
  - override /datamodule/transforms/waveform_augmentations: none.yaml

tags: ["birdsetLT", "inference"]
seed: 1
train: False
test: True

logger:
  wandb:
    tags: ${tags}
    group: "LT_HSN_efficientnet"
    mode: disabled
    version: LT_efficientnet_${seed}_${start_time}

module:
  network:
    model:
      local_checkpoint: null #Add the path to your XCM pretraining checkpoint here, if it is saved locally.
      checkpoint: DBD-research-group/EfficientNet-B1-BirdSet-XCL #Add the HuggingFace path to your XCM pretraining checkpoint here if it is uploaded on HuggingFace.
      pretrain_info:
        hf_path: ${datamodule.dataset.hf_path}
        hf_name: ${datamodule.dataset.hf_name}
        hf_pretrain_name: XCL
        valid_test_only: False

datamodule:
  dataset:
    val_split: null
    class_weights_loss: null
    class_weights_sampler: null
    classlimit: null
    eventlimit: null

  transforms:
    preprocessing:
      spectrogram_conversion:
        n_fft: 2048
        hop_length: 256
        power: 2.0
      melscale_conversion:
        n_mels: 256
        n_stft: 1025

  loaders:
    test:
      batch_size: 64
      num_workers: 32

ilyassmoummad commented 1 month ago

Thanks a lot for your help, it works!
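For readers wondering why the dotted override earlier in the thread failed while the slash form worked, here is a rough toy model in plain Python. It is not Hydra's implementation: the functions, the config contents, and the assumption that none.yaml resolves to an empty mapping are all hypothetical. The likely explanation is that a dotted key is treated as a plain value override, so the augmentation node becomes the string "none.yaml", whereas the slash form selects an option from a config group and leaves a mapping in place.

```python
import copy


def apply_value_override(cfg, dotted_key, value):
    """Toy version of `a.b.c=value`: replace the node at that key with the raw value."""
    cfg = copy.deepcopy(cfg)
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node[key]
    node[leaf] = value
    return cfg


def apply_group_override(cfg, group_path, option, groups):
    """Toy version of `a/b/c=option`: insert the content of the chosen group option."""
    cfg = copy.deepcopy(cfg)
    *parents, leaf = group_path.split("/")
    node = cfg
    for key in parents:
        node = node[key]
    node[leaf] = copy.deepcopy(groups[group_path][option])
    return cfg


def build_transforms(transforms_cfg):
    """Hypothetical stand-in for a wrapper that expects a mapping per section."""
    waveform = transforms_cfg["waveform_augmentations"]
    # Raises AttributeError if `waveform` is a plain string instead of a mapping.
    return waveform.get("background_noise")


# Hypothetical composed config and config-group contents.
base_cfg = {
    "datamodule": {
        "transforms": {"waveform_augmentations": {"background_noise": {"p": 0.5}}}
    }
}
groups = {"datamodule/transforms/waveform_augmentations": {"none.yaml": {}}}

dotted = apply_value_override(
    base_cfg, "datamodule.transforms.waveform_augmentations", "none.yaml"
)
slashed = apply_group_override(
    base_cfg, "datamodule/transforms/waveform_augmentations", "none.yaml", groups
)

try:
    build_transforms(dotted["datamodule"]["transforms"])
except AttributeError as err:
    print("dotted override:", err)

print("slash override:", build_transforms(slashed["datamodule"]["transforms"]))
```

In this toy model the dotted form fails with the same kind of AttributeError reported above, and the slash form simply yields no background-noise augmentation.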

lurauch commented 1 month ago

We have a solution planned so that the eval.py script no longer depends on the availability of augmentations, so thank you for pointing that out! If you have any other questions regarding BirdSet or want to talk about bird research, feel free to open another issue or contact me by email at lukas.rauch@uni-kassel.de :)

lurauch commented 1 month ago

Will be solved with #249