fschmid56 / EfficientAT

This repository aims at providing efficient CNNs for Audio Tagging. We provide AudioSet pre-trained models ready for downstream training and extraction of audio embeddings.
MIT License

dcase20 dataset loader #28

Closed: stefan-balke closed this issue 6 months ago

stefan-balke commented 7 months ago

Hi there,

thanks for sharing the code. I was particularly interested in the dcase20 dataset loader. However, something in the MixupDataset class confused me.

See this link: https://github.com/fschmid56/EfficientAT/blob/main/datasets/dcase20.py#L101

SimpleSelectionDataset returns x, label, device, city, self.available_indices[index]. MixupDataset then unpacks this tuple as x1, f1, y1, d1, c1, so the variable names no longer match the contents (f1 receives the label, for example). It turns out that during training, x1 and f1 are used. However, I think the goal is to return a one-hot encoded and weighted version of the label (y1 * l + y2 * (1. - l)), where l is the mixup coefficient.
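
For illustration, here is a minimal sketch of the label mixing described above. It is not the repository's code: the helper names (one_hot, mix_labels, mixup_item), the alpha default, and the use of plain Python lists instead of tensors are all assumptions made to keep the example self-contained. It only assumes that each sample is a tuple in the order x, label, device, city, index.

```python
import random


def one_hot(label, num_classes):
    """Return a one-hot list for an integer class label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def mix_labels(label1, label2, lam, num_classes):
    """Blend two integer labels into a soft target: y1 * lam + y2 * (1 - lam)."""
    y1 = one_hot(label1, num_classes)
    y2 = one_hot(label2, num_classes)
    return [a * lam + b * (1.0 - lam) for a, b in zip(y1, y2)]


def mixup_item(sample1, sample2, num_classes, alpha=0.3, rng=random):
    """Mix two samples (hypothetical helper, not the repository's MixupDataset).

    Each sample is assumed to be (x, label, device, city, index),
    unpacked in exactly that order so names match contents.
    """
    x1, y1, d1, c1, i1 = sample1
    x2, y2, _, _, _ = sample2
    # Mixing coefficient drawn from Beta(alpha, alpha); alpha is an assumed default.
    lam = rng.betavariate(alpha, alpha)
    # Mix the inputs and the labels with the same coefficient.
    x = [a * lam + b * (1.0 - lam) for a, b in zip(x1, x2)]
    y = mix_labels(y1, y2, lam, num_classes)
    # Metadata is taken from the first sample.
    return x, y, d1, c1, i1
```

With this ordering, the second element returned is always the soft label, so a training loop that consumes the first two elements gets the mixed input and the mixed target.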

This may not be relevant for this training, but someone who stumbles on this or wants to reuse the code might find it useful!

stefan-balke commented 7 months ago

Saw that you refactored that in the new DCASE24 baseline: https://github.com/CPJKU/dcase2024_task1_baseline/blob/main/dataset/dcase24.py

fschmid56 commented 6 months ago

Hi! Thanks for pointing this out. Indeed, I messed up the MixupDataset for DCASE when I simplified the repo for public access.

It should be fixed now.