DBD-research-group / BirdSet

A benchmark dataset collection for bird sound classification
https://huggingface.co/datasets/DBD-research-group/BirdSet
BSD 3-Clause "New" or "Revised" License

Reduced Batch Size in One-hot Encoding in BirdsetDataModule #263

Closed: nhaH-luaP closed this issue 1 month ago

nhaH-luaP commented 1 month ago

https://github.com/DBD-research-group/BirdSet/blob/23381a561d37a2fc446f6358db53230b54310bcc/birdset/datamodule/birdset_datamodule.py#L153-L162

For the one-hot encoding of the labels, a batch_size of 1500 seems unnecessarily large, especially compared to the mapping step a few lines above, where it is only 300. Several of my runs on a cluster got stuck at the one-hot encoding step without any error message, and I could not tell why until I manually reduced the batch_size to 300. Is there a reason for the high batch_size here? Otherwise I would recommend reducing it to 300.

Greetings, Paul

raphaelschwinger commented 1 month ago

@nhaH-luaP Hey Paul, on our hardware 1500 worked well and fast, but I guess 300 is a more sensible and compatible choice. Thanks for pointing this out!

Greetings

Raphael
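A minimal sketch of why the batch size matters at this step, assuming it follows the common Hugging Face `datasets` pattern of a batched `map` that replaces integer labels with one-hot vectors. The names `one_hot_batch`, `map_in_batches`, `peak_batch_cells` and the `labels` column are hypothetical and not taken from BirdSet; the batching is emulated in plain Python so the sketch runs without dependencies. In this sketch each batch materializes `batch_size × num_classes` values at once, so going from 1500 to 300 shrinks that per-batch footprint by a factor of five.

```python
from typing import Dict, Iterator, List


def one_hot_batch(batch: Dict[str, List[int]], num_classes: int) -> Dict[str, List[List[float]]]:
    """Hypothetical batched transform: replace integer labels with one-hot vectors."""
    encoded = []
    for label in batch["labels"]:
        row = [0.0] * num_classes  # one row of num_classes values per example
        row[label] = 1.0
        encoded.append(row)
    return {"labels": encoded}


def map_in_batches(
    labels: List[int], num_classes: int, batch_size: int
) -> Iterator[Dict[str, List[List[float]]]]:
    """Emulate Dataset.map(batched=True, batch_size=...) by slicing the column into fixed-size chunks."""
    for start in range(0, len(labels), batch_size):
        yield one_hot_batch({"labels": labels[start:start + batch_size]}, num_classes)


def peak_batch_cells(n_examples: int, num_classes: int, batch_size: int) -> int:
    """Values held for one batch's one-hot output: a rough proxy for per-batch memory."""
    return min(batch_size, n_examples) * num_classes


if __name__ == "__main__":
    labels = [i % 5 for i in range(1000)]
    batches = list(map_in_batches(labels, num_classes=5, batch_size=300))
    print([len(b["labels"]) for b in batches])  # [300, 300, 300, 100]

    # Hypothetical 10,000-class label space: per-batch output size at both settings
    print(peak_batch_cells(100_000, 10_000, 300))   # 3000000
    print(peak_batch_cells(100_000, 10_000, 1500))  # 15000000
```

With the actual `datasets` library, the equivalent call would look roughly like `dataset.map(one_hot_batch, batched=True, batch_size=300, fn_kwargs={"num_classes": num_classes})`; whether the code at the linked lines is structured exactly this way is an assumption.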