Data augmentations/mixup for pseudolabeling

UCSD-E4E / acoustic-multiclass-training

Data processing and training pipeline for classifying bird species by sound

GNU General Public License v3.0


Data augmentations/mixup for pseudolabeling #141

Open mbazzani opened 1 year ago

mbazzani commented 1 year ago

Should we use data augmentations/mixup when finetuning on pseudo-labels? I think data augs should be significantly less aggressive for pseudo-labeling. But what should that mean in practice: a different set of augs, weaker augs, no augs, or something else entirely?
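For mixup specifically, one way to make "weaker augs" concrete is to shrink the Beta parameter. Mixup draws its blend weight from Beta(alpha, alpha), and a small alpha pushes that weight toward 0 or 1, so most mixed clips stay close to one of the two originals. A minimal sketch, assuming NumPy arrays for clips and one-hot labels; the function name `mixup` and the alpha values are illustrative and not taken from this repo:

```python
import numpy as np


def mixup(x_a, y_a, x_b, y_b, alpha, rng):
    """Blend two examples and their labels with one shared weight.

    alpha controls how aggressive the mixing is: the weight is drawn
    from Beta(alpha, alpha), and alpha <= 0 disables mixing entirely.
    """
    lam = rng.beta(alpha, alpha) if alpha > 0 else 1.0
    x = lam * x_a + (1.0 - lam) * x_b
    y = lam * y_a + (1.0 - lam) * y_b
    return x, y, lam


rng = np.random.default_rng(0)

# Two toy "clips" with one-hot labels for a 2-class problem.
clip_a, label_a = np.ones(4), np.array([1.0, 0.0])
clip_b, label_b = np.zeros(4), np.array([0.0, 1.0])

# Hypothetical settings: stronger mixing on labeled data,
# weaker mixing on pseudo-labeled data.
mixed_x, mixed_y, lam = mixup(clip_a, label_a, clip_b, label_b, alpha=0.1, rng=rng)
print(lam, mixed_y)
```

Setting `alpha=0` here covers the "no augs" option, so the same knob spans the range from no mixing to full-strength mixing.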

benjamin-cates commented 1 year ago

For generating pseudo-labels, I think we should just run the model without augs to get an accurate confidence value. For training on them, I think we should always apply data augs, same as for anything else we train on.
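The split described here could look roughly like the sketch below: predict on clean clips, keep only confident predictions, and apply augmentation only when those clips are later used for training. Everything named here (`generate_pseudo_labels`, `toy_predict_proba`, `add_noise`, the 0.9 threshold) is a hypothetical stand-in, not this repo's API:

```python
import numpy as np


def generate_pseudo_labels(predict_proba, clips, threshold=0.9):
    """Label clean (un-augmented) clips and keep the confident ones."""
    probs = predict_proba(clips)          # shape: (n_clips, n_classes)
    confidence = probs.max(axis=1)
    labels = probs.argmax(axis=1)
    keep = confidence >= threshold
    return clips[keep], labels[keep], confidence[keep]


def toy_predict_proba(clips):
    """Stand-in for a trained classifier: softmax over two toy logits."""
    mean = clips.mean(axis=1)
    logits = np.stack([mean, -mean], axis=1) * 5.0
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def add_noise(clips, rng, scale=0.01):
    """Placeholder augmentation, applied at training time only."""
    return clips + rng.normal(0.0, scale, size=clips.shape)


rng = np.random.default_rng(0)
unlabeled = np.array([[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]])

# Step 1: no augs, so the confidence reflects the original audio.
kept_clips, pseudo_labels, conf = generate_pseudo_labels(toy_predict_proba, unlabeled)

# Step 2: augment only when the pseudo-labeled clips are fed to training.
train_batch = add_noise(kept_clips, rng)
print(pseudo_labels, conf.round(3), train_batch.shape)
```

The middle clip is dropped because the toy model is only 50% confident on it, which is the kind of example the threshold is meant to filter out before finetuning.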

mbazzani commented 1 year ago

@Sean1572 @sprestrelski Thoughts on which data augs to use for finetuning?

Based on the Slack messages, data augs seem very necessary.