Hi, I read your paper and was wondering why you chose to threshold the teacher model's output and then take the argmax, which effectively turns the soft labels into hard labels.
Would it not have been possible to train with soft labels (i.e., using the teacher model's confidence scores as targets), or did thresholding + argmax give better performance?
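
To make the question concrete, here is a minimal sketch of the two options I have in mind. It is only my reading of the idea, not your implementation: the function names, the `IGNORE` marker, and the 0.9 threshold are placeholders I made up for illustration.

```python
import math

IGNORE = -1  # hypothetical marker for samples whose pseudo-label is discarded


def hard_pseudo_label(teacher_probs, threshold=0.9):
    """Threshold + argmax: keep the top class only if the teacher is confident."""
    best = max(range(len(teacher_probs)), key=lambda k: teacher_probs[k])
    return best if teacher_probs[best] >= threshold else IGNORE


def hard_label_loss(student_probs, teacher_probs, threshold=0.9):
    """Cross-entropy against the hard pseudo-label; 0.0 if the sample is ignored."""
    label = hard_pseudo_label(teacher_probs, threshold)
    if label == IGNORE:
        return 0.0
    return -math.log(student_probs[label])


def soft_label_loss(student_probs, teacher_probs):
    """Cross-entropy against the full teacher distribution (soft labels)."""
    return -sum(
        t * math.log(s)
        for t, s in zip(teacher_probs, student_probs)
        if t > 0  # skip zero-probability classes, which contribute nothing
    )
```

With the hard variant, a low-confidence teacher prediction such as `[0.5, 0.4, 0.1]` contributes nothing to the loss, whereas the soft variant still uses it as a target. That difference is what I am asking about.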