mavaylon1 opened 3 weeks ago
```
/sdf/data/lcls/ds/prj/prjcwang31/results/proj-peaknet/pretrain
/sdf/data/lcls/ds/prj/prjcwang31/results/proj-peaknet/distill
```
FYI, I turn off loss balancing with the following settings in the shell script `exp.distill.atto.sh`. No code change is required.
```bash
# [KNOWLEDGE DISTILLATION]
TEMPERATURE=2.0
FOCAL_ALPHA="[0.25, 0.75]"
FOCAL_GAMMA=2
LAM_MSE=0.4
LAM_KL=0.4
LAM_FOCAL=0.2
EMA_MOMENTUM=null
```
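For reference, here is a minimal sketch of how these weights could enter the combined distillation loss. The function name and signature are illustrative only, not the actual code in `train.distill.py`:

```python
def combine_losses(mse_loss, kl_loss, focal_loss,
                   lam_mse=0.4, lam_kl=0.4, lam_focal=0.2,
                   mse_scale=1.0, kl_scale=1.0, focal_scale=1.0):
    # Hypothetical helper: weighted sum of the three distillation terms.
    # The lam_* defaults mirror LAM_MSE, LAM_KL, LAM_FOCAL above; the
    # *_scale factors are the per-term balancing scales, which stay at 1
    # when loss balancing is off (EMA_MOMENTUM=null).
    return (lam_mse * mse_scale * mse_loss
            + lam_kl * kl_scale * kl_loss
            + lam_focal * focal_scale * focal_loss)
```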
In `train.distill.py`, loss balancing is turned off when the EMA momentum is set to `None`. See the example below, which shows the MSE scaler:
```python
# The scaler is only created when an EMA momentum is given.
self.mse_scaler = EMA(ema_momentum) if ema_momentum is not None else None
...
# Without a scaler, the MSE term keeps a constant scale of 1,
# i.e. loss balancing is effectively off.
mse_scale = self.mse_scaler.update(math.log(1 + 1 / mse_loss.detach())) if self.mse_scaler is not None else 1
```
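The `EMA` helper itself isn't shown above; a minimal implementation consistent with the `update` call (an assumption on my part, not necessarily what the repo uses) would be:

```python
class EMA:
    """Exponential moving average of a scalar.

    A guess at the helper used above, not necessarily the repo's
    actual class.
    """

    def __init__(self, momentum):
        self.momentum = momentum
        self.value = None

    def update(self, x):
        # First call initializes the average; later calls blend the new
        # observation in with weight (1 - momentum) and return the result.
        if self.value is None:
            self.value = x
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * x
        return self.value
```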
TBD