carbonscott / exp-peaknet

Run peaknet experiments

Integrate Hard Labels into KD Criterion #6

Open mavaylon1 opened 3 weeks ago

mavaylon1 commented 3 weeks ago

TBD

mavaylon1 commented 3 weeks ago

@mavaylon1

carbonscott commented 3 weeks ago

FYI, I turn off loss balancing by setting the following in the shell script exp.distill.atto.sh. No code change is required.

# [KNOWLEDGE DISTILLATION]
TEMPERATURE=2.0
FOCAL_ALPHA="[0.25, 0.75]"
FOCAL_GAMMA=2
LAM_MSE=0.4
LAM_KL=0.4
LAM_FOCAL=0.2
EMA_MOMENTUM=null
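
For reference, these weights correspond to the three terms of the distillation loss (MSE feature matching, temperature-scaled KL on soft labels, and a focal loss on hard labels). Below is a minimal sketch of how such a weighted criterion is commonly assembled; the function and tensor names are hypothetical and this is not the actual train.distill.py implementation:

import torch
import torch.nn.functional as F

def focal_ce(logits, targets, alpha, gamma):
    # Focal cross entropy: weight classes by alpha and down-weight easy examples by (1 - p_t)^gamma
    ce = F.cross_entropy(logits, targets, reduction="none")   # -log p_t
    pt = torch.exp(-ce)                                       # p_t
    return (alpha[targets] * (1 - pt) ** gamma * ce).mean()

def kd_criterion(student_logits, teacher_logits, student_feat, teacher_feat, hard_labels,
                 temperature=2.0, alpha=torch.tensor([0.25, 0.75]), gamma=2,
                 lam_mse=0.4, lam_kl=0.4, lam_focal=0.2):
    # Feature-matching term between student and teacher representations
    mse_loss = F.mse_loss(student_feat, teacher_feat)

    # Soft-label term: KL divergence between temperature-softened distributions
    kl_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Hard-label term against the ground-truth labels
    focal_loss = focal_ce(student_logits, hard_labels, alpha, gamma)

    return lam_mse * mse_loss + lam_kl * kl_loss + lam_focal * focal_loss

With balancing off, each term is multiplied only by its fixed lambda, so the mix above is exactly what the config specifies.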

In train.distill.py, loss balancing is turned off when the EMA momentum is set to None. See the example below for the MSE scaler:

# The scaler is only created when an EMA momentum is provided
self.mse_scaler   = EMA(ema_momentum) if ema_momentum is not None else None
...
# Without a scaler, the MSE scale defaults to 1, i.e. no balancing is applied
mse_scale = self.mse_scaler.update(math.log(1 + 1 / mse_loss.detach())) if self.mse_scaler is not None else 1
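
As a self-contained illustration of that toggle, here is a minimal sketch (the EMA class below is a generic exponential moving average, not necessarily the repo's implementation):

import math
import torch

class EMA:
    # Minimal exponential moving average tracker
    def __init__(self, momentum):
        self.momentum = momentum
        self.value = None

    def update(self, x):
        # First update seeds the average; later updates blend with the stored value
        self.value = x if self.value is None else self.momentum * self.value + (1 - self.momentum) * x
        return self.value

ema_momentum = None   # EMA_MOMENTUM=null in the shell script ends up as None here
mse_scaler = EMA(ema_momentum) if ema_momentum is not None else None

mse_loss = torch.tensor(0.25)  # placeholder loss value for illustration
mse_scale = mse_scaler.update(math.log(1 + 1 / mse_loss.detach())) if mse_scaler is not None else 1
print(mse_scale)  # 1 -> the MSE term enters the total loss unscaled

With EMA_MOMENTUM=null, no scaler is created, every scale stays at 1, and the lambdas alone control how the loss terms are mixed.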