mavaylon1 opened 3 weeks ago
```
/sdf/data/lcls/ds/prj/prjcwang31/results/proj-peaknet/pretrain
/sdf/data/lcls/ds/prj/prjcwang31/results/proj-peaknet/distill
```
FYI, I turn off loss balancing with the following settings in the shell script `exp.distill.atto.sh`. No code change is required.
```bash
# [KNOWLEDGE DISTILLATION]
TEMPERATURE=2.0
FOCAL_ALPHA="[0.25, 0.75]"
FOCAL_GAMMA=2
LAM_MSE=0.4
LAM_KL=0.4
LAM_FOCAL=0.2
EMA_MOMENTUM=null
```
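For reference, here is a minimal sketch of how these weights could enter the combined distillation loss. The function name and signature are illustrative only, not the actual code in `train.distill.py`:

```python
def combine_losses(mse_loss, kl_loss, focal_loss,
                   lam_mse=0.4, lam_kl=0.4, lam_focal=0.2,
                   mse_scale=1.0, kl_scale=1.0, focal_scale=1.0):
    # Hypothetical helper: weighted sum of the three distillation terms.
    # The lam_* defaults mirror LAM_MSE, LAM_KL, LAM_FOCAL above; the
    # *_scale factors are the per-term balancing scales, which stay at 1
    # when loss balancing is off (EMA_MOMENTUM=null).
    return (lam_mse * mse_scale * mse_loss
            + lam_kl * kl_scale * kl_loss
            + lam_focal * focal_scale * focal_loss)
```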
In `train.distill.py`, loss balancing is turned off when the EMA momentum is set to `None`. See the example below, which shows the MSE scaler:
```python
# The scaler is only created when an EMA momentum is given.
self.mse_scaler = EMA(ema_momentum) if ema_momentum is not None else None
...
# Without a scaler, the MSE term keeps a constant scale of 1,
# i.e. loss balancing is effectively off.
mse_scale = self.mse_scaler.update(math.log(1 + 1 / mse_loss.detach())) if self.mse_scaler is not None else 1
```
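The `EMA` helper itself isn't shown above; a minimal implementation consistent with the `update` call (an assumption on my part, not necessarily what the repo uses) would be:

```python
class EMA:
    """Exponential moving average of a scalar.

    A guess at the helper used above, not necessarily the repo's
    actual class.
    """

    def __init__(self, momentum):
        self.momentum = momentum
        self.value = None

    def update(self, x):
        # First call initializes the average; later calls blend the new
        # observation in with weight (1 - momentum) and return the result.
        if self.value is None:
            self.value = x
        else:
            self.value = self.momentum * self.value + (1 - self.momentum) * x
        return self.value
```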
TBD