Closed by janchk 1 month ago.
Use inference from the full-precision (FP) checkpoint as the teacher, with an L2-norm loss, to implement knowledge distillation for the quantized model.
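Loss function: `L2_NORM(fp_out, quant_out)`, i.e. the L2 norm of the difference between the FP teacher's output and the quantized student's output.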
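A minimal sketch of this loss, using only the Python standard library so it runs anywhere. Everything beyond `L2_NORM(fp_out, quant_out)` is an assumption for illustration: the toy linear layer, the hypothetical `fake_quantize` helper (a symmetric round-to-nearest quantizer with a fixed `scale`), and `distillation_loss` are not from the issue. A real setup would use the framework's own quantized model and train the student to minimize this value.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm_loss(fp_out: Sequence[float], quant_out: Sequence[float]) -> float:
    """L2 norm of (teacher output - student output)."""
    if len(fp_out) != len(quant_out):
        raise ValueError("teacher and student outputs must have the same length")
    return math.sqrt(sum((t - s) ** 2 for t, s in zip(fp_out, quant_out)))


def fake_quantize(w: float, scale: float = 0.1) -> float:
    # Hypothetical quantizer: snap the weight to the nearest multiple of `scale`.
    return round(w / scale) * scale


def linear(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    # Toy layer: y = W x (no bias).
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weights]


def distillation_loss(
    fp_weights: Sequence[Sequence[float]], x: Sequence[float], scale: float = 0.1
) -> float:
    # Teacher: inference with the FP checkpoint's weights.
    fp_out = linear(fp_weights, x)
    # Student: the same layer with fake-quantized weights.
    q_weights = [[fake_quantize(w, scale) for w in row] for row in fp_weights]
    quant_out = linear(q_weights, x)
    return l2_norm_loss(fp_out, quant_out)
```

For example, `l2_norm_loss([3.0, 0.0], [0.0, 4.0])` is `5.0`. Some implementations use the squared L2 norm (or its mean, MSE) instead; that variant avoids the square root, but it changes the loss scale.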