icelighting opened this issue 4 months ago (status: Open)
Thanks for your great work. When I use the DMD distillation code, I see that the SNR loss does not use the MSE loss but coeff * latents, which is not the gradient and may be negative. Is this related to the way the model learns when using SNR gamma?
Should the default args.snr_gamma be None? I am also puzzled about the difference between these two and which one I should use.
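To make the question concrete, here is a minimal sketch of the two forms as I understand them. It is not the repository's code: the toy values and the names `linear_surrogate`, `mse_surrogate`, and `numerical_grad` are made up for illustration, and it uses plain Python with finite differences instead of autograd. It assumes `coeff` is a precomputed term held constant, and that the MSE variant compares the latents against a fixed target `latents - coeff`.

```python
# Hypothetical toy values; not taken from the repository.
latents = [0.5, -1.0, 2.0]
coeff = [0.3, 0.8, -0.1]  # stands in for a precomputed, constant gradient term


def linear_surrogate(x, g):
    # "coeff * latents" form: the value can be negative,
    # and its gradient with respect to x is exactly g.
    return sum(gi * xi for gi, xi in zip(g, x))


def mse_surrogate(x, target):
    # 0.5 * squared error against a fixed target: the value is never negative.
    return 0.5 * sum((xi - ti) ** 2 for xi, ti in zip(x, target))


def numerical_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


# Fixed target, computed once and then treated as a constant.
target = [xi - gi for xi, gi in zip(latents, coeff)]

linear_value = linear_surrogate(latents, coeff)
mse_value = mse_surrogate(latents, target)
grad_linear = numerical_grad(lambda x: linear_surrogate(x, coeff), latents)
grad_mse = numerical_grad(lambda x: mse_surrogate(x, target), latents)

print("linear value:", linear_value)
print("mse value:", mse_value)
print("grad (linear):", grad_linear)
print("grad (mse):", grad_mse)
```

Under these assumptions both forms push the latents in the same direction, since the gradient equals `coeff` in each case, but only the linear form can report a negative loss value. Whether this matches what the DMD code intends is exactly what I am asking.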