bellymonster / Weighted-Soft-Label-Distillation

Update knowledge_distiller.py #5

Closed DeepLearningHB closed 3 years ago

DeepLearningHB commented 3 years ago

In my case, I ran into an exploding-gradient problem when training with this code. To prevent it, I suggest adding a small epsilon at line 73.
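
The comment does not quote line 73, so the following is only a minimal sketch of the general idea, assuming the line divides one loss by another that can reach zero. The function names `unstable_ratio` and `stable_ratio`, the argument names, and the value `1e-7` are hypothetical and not taken from the repository.

```python
import math

# Hypothetical epsilon; the comment does not state which value was used.
EPS = 1e-7


def unstable_ratio(student_loss: float, teacher_loss: float) -> float:
    """Plain ratio: raises ZeroDivisionError when teacher_loss is exactly 0
    and grows without bound as teacher_loss approaches 0."""
    return student_loss / teacher_loss


def stable_ratio(student_loss: float, teacher_loss: float, eps: float = EPS) -> float:
    """Same ratio with a small epsilon added to the denominator,
    so for non-negative losses the result is bounded by student_loss / eps."""
    return student_loss / (teacher_loss + eps)


if __name__ == "__main__":
    # With a zero denominator the guarded version still returns a finite number.
    print(math.isfinite(stable_ratio(0.5, 0.0)))
    # For ordinary values the epsilon barely changes the result.
    print(abs(stable_ratio(1.0, 1.0) - 1.0) < 1e-6)
```

The epsilon only caps how large the ratio can become; if the instability comes from a different operation (for example a logarithm of zero), the guard would need to go there instead.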