Open santolina opened 2 years ago
Thanks a lot for your great work.
When debugging distillation, I found that alpha from the student NN sometimes takes a negative value (though very close to 0). This comes from using different activations for density in the teacher NN and the student NN.
I guess the difference may be minor, but is there any reason for using leaky ReLU for the student NN?
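To illustrate the effect, here is a minimal sketch of how a leaky ReLU on the density output can produce a slightly negative alpha. It is not the repository's code: it assumes the teacher applies a plain ReLU to density, that alpha follows the usual volume rendering form `1 - exp(-sigma * delta)`, and that the leaky ReLU slope is 0.01. The helper names and the values of `raw` and `delta` are hypothetical.

```python
import math


def relu(x):
    # Assumed teacher activation: clamps negative density to zero.
    return max(0.0, x)


def leaky_relu(x, negative_slope=0.01):
    # Assumed student activation: lets a small negative density through.
    return x if x >= 0.0 else negative_slope * x


def alpha_from_density(sigma, delta):
    # Standard volume rendering opacity for one ray segment of length delta.
    return 1.0 - math.exp(-sigma * delta)


raw = -0.5    # hypothetical pre-activation density output
delta = 0.01  # hypothetical distance between adjacent samples

alpha_relu = alpha_from_density(relu(raw), delta)
alpha_leaky = alpha_from_density(leaky_relu(raw), delta)

print("alpha with ReLU:      ", alpha_relu)
print("alpha with leaky ReLU:", alpha_leaky)
```

With a negative pre-activation, the ReLU path gives an alpha of exactly zero, while the leaky ReLU path gives a density slightly below zero and therefore an alpha slightly below zero, which matches the behaviour described above.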