Different loss function

Hi! Great work and wonderful code. Thanks man!

Have you tried loss functions that are not based on cross-entropy? I am thinking of the loss functions used in unsupervised anomaly detection methods to filter out anomalous data points. One example of such a loss is the difference between the intermediate layers of a student and a teacher network.

Am I right that, if the method only depends on the gradient, we could skip the label-hallucination part and use the derivative directly?

Have a nice day,
Tobi
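To make the question concrete, here is a minimal sketch of what a label-free gradient embedding could look like. It is not code from this repository. It assumes a squared-error student-teacher loss, 0.5 * ||W h - t||^2, on a single linear last layer, and the names `label_free_gradient_embedding`, `W`, `h`, and `teacher_feat` are hypothetical and chosen only for illustration. With a cross-entropy loss the last-layer gradient depends on a label, which is why a hallucinated label is needed. With this loss the gradient with respect to `W` is the outer product of the residual and the penultimate activations, so no label appears at all.

```python
def matvec(W, h):
    """Multiply matrix W (a list of rows) by vector h."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in W]


def label_free_gradient_embedding(W, h, teacher_feat):
    """Flattened gradient of 0.5 * ||W h - teacher_feat||^2 with respect to W.

    The gradient is the outer product (W h - teacher_feat) h^T, so no label,
    hallucinated or otherwise, is required.
    """
    residual = [s - t for s, t in zip(matvec(W, h), teacher_feat)]
    return [r * h_j for r in residual for h_j in h]


# Toy values, assumed purely for illustration.
W = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical student last-layer weights
h = [1.0, 3.0]                 # hypothetical penultimate activations
teacher_feat = [0.5, 5.0]      # hypothetical teacher features for the same input

embedding = label_free_gradient_embedding(W, h, teacher_feat)
print(embedding)
```

In principle, embeddings like this could be passed to the same k-means++ style selection step in place of the hallucinated-label gradients. Whether they would give equally useful batches is an open question that this sketch does not answer.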

JordanAsh / badge

Different loss function #14