Hi! Great work and wonderful code. Thanks!
Have you tried non-cross-entropy-based loss functions? I'm thinking of the loss functions used in unsupervised anomaly detection methods for filtering out anomalous data points. One example is the difference between the intermediate-layer activations of a student and a teacher network. Am I right that, if the method only depends on the gradient of the loss, we could skip the hallucination part and use the derivative directly?
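To make concrete what I mean, here is a minimal sketch of such a student-teacher feature-distance loss (in the style of STFPM-like anomaly detection). This is purely illustrative, assuming a PyTorch setup; the toy modules and the way I grab intermediate features are my own placeholders, not anything from this repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def feature_distance_loss(teacher_feats, student_feats):
    """Mean squared distance between L2-normalized intermediate feature maps."""
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):
        t = F.normalize(t, dim=1)  # normalize along the channel dimension
        s = F.normalize(s, dim=1)
        loss = loss + F.mse_loss(s, t)
    return loss / len(teacher_feats)

# Toy stand-ins for the teacher/student backbones (hypothetical, for illustration).
teacher = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1)).eval()
student = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(8, 8, 3, padding=1))

x = torch.randn(2, 3, 32, 32)
with torch.no_grad():                       # teacher is frozen
    t_feats = [teacher[:1](x), teacher(x)]  # two "intermediate layers"
s_feats = [student[:1](x), student(x)]

loss = feature_distance_loss(t_feats, s_feats)
loss.backward()  # the loss is differentiable, so gradients w.r.t. the student flow as usual
print(loss.item())
```

The point of the `loss.backward()` line is the question above: the whole thing is differentiable, so anything that only needs the gradient should be computable directly from a loss like this.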
Have a nice day
Tobi