Hi, why is the sampler (CUR or SUR) not consistent with the original implementation of CCKD (Correlation Congruence for Knowledge Distillation)?
I would also like to know what delta[:-1] * delta[1:] denotes.
Thanks!
@winycg, that's the code the original author shared with me. The snippets I commented out are my re-implementation according to the paper.
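
On delta[:-1] * delta[1:]: read purely as slicing, and assuming delta is a 2-D array or tensor with one row per sample in the batch, the expression multiplies each row elementwise with the row that follows it, so N rows give N-1 products of adjacent samples. What that product means within the loss is not covered here. The sketch below reproduces the same arithmetic with plain Python lists; `adjacent_products` and the sample values are hypothetical names and data used only for illustration.

```python
def adjacent_products(delta):
    """Pure-Python equivalent of delta[:-1] * delta[1:] for a 2-D array-like.

    delta[:-1] holds every row except the last, delta[1:] every row except
    the first, so zipping them pairs row i with row i + 1.
    """
    return [
        [a * b for a, b in zip(row, next_row)]
        for row, next_row in zip(delta[:-1], delta[1:])
    ]


# Hypothetical input: 3 samples, 2 features each.
delta = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairs = adjacent_products(delta)
print(pairs)  # [[3.0, 8.0], [15.0, 24.0]]
```

With a single row there is no neighbour to pair with, so the result is empty.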