frederick0329 / TracIn

Implementation of Estimating Training Data Influence by Tracing Gradient Descent (NeurIPS 2020)
Apache License 2.0

Is the error similarity score equal to the TracIn score in the colab notebooks? #6

Closed VenkateshSatagopan closed 3 years ago

VenkateshSatagopan commented 3 years ago

Hi Frederick, thanks for your colab notebook implementation. I am trying to understand the calculation of the TracIn score in the "resnet50_imagenet_proponents_opponents" notebook, where you calculate three scores in the loss function as shown below:

    def find(loss_grad=None, activation=None, topk=50):
        if loss_grad is None and activation is None:
            raise ValueError('loss grad and activation cannot both be None.')
        scores = []
        scores_lg = []
        scores_a = []
        for i in range(len(trackin_train['image_ids'])):
            if loss_grad is not None and activation is not None:
                lg_sim = np.sum(trackin_train['loss_grads'][i] * loss_grad)
                a_sim = np.sum(trackin_train['activations'][i] * activation)
                scores.append(lg_sim * a_sim)
                scores_lg.append(lg_sim)
                scores_a.append(a_sim)

Here you calculate lg_sim, a_sim, and scores, and label them error similarity, encoding similarity, and influence when you display proponents and opponents for a particular test image. The lg_sim calculation matches the formula for TracIn given in the paper, so is the lg_sim score equivalent to the TracIn score for differentiating proponents and opponents? Is my understanding correct? If so, what is the significance of the a_sim and scores values?
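For reference, a minimal sketch of how scores like these are typically turned into top-k proponents and opponents (most positive vs. most negative influence); the function and variable names here are illustrative assumptions, not the notebook's exact code:

```python
import numpy as np

def top_proponents_opponents(scores, image_ids, topk=3):
    """Rank training examples by influence score: proponents have the
    highest (most positive) scores, opponents the lowest (most negative)."""
    order = np.argsort(scores)  # indices sorted by ascending score
    opponents = [image_ids[i] for i in order[:topk]]
    proponents = [image_ids[i] for i in order[::-1][:topk]]
    return proponents, opponents

# Toy scores for five hypothetical training images:
scores = np.array([0.9, -0.4, 0.1, 2.0, -1.3])
ids = ["img_a", "img_b", "img_c", "img_d", "img_e"]
props, opps = top_proponents_opponents(scores, ids, topk=2)
# props == ["img_d", "img_a"], opps == ["img_e", "img_b"]
```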

Thanks in Advance

frederick0329 commented 3 years ago

Thank you for the question.

Please refer to Appendix F of the paper: influence (TracIn score) = lg_sim (error similarity) * a_sim (encoding similarity).
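A minimal numpy sketch of why that factorization holds, under the Appendix F setting of a last-layer linear classifier: the weight gradient of each example is the outer product of its output-error vector and its input activation, so the dot product of two weight gradients splits exactly into error similarity times encoding similarity. All names here are illustrative, not from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-example quantities for a last-layer linear classifier:
# the loss gradient w.r.t. the logits (the "error") and the layer's input
# activation, for one training example and one test example.
err_train, act_train = rng.normal(size=4), rng.normal(size=8)
err_test, act_test = rng.normal(size=4), rng.normal(size=8)

# The weight gradient is the outer product: dL/dW = error ⊗ activation.
grad_train = np.outer(err_train, act_train)
grad_test = np.outer(err_test, act_test)

# TracIn-style term: dot product of the two weight gradients ...
influence = np.sum(grad_train * grad_test)

# ... which factors exactly into lg_sim * a_sim.
lg_sim = err_train @ err_test  # error similarity
a_sim = act_train @ act_test   # encoding similarity
assert np.isclose(influence, lg_sim * a_sim)
```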

VenkateshSatagopan commented 3 years ago

Thanks Frederick, I understand the influence score calculation now.