Description

1. Motivation and Context

Section 3.2.1 of the paper "Studying Large Language Model Generalization with Influence Functions" proposes using TF-IDF to measure the similarity between a test (query) sequence and the training sequences. Influence values are then computed only on the subset of training data with the highest similarity, which reduces the computational cost. This could be a feasible approach for our nanoGPT benchmark.

2. Summary of the change

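Add a tfidf_subset_sampler to model_utils. It uses TF-IDF to compute the text similarity between test and training samples and selects the most similar subset of the training data.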
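For reviewers, the selection idea can be sketched in plain Python as follows. This is a minimal, standalone sketch and not the code added to dattri: the signature (train_texts, test_text, k), the regex tokenizer, and fitting the IDF on the training texts plus the query are all assumptions made for illustration. The IDF uses the smoothed form ln((1 + n) / (1 + df)) + 1, the same form as scikit-learn's TfidfVectorizer default.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _tokenize(text: str) -> List[str]:
    # Assumption: lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def _tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF, same form as scikit-learn's default (smooth_idf=True).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so that a dot product equals cosine similarity.
        norm = math.sqrt(sum(v * v for v in vec.values()))
        vectors.append({t: v / norm for t, v in vec.items()} if norm > 0 else {})
    return vectors


def tfidf_subset_sampler(train_texts: List[str], test_text: str, k: int) -> List[int]:
    """Hypothetical sketch (not dattri's API): return the indices of the k
    training texts most similar to test_text under TF-IDF cosine similarity."""
    docs = [_tokenize(t) for t in train_texts] + [_tokenize(test_text)]
    vectors = _tfidf_vectors(docs)
    query = vectors[-1]
    scores = [
        sum(w * query.get(t, 0.0) for t, w in vec.items())
        for vec in vectors[:-1]
    ]
    # Sort by descending similarity; sorted() is stable, so ties keep input order.
    ranked = sorted(range(len(train_texts)), key=lambda i: -scores[i])
    return ranked[:k]


train = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat and a dog played",
    "the weather is sunny",
]
print(tfidf_subset_sampler(train, "my cat sat on a mat", k=2))  # -> [0, 2]
```

Influence values would then be computed only for the returned indices instead of the full training set, which is where the cost saving described in the paper comes from.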

3. What tests have been added/updated for the change?

N.A.

TRAIS-Lab / dattri

[dattri.model_utils] Add a subset sampler that uses TF-IDF to calculate text similarity #73