TRAIS-Lab / dattri

`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
https://trais-lab.github.io/dattri/

[dattri.benchmark] add a subsetsampler for easy usage #66

Closed TheaperDeng closed 3 months ago

TheaperDeng commented 3 months ago

Description

1. Motivation and Context

Sometimes we need to run a benchmark on only a subset of the data. This PR adds a simple sampler-based implementation for selecting that subset.

2. Summary of the change

  1. Add a subset sampler for future use.
  2. Add a `--train_subset <int>` parameter (default 5000) to the `dattri_retrain` command.
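
As a rough illustration of the two changes above, here is a minimal sketch, not the code in this PR. The class name `IndexSubsetSampler`, the helper `build_parser`, and the choice of taking the first N indices are all assumptions made for the example; only the `--train_subset` flag and its default of 5000 come from the description.

```python
import argparse
from typing import Iterator, Sequence


class IndexSubsetSampler:
    """Hypothetical sampler: yields a fixed list of dataset indices in order."""

    def __init__(self, indices: Sequence[int]) -> None:
        # Copy the indices so later changes to the caller's sequence have no effect.
        self.indices = list(indices)

    def __iter__(self) -> Iterator[int]:
        return iter(self.indices)

    def __len__(self) -> int:
        return len(self.indices)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser exposing only the flag named in this PR."""
    parser = argparse.ArgumentParser(prog="retrain_sketch")
    parser.add_argument(
        "--train_subset",
        type=int,
        default=5000,
        help="number of training samples to use",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--train_subset", "100"])

# Assumption: the subset is simply the first N indices of the training set.
sampler = IndexSubsetSampler(range(args.train_subset))
```

An object like this can be passed as the `sampler` argument of `torch.utils.data.DataLoader`, which accepts any iterable of indices, so the loader only visits the chosen subset.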

3. What tests have been added/updated for the change?

tingwl0122 commented 3 months ago

LGTM

tingwl0122 commented 3 months ago

I guess this will be used within the retrain script?

TheaperDeng commented 3 months ago

> I guess this will be used within the retrain script?

Yes, I will merge this PR after adding some unit tests.

tingwl0122 commented 3 months ago

> > I guess this will be used within the retrain script?
>
> Yes, I will merge this PR after adding some unit tests.

Just noticed this. I was wondering why MT re-training was running so long...

tingwl0122 commented 3 months ago

So I guess we don't need to have this `data_length` anymore?

TheaperDeng commented 3 months ago

> So I guess we don't need to have this `data_length` anymore?

Yes, I have removed it from the parameters.