TRAIS-Lab / dattri

`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
https://trais-lab.github.io/dattri/
MIT License

[dattri.algorithm] IF attributor for existing ihvp functions #32

Closed TheaperDeng closed 4 months ago

TheaperDeng commented 4 months ago

This PR is still a bit short of perfect. I'm pushing it now to see if there is any feedback.

TheaperDeng commented 4 months ago

@tingwl0122 please also have a look

jiaqima commented 4 months ago

@sleepymalc please also take a look.

sleepymalc commented 4 months ago

> @sleepymalc please also take a look.

I had a pass on the algorithms. They generally look fine to me, didn't catch any noticeable bugs.

jiaqima commented 4 months ago

closing and re-opening to test pytest.

TheaperDeng commented 4 months ago

I have made the code more memory-efficient. It now takes two optional parameters, `train_gradient_batch_size` and `test_gradient_batch_size` (each defaulting to the length of the training/testing dataloader), which control the trade-off between speed and memory. The choice of these two parameters should not affect the final result.
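As a rough illustration of why the two batch sizes only trade time for memory, here is a minimal plain-Python sketch. It is not the code in this PR: `chunks`, `dot`, and `influence_scores` are hypothetical names, the gradients are passed in as precomputed lists for simplicity, and it assumes each score is the inner product between a training gradient and a test-side ihvp vector (sign conventions omitted).

```python
from typing import Iterator, List

Vector = List[float]
Matrix = List[Vector]


def chunks(rows: Matrix, size: int) -> Iterator[Matrix]:
    """Yield consecutive slices of `rows` with at most `size` rows each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def influence_scores(train_grads: Matrix, test_ihvps: Matrix,
                     train_batch_size: int, test_batch_size: int) -> Matrix:
    """Fill the (n_train x n_test) score matrix one block at a time.

    Hypothetical sketch: in a real attributor the gradients of each block
    would be computed inside the loops, so only one train block and one
    test block need to be held in memory at once.
    """
    scores = [[0.0] * len(test_ihvps) for _ in train_grads]
    row = 0
    for train_block in chunks(train_grads, train_batch_size):
        col = 0
        for test_block in chunks(test_ihvps, test_batch_size):
            for i, g in enumerate(train_block):
                for j, v in enumerate(test_block):
                    scores[row + i][col + j] = dot(g, v)
            col += len(test_block)
        row += len(train_block)
    return scores


# Toy inputs (made up for illustration): 3 training gradients, 2 test vectors.
toy_train = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_test = [[1.0, 0.0], [0.0, 1.0]]
same = (influence_scores(toy_train, toy_test, 3, 2)
        == influence_scores(toy_train, toy_test, 1, 1))
print(same)
```

Each entry of the score matrix is computed from the same pair of vectors regardless of how the rows and columns are grouped, so smaller blocks change only how much is held in memory at once and how many passes are needed.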

I ran the experiment on an A40 with MNIST + LR, computing the influence of 1000 training samples on 1000 test samples (CG ihvp).

| `train_gradient_batch_size` / `test_gradient_batch_size` | (1000, 1000) | (500, 500) | (100, 100) |
| --- | --- | --- | --- |
| Time | 13.15s | 17.18s | 59.07s |
| Peak memory | 333.87M | 182.56M | 57.00M |

As for the "explicit" ihvp, peak memory exceeds 1500M in every setting.
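For context on the gap between the CG and "explicit" variants, a conjugate-gradient ihvp only needs Hessian-vector products, whereas an explicit approach has to materialize the full Hessian. The sketch below is a generic textbook CG solver in plain Python, not the ihvp function used in this PR; `cg_ihvp`, `hvp`, and the 2x2 toy Hessian are hypothetical, and it assumes the Hessian is symmetric positive definite.

```python
from typing import Callable, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cg_ihvp(hvp: Callable[[Vector], Vector], v: Vector,
            max_iter: int = 50, tol: float = 1e-10) -> Vector:
    """Approximate H^{-1} v using only Hessian-vector products.

    Assumes H is symmetric positive definite. The Hessian itself is never
    formed; only a few vectors of the same size as `v` are stored.
    """
    x = [0.0] * len(v)
    r = list(v)           # residual v - H x, with x = 0 initially
    p = list(r)           # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:  # squared residual norm is small enough
            break
        hp = hvp(p)
        alpha = rs_old / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def toy_hvp(p: Vector) -> Vector:
    """Product with the made-up Hessian [[4, 1], [1, 3]]."""
    return [4.0 * p[0] + 1.0 * p[1], 1.0 * p[0] + 3.0 * p[1]]


solution = cg_ihvp(toy_hvp, [1.0, 2.0])
print(solution)
```

On the toy system the result can be checked by applying `toy_hvp` to `solution` and comparing against the right-hand side `[1.0, 2.0]`.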

TheaperDeng commented 4 months ago

> > @sleepymalc please also take a look.
>
> I had a pass on the algorithms. They generally look fine to me, didn't catch any noticeable bugs.

@sleepymalc Please have another look; there have been some updates since last time.

sleepymalc commented 4 months ago

@TheaperDeng Sure. Sorry for the delay; I'll be traveling tomorrow, but I'll make time to have a look tomorrow morning.

TheaperDeng commented 4 months ago

Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122

tingwl0122 commented 4 months ago

> Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122

Should we check the performance of IF?

TheaperDeng commented 4 months ago

> > Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122
>
> Should we check the performance of IF?

It works well in example/mnist_lr/influence_function.py as stated in https://github.com/TRAIS-Lab/dattri/pull/32#issue-2255624424, so it should not be too bad.

tingwl0122 commented 4 months ago

> > > Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122
> >
> > Should we check the performance of IF?
>
> It works well in example/mnist_lr/influence_function.py as stated in #32 (comment), so it should not be too bad.

Thanks for the notice. Then LGTM. After this, I think we can merge TracIn first. I will soon open another PR to make it consistent with IF (using the same technique to aggregate the TDA scores, and making sure it works correctly in the same example script).