[dattri.algorithm] IF attributor for existing ihvp functions

TheaperDeng commented 4 months ago

This PR is still a bit short of perfect. I am pushing it now to get early feedback.

Add an IFAttributor to dattri.algorithms

class IFAttributor(BaseAttributor):
    """Influence function attributor."""

    def __init__(self,
                 target_func: Callable,
                 params: dict,
                 ihvp_solver: Callable = ihvp_explicit,
                 projector: AbstractProjector = None,
                 device: str = "cpu",
                 ) -> None:
        """Influence function attributor.

        Args:
            target_func (Callable): The target function to be attributed.
                The function is quite flexible, but it should take the parameters
                and the dataloader as input. A typical example is as follows:
                ```python
                @flatten_func(model)
                def f(params, dataloader):
                    loss = nn.CrossEntropyLoss()
                    loss_val = 0
                    for image, label in dataloader:
                        yhat = torch.func.functional_call(model, params, image)
                        loss_val += loss(yhat, label)
                    return loss_val
                ```
                It calculates the loss of the model on the dataloader.
            params (dict): The parameters of the target function. The key is
                the name of the parameter and the value is the parameter tensor.
                TODO: This should be changed to support a list of parameters or
                    paths for ensembling and memory efficiency.
            ihvp_solver (Callable): The solver for inverse hessian vector product
                calculation. Currently we only support the non-at-x solver within
                the `dattri.func.ihvp` module.
                TODO: Make this one more flexible.
            projector (AbstractProjector): Currently this is not used.
                TODO: Enable the use of random projection for memory efficiency.
            device (str): The device to run the attributor. Default is cpu.
        """

    def cache(self, dataloader: torch.utils.data.DataLoader) -> None:
        """Cache the dataset for inverse hessian calculation.

        Args:
            dataloader (torch.utils.data.DataLoader): The dataloader with full
                training samples for inverse hessian calculation.
        """

    def attribute(self,
                  train_dataloader: torch.utils.data.DataLoader,
                  test_dataloader: torch.utils.data.DataLoader) -> torch.Tensor:
        """Calculate the influence of the training set on the test set.

        Args:
            train_dataloader (torch.utils.data.DataLoader): The dataloader for
                training samples to calculate the influence.
            test_dataloader (torch.utils.data.DataLoader): The dataloader for
                test samples to calculate the influence.

        Returns:
            torch.Tensor: The influence of the training set on the test set, with
                the shape of (num_train_samples, num_test_samples).
        """

Add an example, example/mnist_lr/influence_function.py, for AUC detection. The output looks like this (there are 94 flipped labels in total):

calculating gradient of training set...: 100%|██████████| 1000/1000 [00:02<00:00, 450.86it/s]
calculating gradient of test set...: 100%|██████████| 1000/1000 [00:00<00:00, 1106.85it/s]
calculating ihvp...: 100%|██████████| 1/1 [00:00<00:00,  1.84it/s]
[(0, 0), (100, 36), (200, 52), (300, 67), (400, 76), (500, 82), (600, 90), (700, 93), (800, 94), (900, 94)]
Checked Data Sample      Found flipped Sample
--------------------------------------------------
0                        0
100                      36
200                      52
300                      67
400                      76
500                      82
600                      90
700                      93
800                      94
900                      94
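
A table like the one above can be produced by ranking training samples by a suspicion score and counting how many flipped labels fall within the first k inspected samples. The sketch below illustrates that counting step only. The helper name `flipped_found_curve`, the toy data, and the assumption that a higher score means "more likely mislabeled" are all hypothetical, and how the example script turns influence values into a per-sample score is not shown in this thread.

```python
from typing import List, Sequence, Tuple


def flipped_found_curve(
    scores: Sequence[float],
    is_flipped: Sequence[bool],
    step: int,
) -> List[Tuple[int, int]]:
    """Count flipped samples among the k most suspicious ones, for k = 0, step, ...

    Hypothetical helper, not part of dattri. A higher score is assumed to
    mean "more likely mislabeled".
    """
    if len(scores) != len(is_flipped):
        raise ValueError("scores and is_flipped must have the same length")
    # Sample indices, most suspicious first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve = []
    found = 0
    checked = 0
    for k in range(0, len(scores), step):
        # Extend the inspected prefix from `checked` to `k` samples.
        found += sum(1 for i in order[checked:k] if is_flipped[i])
        checked = k
        curve.append((k, found))
    return curve


# Toy data: 10 samples, 3 of them flipped (indices 1, 4 and 7).
scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.05, 0.4, 0.35, 0.0, 0.15]
is_flipped = [False, True, False, False, True, False, False, True, False, False]
curve = flipped_found_curve(scores, is_flipped, step=2)

print("Checked Data Sample      Found flipped Sample")
for checked_count, found_count in curve:
    print(f"{checked_count:<25}{found_count}")
```

The faster the "found" column approaches the total number of flipped labels, the better the scores separate flipped samples from clean ones.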

TheaperDeng commented 4 months ago

@tingwl0122 please also have a look

jiaqima commented 4 months ago

@sleepymalc please also take a look.

sleepymalc commented 4 months ago

> @sleepymalc please also take a look.

I had a pass on the algorithms. They generally look fine to me, didn't catch any noticeable bugs.

jiaqima commented 4 months ago

Closing and re-opening to test pytest.

TheaperDeng commented 4 months ago

I have made the code more memory-efficient. It now takes two optional parameters, `train_gradient_batch_size` and `test_gradient_batch_size` (both default to the length of the training/testing dataloader), which trade speed against peak memory. The values chosen for these two parameters should not affect the final result.
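
The claim that batch sizes change memory use but not the result can be illustrated with a block-wise fill of the score matrix. This is a minimal sketch under assumptions, not the code in this PR: `blockwise_scores` and the toy block functions are hypothetical names, the vectors stand in for per-sample gradients (with the ihvp already applied to one side), and plain Python lists replace tensors.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def blockwise_scores(
    train_block_fn: Callable[[int, int], List[Vector]],
    test_block_fn: Callable[[int, int], List[Vector]],
    num_train: int,
    num_test: int,
    train_batch_size: int,
    test_batch_size: int,
) -> Matrix:
    """Fill a (num_train, num_test) score matrix one block at a time.

    Only one block of training vectors and one block of test vectors are
    materialized at any moment. Smaller blocks mean less memory but more
    calls to the block functions; the values written are the same.
    """
    scores = [[0.0] * num_test for _ in range(num_train)]
    for i0 in range(0, num_train, train_batch_size):
        i1 = min(i0 + train_batch_size, num_train)
        train_block = train_block_fn(i0, i1)
        for j0 in range(0, num_test, test_batch_size):
            j1 = min(j0 + test_batch_size, num_test)
            test_block = test_block_fn(j0, j1)
            for di, g in enumerate(train_block):
                for dj, v in enumerate(test_block):
                    scores[i0 + di][j0 + dj] = dot(g, v)
    return scores


# Toy stand-ins for "compute the gradients of samples start..stop-1".
def toy_train_block(start: int, stop: int) -> List[Vector]:
    return [[float(i), float(i + 1), float(2 * i)] for i in range(start, stop)]


def toy_test_block(start: int, stop: int) -> List[Vector]:
    return [[1.0, float(j), float(-j)] for j in range(start, stop)]


full = blockwise_scores(toy_train_block, toy_test_block, 6, 4, 6, 4)
chunked = blockwise_scores(toy_train_block, toy_test_block, 6, 4, 4, 3)
print(full == chunked)
```

Each entry depends only on its own (training, test) pair, so the block boundaries cannot change it. The cost of small blocks is that the test-side block is recomputed once per training block.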

I ran the experiment on an A40 with MNIST + LR, calculating the influence of 1000 training samples × 1000 test samples (CG ihvp).

| `train_gradient_batch_size` / `test_gradient_batch_size` | (1000, 1000) | (500, 500) | (100, 100) |
| --- | --- | --- | --- |
| Time | 13.15s | 17.18s | 59.07s |
| Peak Memory | 333.87M | 182.56M | 57.00M |

As for the "explicit" ihvp, peak memory exceeds 1500M in every setting.
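
One plausible reason for the gap is that conjugate gradient (CG) only needs Hessian-vector products, whereas an explicit solver materializes the full Hessian before inverting it. The sketch below shows the textbook CG iteration for solving H x = v through a user-supplied `hvp` callable. It is illustrative only and does not reproduce the CG solver in `dattri.func.ihvp`; the function names, tolerance, and the 2×2 toy Hessian are assumptions.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def cg_ihvp(
    hvp: Callable[[Vector], Vector],
    v: Vector,
    max_iter: int = 50,
    tol: float = 1e-20,
) -> Vector:
    """Approximate H^{-1} v with conjugate gradient.

    `hvp(p)` must return H @ p for a symmetric positive-definite H.
    The matrix H itself is never formed or inverted here.
    """
    x = [0.0] * len(v)
    r = list(v)            # residual v - H x, with x = 0
    p = list(r)            # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old < tol:   # squared residual norm small enough: converged
            break
        hp = hvp(p)
        alpha = rs_old / dot(p, hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hpi for ri, hpi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Toy symmetric positive-definite "Hessian", used only through products.
H = [[4.0, 1.0], [1.0, 3.0]]


def toy_hvp(p: Vector) -> Vector:
    return [dot(row, p) for row in H]


x = cg_ihvp(toy_hvp, [1.0, 2.0])
print(x)
```

In a model with d parameters, a dense Hessian has d × d entries, while each CG step works with a handful of length-d vectors, at the cost of one Hessian-vector product per iteration.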

TheaperDeng commented 4 months ago

> > @sleepymalc please also take a look.
>
> I had a pass on the algorithms. They generally look fine to me, didn't catch any noticeable bugs.

@sleepymalc Please also have a look; there have been some updates since last time.

sleepymalc commented 4 months ago

@TheaperDeng Sure. Sorry for the delay; I'm traveling tomorrow, but I'll take time to have a look tomorrow morning.

TheaperDeng commented 4 months ago

Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122

tingwl0122 commented 4 months ago

> Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122

Should we check the performance of IF?

TheaperDeng commented 4 months ago

> > Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122
>
> Should we check the performance of IF?

It works well in example/mnist_lr/influence_function.py as stated in https://github.com/TRAIS-Lab/dattri/pull/32#issue-2255624424, so it should not be too bad.

tingwl0122 commented 4 months ago

> > > Maybe I can first merge this PR to keep rolling? I guess the TracIN PR can be merged after this one @tingwl0122
> >
> > Should we check the performance of IF?
>
> It works well in example/mnist_lr/influence_function.py as stated in #32 (comment), so it should not be too bad.

Thanks for the note. Then LGTM. After this, I think we can merge TracIn first, but I will soon open another PR to make it consistent with IF (using the same technique to aggregate TDA scores and making sure it works correctly in the same example script).
