TRAIS-Lab / dattri

`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
https://trais-lab.github.io/dattri/
MIT License

[dattri.algorithm] A few notes and discussions needed for the RPS implementation #57

Closed · tingwl0122 closed this 2 months ago

tingwl0122 commented 5 months ago

Here is a placeholder to describe a few notes on RPS and gather people's opinions on my current thoughts. I will soon raise a PR for the RPS algorithm. (Update: the PR is created at #58)

To lay some background and as a preview, this is my intended `__init__` for the `RPSAttributor`:

```python
class RPSAttributor(BaseAttributor):
    """Representer point selection attributor."""

    def __init__(
        self,
        target_func: Callable,
        model: torch.nn.Module,
        intermediate_layer_name: str,
        final_linear_layer_name: str,
        l2_strength: float = 0.003,
        device: str = "cpu",
    ) -> None:
```
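
For context, a hypothetical usage sketch of this interface could look like the following; the model, layer names, and loss function here are made up purely to illustrate the intended call pattern and are not part of the actual test suite:

```python
import torch
import torch.nn.functional as F

# A toy two-part model: a feature extractor followed by a final linear layer.
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(784, 128),
    torch.nn.ReLU(),             # "2": its output feeds the final linear layer
    torch.nn.Linear(128, 10),    # "3": final linear (prediction) layer
)

def f(pre_activation_list, label_list):
    # Loss computed on the final layer's pre-activation output.
    return F.cross_entropy(pre_activation_list, label_list)

# Assumed call pattern; layer names follow nn.Sequential's default numbering.
# RPSAttributor as defined above (from the upcoming PR).
attributor = RPSAttributor(
    target_func=f,
    model=model,
    intermediate_layer_name="2",
    final_linear_layer_name="3",
    l2_strength=0.003,
    device="cpu",
)
```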

A few notes as follows:

  1. Input [`target_func`]: As mentioned in #55, RPS doesn't need the gradient of the loss w.r.t. the params like IF/TracIn/TRAK do. However, RPS still needs the gradient of the loss w.r.t. the "model pre-activation output". I currently still preserve `target_func` as an input of `RPSAttributor`, but it will serve as the loss function. For example, the current working test cases use

     ```python
     def f(pre_activation_list, label_list):
         return F.cross_entropy(pre_activation_list, label_list)
     ```

     and

     ```python
     def f(pre_activation_list, label_list):
         return F.binary_cross_entropy_with_logits(pre_activation_list, label_list)
     ```

  2. Input [`intermediate_layer_name`, `final_linear_layer_name`]: RPS works on the assumption that the model can be decomposed into a feature model and a prediction model. Basically, it assumes the final layer is something like `nn.Linear(final_feature_dim, class_number)`. As described above, I currently let users provide the names of the second-to-last layer and the final layer to allow maximum generalizability.

     However, this requires users to hack into the model's parameter space, which is definitely sub-optimal. Another alternative is to just enforce this assumption and automatically select these two layers using `model.named_children()`, raising an error if the final layer is not an instance of `nn.Linear` (see the sketch after this list).

  3. Restriction on the loss function and activation: As described in the paper, the derivation is only natural for sigmoid + binary cross-entropy loss in binary classification and softmax + cross-entropy loss in multi-class classification. Their source repo only considers the binary case. I am unsure whether we need to explicitly write this restriction into the code, or at least give a warning otherwise.

  4. Validity of `RPSAttributor.attribute()` for multi-class classification: As derived in the paper, the "representer value for $x_i$ given $x_t$" is a vector $\in \mathbb{R}^c$, where $c$ is the number of label classes. Thus, the attribution score is only valid when solving a binary classification problem. (This is probably why RPS and its follow-up work, RPS_LJE, only run binary cases in their experiments.) I propose to make this clear for users who want to use RPS, or at least raise a warning.
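
As a rough sketch of the auto-selection alternative in note 2, one could walk `model.named_children()` and check the last child; the helper name and error messages below are my own and not part of the proposed API:

```python
import torch

def locate_final_linear_layer(model: torch.nn.Module) -> str:
    """Hypothetical helper: return the name of the last child module,
    erroring out if it is not an nn.Linear as the RPS assumption requires."""
    children = list(model.named_children())
    if not children:
        raise ValueError("model has no child modules to inspect")
    name, module = children[-1]
    if not isinstance(module, torch.nn.Linear):
        raise TypeError(
            f"RPS assumes the final layer is nn.Linear, got {type(module).__name__}",
        )
    return name
```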

tingwl0122 commented 5 months ago

Hi @jiaqima @TheaperDeng, I listed the design issues of RPS above. Please take a look if you have time; I will modify the current implementation accordingly before the PR.

tingwl0122 commented 5 months ago

Update: if we follow the assumption of the RPS paper (the last layer is always the prediction linear layer), I will modify the `__init__` of the `RPSAttributor` as follows:

```python
class RPSAttributor(BaseAttributor):
    """Representer point selection attributor."""

    def __init__(
        self,
        target_func: Callable,
        model: torch.nn.Module,
        final_linear_layer_name: str,
        l2_strength: float = 0.003,
        device: str = "cpu",
    ) -> None:
```

This removes `intermediate_layer_name` by assuming the representer feature is always the input of the last linear layer.
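
To make this concrete, one way to grab "the input of the last linear layer" at attribution time is a forward hook on that layer; this is only a sketch of the idea under that assumption, not the implementation in the PR:

```python
import torch

def capture_final_layer_input(model: torch.nn.Module,
                              final_linear_layer_name: str,
                              x: torch.Tensor) -> torch.Tensor:
    """Sketch: run a forward pass and record the input fed into the
    named final linear layer (i.e. the representer feature)."""
    features = {}

    def hook(module, inputs, output):
        # inputs is a tuple; the first element is the feature tensor.
        features["value"] = inputs[0].detach()

    layer = dict(model.named_modules())[final_linear_layer_name]
    handle = layer.register_forward_hook(hook)
    try:
        model(x)
    finally:
        handle.remove()
    return features["value"]
```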

tingwl0122 commented 4 months ago

Update:

  1. We will still allow general loss functions as input (possibly other than BCE loss and CE loss), so only the functionality and the docstring change accordingly.
  2. Based on the comment above, we will now just use the input and output of the final linear layer. Users will need to provide the name of this layer (to avoid possible bugs from directly relying on the order in `model.children()`).
  3. No further clarification will be done.
  4. For multi-class classification, the influence of train sample $x_i$ on test sample $x_t$ will be $\alpha_{i,j} f_i^T f_t$, where $j$ corresponds to the class label of $x_t$ and $\alpha_i$ is a $c$-dimensional vector ($c$ is the number of classes).
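
As a small worked sketch of that formula (all shapes and variable names here are assumptions for illustration), the score picks the $\alpha_{i,j}$ entry matching each test sample's label and scales it by the feature inner product:

```python
import torch

# Assumed shapes, for illustration only.
n_train, n_test, c, d = 5, 3, 10, 16
alpha = torch.randn(n_train, c)        # alpha_i: one c-dimensional vector per train sample
train_feat = torch.randn(n_train, d)   # f_i
test_feat = torch.randn(n_test, d)     # f_t
test_labels = torch.randint(0, c, (n_test,))  # class label j of each test sample

# Inner products f_i^T f_t for all train/test pairs: shape (n_train, n_test)
inner = train_feat @ test_feat.T

# Select alpha_{i, j} where j is the test sample's label: shape (n_train, n_test)
alpha_j = alpha[:, test_labels]

# Influence of train sample x_i on test sample x_t: alpha_{i,j} * f_i^T f_t
scores = alpha_j * inner
```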