Hi @jiaqima @TheaperDeng, I have listed some design issues of RPS below. Please take a look if you have time; I will modify the current implementation accordingly before the PR.
Update: if we follow the assumption of the RPS paper (the last layer is always the prediction linear layer), I will modify the `__init__` of `RPSAttributor` as follows:
```python
from typing import Callable

import torch


class RPSAttributor(BaseAttributor):
    """Representer point selection attributor."""

    def __init__(
        self,
        target_func: Callable,
        model: torch.nn.Module,
        final_linear_layer_name: str,
        l2_strength: float = 0.003,
        device: str = "cpu",
    ) -> None:
        ...
```
This removes the usage of `intermediate_layer_name` by assuming that the representer feature is always the input of the last linear layer.
Update: these layers could also be selected automatically from `model.children()` rather than named by the user; a minimal sketch of that idea follows.
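The sketch below is illustration only, assuming a flat model whose prediction head is the last top-level child; `get_final_linear_layer` and the hook are hypothetical names, not the proposed API. It selects the final layer, errors out if it is not an `nn.Linear`, and captures the representer feature (the input of that layer) with a forward pre-hook:

```python
import torch
from torch import nn


def get_final_linear_layer(model: nn.Module) -> nn.Linear:
    """Pick the last top-level child and insist it is the prediction head."""
    final_layer = list(model.children())[-1]
    if not isinstance(final_layer, nn.Linear):
        raise TypeError(
            f"RPS assumes the final layer is nn.Linear, got {type(final_layer).__name__}",
        )
    return final_layer


# Capture the representer feature (the input of the final linear layer)
# with a forward pre-hook, so no intermediate layer name is needed.
cache = {}


def _cache_input(module, args):
    cache["feature"] = args[0].detach()


model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
final_layer = get_final_linear_layer(model)
handle = final_layer.register_forward_pre_hook(_cache_input)
_ = model(torch.randn(4, 10))  # cache["feature"] is now a (4, 32) tensor
handle.remove()
```

One caveat of this design choice: `model.children()` only inspects top-level modules, so deeply nested architectures might still need the named-layer fallback.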
Here is a placeholder describing a few notes on RPS, to gather people's opinions on my current thoughts. I will soon raise a PR for the RPS algorithm. (Update: the PR is created at #58.)
To lay some background and as a preview, this is my intended `__init__` for the `RPSAttributor`:
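(The signature below is sketched from the parameter names discussed in the notes; the defaults are assumptions mirroring the update above.)

```python
from typing import Callable

import torch


class RPSAttributor(BaseAttributor):
    """Representer point selection attributor."""

    def __init__(
        self,
        target_func: Callable,
        model: torch.nn.Module,
        intermediate_layer_name: str,
        final_linear_layer_name: str,
        l2_strength: float = 0.003,  # assumed default, mirroring the update above
        device: str = "cpu",
    ) -> None:
        ...
```

A few notes as follows: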
- [`target_func`]: As mentioned in #55, RPS doesn't need the gradient of the loss w.r.t. the parameters the way IF/TracIn/TRAK do, but it still needs the gradient of the loss w.r.t. the model's pre-activation output. I currently preserve `target_func` as an input of `RPSAttributor`, but it will serve the purpose of a loss function. For example, current working test cases use … (a sketch of this gradient computation follows the list).
- [`intermediate_layer_name`, `final_linear_layer_name`]: RPS works on the assumption that the model can be decomposed into a feature model and a prediction model; basically, it assumes the final layer is something like `nn.Linear(final_feature_dim, class_number)`. As described above, I now allow users to type in the names of the second-to-last layer and the final layer for maximum generalizability. However, this requires people to hack into the model's parameter space, which is definitely sub-optimal. An alternative is to simply enforce the assumption and select these two layers automatically using `model.named_children()`, raising an error if the final layer is not of type `nn.Linear` (see the sketch after the updates above).
- Restriction on loss function and activation: As described in the paper, the derivation is only natural for sigmoid + binary cross-entropy loss in binary classification and softmax + cross-entropy loss in multi-class classification. The paper's source repo only considers the binary case. I am unsure whether we need to write this restriction out explicitly in the code, or at least emit a warning otherwise.
- Validity of `RPSAttributor.attribute()` for multi-class classification: As derived in the paper, the "representer value for $x_i$ given $x_t$" is a vector in $\mathbb{R}^c$, where $c$ is the number of label classes (see the decomposition after the list). Thus, the attribution score is only valid when solving a binary classification problem. (This is probably why RPS and its follow-up work RPS_LJE only run binary cases in their experiments.) I propose making this clear to users who want to use RPS, or at least raising a warning.
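On the [`target_func`] point, here is a minimal sketch of the gradient RPS actually needs: the gradient of the loss w.r.t. the pre-activation output rather than the parameters. The model, loss, and shapes are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

# Placeholder decomposition: feature extractor + final linear prediction layer.
feature_model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU())
final_linear = torch.nn.Linear(32, 1)

x = torch.randn(4, 10)
y = torch.randint(0, 2, (4, 1)).float()

features = feature_model(x)      # representer features f_i
logits = final_linear(features)  # pre-activation output Phi(x_i)
logits.retain_grad()             # keep .grad on this non-leaf tensor

# Sum reduction so logits.grad holds the per-sample gradients dL_i/dPhi(x_i).
loss = F.binary_cross_entropy_with_logits(logits, y, reduction="sum")
loss.backward()

# In the paper's notation: alpha_i = -(1 / (2 * lambda * n)) * dL_i/dPhi(x_i).
l2_strength, n = 0.003, x.shape[0]
alpha = -logits.grad / (2 * l2_strength * n)
```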
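And on the multi-class point, my reading of the paper's decomposition, which makes the shape issue explicit:

$$\Phi(x_t, \theta^*) = \sum_{i=1}^{n} k(x_t, x_i, \alpha_i), \qquad k(x_t, x_i, \alpha_i) = \alpha_i \, f_i^{\top} f_t, \qquad \alpha_i = -\frac{1}{2 \lambda n} \frac{\partial L(x_i, y_i, \theta)}{\partial \Phi(x_i, \theta)} \in \mathbb{R}^c.$$

Since $f_i^{\top} f_t$ is a scalar, the representer value $k(x_t, x_i, \alpha_i)$ lives in $\mathbb{R}^c$ as well; it collapses to a single scalar attribution score only when $c = 1$, i.e., the sigmoid binary case.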