TRAIS-Lab / dattri

`dattri` is a PyTorch library for developing, benchmarking, and deploying efficient data attribution algorithms.
https://trais-lab.github.io/dattri/
MIT License

[DO NOT MERGE] Reference refactoring of BaseInnerProductAttributor #130

Closed jiaqima closed 1 week ago

jiaqima commented 2 weeks ago

Description

This PR is a reference implementation of a refactored `BaseInnerProductAttributor` and is NOT intended to be merged (it has not been tested). As an example, the Arnoldi attributor is re-implemented to demonstrate how the new `BaseInnerProductAttributor` API can be used.

A few major changes in `BaseInnerProductAttributor`:

jiaqima commented 1 week ago

@TheaperDeng, one thing I'm not sure about is whether we want the default transform functions for train and test to be identical, since in many cases only the test rep needs to be transformed.

TheaperDeng commented 1 week ago

> @TheaperDeng, one thing I'm not sure about is whether we want the default transform functions for train and test to be identical, since in many cases only the test rep needs to be transformed.

It happens that I am working on this. One possible way is to make `transform_test_rep` an abstract method, so users always need to implement it, while `transform_train_rep` gets a default implementation that returns the train rep directly.

Or, we set both to return the rep directly by default. Users can then override them when designing their TDA methods.
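The second option could be sketched roughly as below. This is a hypothetical illustration, not dattri's actual code: the method names `transform_train_rep` / `transform_test_rep` follow the discussion, while the class name and the inner-product computation are made up for the example.

```python
# Hypothetical sketch (not dattri's actual implementation): both transform
# methods default to returning the representation unchanged, so the base
# class is directly usable and subclasses override only what they need.
import torch


class InnerProductAttributorSketch:
    def transform_train_rep(self, train_rep: torch.Tensor) -> torch.Tensor:
        # Default: return the train rep directly (identity).
        return train_rep

    def transform_test_rep(self, test_rep: torch.Tensor) -> torch.Tensor:
        # Default: identity; a subclass could override this with e.g. an
        # iHVP- or Arnoldi-based transformation.
        return test_rep

    def attribute(self, train_rep: torch.Tensor, test_rep: torch.Tensor) -> torch.Tensor:
        # Inner product of the (possibly transformed) representations.
        return self.transform_test_rep(test_rep) @ self.transform_train_rep(train_rep).T
```

With both defaults left as the identity, `attribute` reduces to a plain gradient dot product.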

jiaqima commented 1 week ago

> > @TheaperDeng, one thing I'm not sure about is whether we want the default transform functions for train and test to be identical, since in many cases only the test rep needs to be transformed.

> It happens that I am working on this. One possible way is to make `transform_test_rep` an abstract method, so users always need to implement it, while `transform_train_rep` gets a default implementation that returns the train rep directly.

> Or, we set both to return the rep directly by default. Users can then override them when designing their TDA methods.

I think the latter is OK. Users implementing new attributors can be expected to be a bit more tolerant than users who just use the attributors, and they don't need an abstract method to remind them that this is something they should override.

TheaperDeng commented 1 week ago

> > > @TheaperDeng, one thing I'm not sure about is whether we want the default transform functions for train and test to be identical, since in many cases only the test rep needs to be transformed.

> > It happens that I am working on this. One possible way is to make `transform_test_rep` an abstract method, so users always need to implement it, while `transform_train_rep` gets a default implementation that returns the train rep directly. Or, we set both to return the rep directly by default. Users can then override them when designing their TDA methods.

> I think the latter is OK. Users implementing new attributors can be expected to be a bit more tolerant than users who just use the attributors, and they don't need an abstract method to remind them that this is something they should override.

Yeah, I think so. This also makes the inner-product attributor directly instantiable, working like Grad-Dot, just as its name suggests.

jiaqima commented 1 week ago

@tingwl0122 could you please take a look at the new implementation of the Arnoldi attributor and see if there is anything wrong?

tingwl0122 commented 1 week ago

> @tingwl0122 could you please take a look at the new implementation of the Arnoldi attributor and see if there is anything wrong?

I think the overall structure looks good to me. (Moving the projection matrix computation into `cache` has already been listed as a TODO.) Another thing is that the original Arnoldi paper actually used only a random batch of the full dataloader to run the algorithm, which I didn't include before. We can now do something on the `data_target_pair` before feeding it into `func` to achieve this, and also add a parameter to control the random batch's size.
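The subsampling idea could look roughly like the sketch below. Everything here is a hypothetical illustration: the helper `sample_random_batch`, the toy dataset, and the collation logic are made up for the example, and dattri's actual `data_target_pair` handling may differ.

```python
# Hypothetical sketch: draw one random batch from the full dataloader's
# dataset before running the Arnoldi iterations. The helper name and the
# collation logic are illustrative, not dattri's API.
import torch
from torch.utils.data import DataLoader, TensorDataset


def sample_random_batch(dataloader: DataLoader, batch_size: int):
    """Return a single randomly sampled batch of `batch_size` examples."""
    dataset = dataloader.dataset
    idx = torch.randperm(len(dataset))[:batch_size]
    samples = [dataset[i] for i in idx]
    # Collate the list of (data, target) tuples into batched tensors.
    return tuple(torch.stack(parts) for parts in zip(*samples))


# Toy usage: the sampled (data, target) pair could then be fed into `func`.
ds = TensorDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(ds, batch_size=16)
data, target = sample_random_batch(loader, batch_size=8)
```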

tingwl0122 commented 1 week ago

> > @tingwl0122 could you please take a look at the new implementation of the Arnoldi attributor and see if there is anything wrong?

> I think the overall structure looks good to me. (Moving the projection matrix computation into `cache` has already been listed as a TODO.) Another thing is that the original Arnoldi paper actually used only a random batch of the full dataloader to run the algorithm, which I didn't include before. We can now do something on the `data_target_pair` before feeding it into `func` to achieve this, and also add a parameter to control the random batch's size.

Another minor thing is that we will need to force users to call `.cache` before attributing. We can probably raise an error for this.
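The suggested guard could be as simple as the sketch below. The class name and the cached state are hypothetical; only the cache-before-attribute contract comes from the discussion.

```python
# Hypothetical sketch: raise a clear error when `attribute` is called
# before `cache`. The cached value here is a placeholder; in the real
# attributor it would be the precomputed projection matrix.
class ArnoldiAttributorSketch:
    def __init__(self) -> None:
        self._cached_projection = None

    def cache(self) -> None:
        # Stand-in for the (expensive) projection matrix computation.
        self._cached_projection = "projection-matrix-placeholder"

    def attribute(self) -> str:
        if self._cached_projection is None:
            raise RuntimeError(
                "No cached projection found. Call `.cache()` before `.attribute()`.",
            )
        return self._cached_projection
```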

jiaqima commented 1 week ago

> > > @tingwl0122 could you please take a look at the new implementation of the Arnoldi attributor and see if there is anything wrong?

> > I think the overall structure looks good to me. (Moving the projection matrix computation into `cache` has already been listed as a TODO.) Another thing is that the original Arnoldi paper actually used only a random batch of the full dataloader to run the algorithm, which I didn't include before. We can now do something on the `data_target_pair` before feeding it into `func` to achieve this, and also add a parameter to control the random batch's size.

> Another minor thing is that we will need to force users to call `.cache` before attributing. We can probably raise an error for this.

Sounds great. Thanks for the notes. I'll add them later.

TheaperDeng commented 1 week ago

> > > > @tingwl0122 could you please take a look at the new implementation of the Arnoldi attributor and see if there is anything wrong?

> > > I think the overall structure looks good to me. (Moving the projection matrix computation into `cache` has already been listed as a TODO.) Another thing is that the original Arnoldi paper actually used only a random batch of the full dataloader to run the algorithm, which I didn't include before. We can now do something on the `data_target_pair` before feeding it into `func` to achieve this, and also add a parameter to control the random batch's size.

> > Another minor thing is that we will need to force users to call `.cache` before attributing. We can probably raise an error for this.

> Sounds great. Thanks for the notes. I'll add them later.

Thanks @tingwl0122 @jiaqima. I think I can directly make these changes in my implementation in #133.

jiaqima commented 1 week ago

Closing this PR in favor of #133. Any follow-up discussion can continue there.