MadryLab / trak

A fast, effective data attribution method for neural networks in PyTorch
https://trak.csail.mit.edu/
MIT License

Blockwise matmul for scoring #43

Closed by kristian-georgiev 10 months ago

kristian-georgiev commented 1 year ago

@AlaaKhaddaj How does this plan sound: let's

Once we have all this, we can further optimize how we save and load the TRAK features and target gradients to reduce I/O latency.

kristian-georgiev commented 10 months ago

https://github.com/MadryLab/trak/commit/16e9d4627c41292a4b81a0d28962dbc42803239c incorporates https://github.com/MadryLab/trak/pull/43/commits/ee43d8ba0e7c7da4da932a06e5783fec609325b8 (block-wise get_scores for large datasets).

kristian-georgiev commented 10 months ago

https://github.com/MadryLab/trak/commit/62426eba866ff566cbf9ca9c28d12933ab9ffee6 incorporates https://github.com/MadryLab/trak/commit/efb67196a78dbd868801cf532c51504a68db2f6b (only write to disk once when scoring).

kristian-georgiev commented 10 months ago

I kept everything inside BasicScoreComputer and changed the signature of get_scores to take an accumulator that stores the results, instead of creating a new FastScoreComputer.
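
For readers following along, here is a rough sketch of the idea of a block-wise score computation that writes into a caller-provided accumulator. It is not the code from the commits above: the function name `get_scores_blockwise`, its parameters, and the `block_size` default are all hypothetical, and NumPy arrays stand in for the PyTorch tensors that TRAK actually uses.

```python
import numpy as np


def get_scores_blockwise(features, target_grads, accumulator, block_size=1024):
    """Add features @ target_grads.T into `accumulator`, one row block at a time.

    Hypothetical illustration only; not the actual BasicScoreComputer.get_scores.
    features:     (num_train, proj_dim) array
    target_grads: (num_targets, proj_dim) array
    accumulator:  (num_train, num_targets) array, updated in place
    """
    num_train = features.shape[0]
    for start in range(0, num_train, block_size):
        stop = min(start + block_size, num_train)
        # Only one (block_size x proj_dim) slice of the features is used per step,
        # so the intermediate product stays small even for large datasets.
        accumulator[start:stop] += features[start:stop] @ target_grads.T
    return accumulator


# Toy usage with made-up shapes.
rng = np.random.default_rng(0)
features = rng.standard_normal((2500, 64)).astype(np.float32)
target_grads = rng.standard_normal((10, 64)).astype(np.float32)
scores = np.zeros((2500, 10), dtype=np.float32)
get_scores_blockwise(features, target_grads, scores, block_size=1000)
```

Because the caller owns the accumulator, it can be allocated once (for example, backed by a memory-mapped file) and reused across calls, so results only need to be written out a single time.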

kristian-georgiev commented 10 months ago

https://github.com/MadryLab/trak/commit/259f087071a9dcf248e65727d3bb269ed563baea incorporates the rest of the enhancements.