MadryLab / trak

A fast, effective data attribution method for neural networks in PyTorch
https://trak.csail.mit.edu/
MIT License

Add NaN checks for inverse computation. PyTorch 2.0 - 2.4 deadlocks on matrix inversion #71

BrunoKM closed this issue 2 months ago

BrunoKM commented 3 months ago

Due to a PyTorch issue with matrix inversion (https://github.com/pytorch/pytorch/issues/134334), the TRAK code will deadlock if the computed gradients contain any NaNs. This is apparently expected, since the result of matrix inversion in PyTorch (torch.linalg.inv) is undefined when the input contains NaNs.

The expected result would be a raised error rather than a hang. Adding NaN checks on the computed tensors before they reach the inversion might be a good idea.
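
For illustration, a check along these lines could look like the sketch below. It is only a minimal example of the idea, not the TRAK implementation: `checked_inverse` and its `name` parameter are hypothetical, and it assumes the matrix is inverted with `torch.linalg.inv`. It also rejects infinities, not just NaNs, which is a choice made here for the example.

```python
import torch


def checked_inverse(matrix: torch.Tensor, name: str = "matrix") -> torch.Tensor:
    """Invert `matrix`, raising an error if it contains NaN or Inf entries.

    Hypothetical helper for illustration; not part of the TRAK API.
    """
    finite = torch.isfinite(matrix)
    if not finite.all():
        n_bad = int((~finite).sum())
        raise ValueError(
            f"{name} contains {n_bad} non-finite entries; refusing to invert"
        )
    return torch.linalg.inv(matrix)


# A well-conditioned matrix is inverted as usual.
ok = torch.tensor([[2.0, 0.0], [0.0, 4.0]])
inv = checked_inverse(ok)

# A matrix with a NaN raises before the inversion routine is ever called.
bad = ok.clone()
bad[0, 0] = float("nan")
try:
    checked_inverse(bad, name="xtx")
    raised = False
except ValueError:
    raised = True
```

The check costs one extra pass over the matrix, which should be small next to the inversion itself.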

kristian-georgiev commented 2 months ago

Resolved by https://github.com/MadryLab/trak/commit/0920e55497904c0cd0efb63555d593578824a914; @BrunoKM feel free to re-open the issue if you still observe the deadlock.