aai-institute / pyDVL

pyDVL is a library of stable implementations of algorithms for data valuation and influence function computation
https://pydvl.org
GNU Lesser General Public License v3.0

Memory leak in CgInfluence #497

Closed schroedk closed 7 months ago

schroedk commented 7 months ago

Working with a ResNet50 model on ImageNet revealed a memory issue in the implementation of CgInfluence.

The cause is that the gradient averaged over the training dataset is precomputed while its computation graph is kept alive for the second-order derivatives. The retained graph grows with the number of batches, so this is not feasible for large datasets.

Resolution:

Expose the parameter precompute_grad. Setting it to False averages the second-order derivatives over the batches instead, which requires less memory (no need to keep the computation graph) but is generally slower.

The same applies to ArnoldiInfluence.
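
For illustration, here is a minimal PyTorch sketch of the two strategies for a Hessian-vector product. It is not the pyDVL implementation: the toy model, the random batches and the helper names (batch_grad, hvp_precomputed, hvp_batchwise) are all made up for this example. Both variants return the same product because differentiation is linear, so they differ only in how long the computation graphs stay in memory.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: a tiny linear model and four random batches.
model = torch.nn.Linear(3, 1)
params = list(model.parameters())
batches = [(torch.randn(5, 3), torch.randn(5, 1)) for _ in range(4)]
loss_fn = torch.nn.functional.mse_loss
# Fixed vector v (one tensor per parameter) for the product H v.
vec = [torch.ones_like(p) for p in params]


def batch_grad(x, y):
    # create_graph=True keeps the graph so the gradient can be differentiated again.
    return torch.autograd.grad(loss_fn(model(x), y), params, create_graph=True)


def hvp_precomputed():
    # Strategy 1: average the gradients first, then differentiate once.
    # The graphs of ALL batches stay alive until the final backward pass,
    # so memory grows with the size of the dataset.
    grads = [batch_grad(x, y) for x, y in batches]
    avg = [sum(g) / len(batches) for g in zip(*grads)]
    dot = sum((g * v).sum() for g, v in zip(avg, vec))
    return torch.autograd.grad(dot, params)


def hvp_batchwise():
    # Strategy 2: compute the product per batch and average the results.
    # Each graph is freed after its batch, but every product needs a full
    # pass over the data.
    total = [torch.zeros_like(p) for p in params]
    for x, y in batches:
        dot = sum((g * v).sum() for g, v in zip(batch_grad(x, y), vec))
        for t, h in zip(total, torch.autograd.grad(dot, params)):
            t += h / len(batches)
    return total
```

In an iterative solver, the first strategy would typically cache the averaged gradient and reuse it for every product, which is where the speed advantage comes from. The sketch recomputes it on each call only to stay short.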