Working with a ResNet50 model on ImageNet revealed a memory issue in the implementation of CgInfluence.
The issue stems from precomputing the averaged gradient over the training dataset while keeping the computation graph alive for the second-order derivatives. This is not feasible for large datasets.
Resolution:
Expose the parameter precompute_grad. Setting it to False averages the second-order derivatives over the batches instead, which requires less memory (the computation graph does not need to be kept) but is generally slower.
The same applies to ArnoldiInfluence.
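The trade-off can be sketched with a small PyTorch example. This is an illustrative toy, not the actual CgInfluence implementation: the `hvp` helper and the two code paths below are hypothetical stand-ins for the two settings of precompute_grad. The first path builds one computation graph over all batches (the memory problem described above); the second computes a Hessian-vector product per batch and averages, so each batch's graph can be freed. By linearity of the Hessian, both yield the same result.

```python
import torch

# Toy model and data; all names here are illustrative assumptions.
torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
params = list(model.parameters())
batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(3)]
v = torch.randn(sum(p.numel() for p in params))  # direction vector

def hvp(loss, params, v):
    """Hessian-vector product via double backward."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    return torch.cat(
        [h.reshape(-1) for h in torch.autograd.grad(flat @ v, params)]
    )

# Analogous to precompute_grad=True: one averaged loss, so the
# graphs for ALL batches stay in memory until the double backward.
total_loss = sum(loss_fn(model(x), y) for x, y in batches) / len(batches)
hv_precomputed = hvp(total_loss, params, v)

# Analogous to precompute_grad=False: per-batch HVPs, averaged.
# Each iteration's graph is released after its backward pass,
# so peak memory stays bounded by one batch; the cost is an extra
# double backward per batch, hence slower in general.
hv_batched = torch.zeros_like(v)
for x, y in batches:
    hv_batched += hvp(loss_fn(model(x), y), params, v) / len(batches)

assert torch.allclose(hv_precomputed, hv_batched, atol=1e-5)
```

The batched variant trades one large graph for several small ones, which is exactly the memory/speed trade-off the precompute_grad flag exposes.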