Training with PyTorch Lightning and the distributed data parallel (DDP) strategy requires that every parameter used in the forward pass also receives a gradient in the backward pass. DDP exposes a `find_unused_parameters` flag that relaxes this requirement, but enabling it slows training down significantly because DDP must scan for unused parameters on every iteration. This is an issue with ANI2x and SAKE.
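For reference, a minimal sketch of how this flag is typically set when configuring a Lightning `Trainer`; the accelerator/device settings and the commented-out `fit` call are placeholders, not part of this project's API:

```python
import lightning as L
from lightning.pytorch.strategies import DDPStrategy

# Default DDP: every parameter used in the forward pass must receive a
# gradient in the backward pass, otherwise DDP raises a runtime error.
fast_strategy = DDPStrategy(find_unused_parameters=False)

# Workaround for architectures that leave some parameters unused
# (e.g. ANI2x, SAKE): DDP searches the autograd graph for unused
# parameters on every iteration, which adds noticeable overhead.
slow_strategy = DDPStrategy(find_unused_parameters=True)

trainer = L.Trainer(
    accelerator="gpu",   # placeholder settings
    devices=2,
    strategy=slow_strategy,  # or fast_strategy when all parameters are used
)
# trainer.fit(model, datamodule=dm)  # model/datamodule are placeholders
```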