Hello. I had a question about your network's computational graph during backpropagation. Since you pass return_info=True in your call to self.forward(), does that mean your self.training_step() ultimately returns a dictionary called info that contains not only the loss you want to optimize but also many other loss terms that still carry valid grad_fn properties? For reference: https://github.com/arneschneuing/DiffSBDD/blob/ca2d2ad4451893ec405308134fdffbe94e298b64/lightning_modules.py#L319
I ask because Lightning is supposed to backpropagate using the loss term only. However, would you agree that it might be safer to explicitly detach all the other loss terms from the computational graph during training_step()?
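For concreteness, here is a rough sketch of the kind of change I have in mind. The forward signature and the exact contents of info are just my assumptions, not the actual DiffSBDD code:

```python
import torch
import pytorch_lightning as pl


class ExampleModule(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        # Hypothetical call: forward returns the training loss plus an `info`
        # dict of auxiliary loss terms that still carry grad_fn.
        loss, info = self.forward(batch, return_info=True)

        # Detach the auxiliary terms so that only `loss` remains attached to
        # the computational graph that Lightning backpropagates through.
        detached_info = {
            k: (v.detach() if isinstance(v, torch.Tensor) else v)
            for k, v in info.items()
        }

        self.log_dict(detached_info)
        return {"loss": loss, **detached_info}
```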
Hi @amorehead, thanks for another good observation. I haven't had any issues so far, but detaching the other variables might indeed be safer. I will consider it for future updates.
Thank you!