RY4GIT closed this 1 year ago
Debug session with @taddyb
mlp_forward() was called twice in one epoch.
Attributes stored on self are not reset by zero_grad().
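A minimal sketch of why zero_grad() is not enough. It assumes a hypothetical CachingModel that stores its MLP output on self; the class and attribute names are illustrative and do not come from the repository. zero_grad() only clears the .grad fields of the optimizer's parameters, so a tensor kept as a plain attribute still points at the previous iteration's autograd graph. Reusing it in a later loss backpropagates into a graph whose saved tensors were already freed.

```python
import torch
import torch.nn as nn


class CachingModel(nn.Module):
    """Hypothetical stand-in for a model that writes MLP outputs onto `self`."""

    def __init__(self) -> None:
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 1), nn.Tanh())
        self.cached = None  # plain attribute: not a parameter or buffer

    def mlp_forward(self, x: torch.Tensor) -> torch.Tensor:
        # Keeping the output on self also keeps this iteration's graph referenced.
        self.cached = self.mlp(x)
        return self.cached


torch.manual_seed(0)
model = CachingModel()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.ones(4, 2)

# "Epoch 1": forward, backward, step, then zero_grad().
model.mlp_forward(x).sum().backward()
opt.step()
opt.zero_grad()

# zero_grad() only touched parameter .grad; the cached tensor keeps its grad_fn.
still_attached = model.cached.grad_fn is not None

# "Epoch 2": reusing the stale attribute backprops into the already-freed graph.
try:
    (model.cached * 2.0).sum().backward()
    stale_graph_error = False
except RuntimeError:
    stale_graph_error = True

print(still_attached, stale_graph_error)  # True True
```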
self.cfe_instance.refkdt and self.cfe_instance.satdk weren't reset, so they are now reset using torch.zeros_like(self.cfe_instance.refkdt) (and likewise for satdk).
Check self.cfe_instance for any attributes that still have grads after resetting the refkdt and satdk parameters.
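One way to do that check is to scan the instance's attributes for tensors that still require grad. The sketch below assumes a hypothetical helper find_graph_attached_attributes and a hypothetical FakeCFE class with a derived state named storage; neither is part of the CFE code. After resetting refkdt and satdk with torch.zeros_like (which returns tensors with requires_grad=False), a state derived from the old refkdt is still reported.

```python
import torch


def find_graph_attached_attributes(obj) -> dict:
    """Hypothetical helper: map each tensor attribute of `obj` that still
    requires grad to its grad_fn class name (or 'leaf' if it has none)."""
    attached = {}
    for name, value in vars(obj).items():
        if isinstance(value, torch.Tensor) and value.requires_grad:
            fn = value.grad_fn
            attached[name] = type(fn).__name__ if fn is not None else "leaf"
    return attached


class FakeCFE:
    """Hypothetical stand-in for self.cfe_instance with two learnable inputs."""

    def __init__(self, weights: torch.Tensor) -> None:
        self.refkdt = weights[:, 0]
        self.satdk = weights[:, 1]
        self.storage = self.refkdt * 2.0  # a state derived from refkdt


weights = torch.rand(3, 2, requires_grad=True)
cfe = FakeCFE(weights)

# Reset the two parameters as described in the issue.
cfe.refkdt = torch.zeros_like(cfe.refkdt)
cfe.satdk = torch.zeros_like(cfe.satdk)

# Only the derived state is still attached to the old graph.
print(find_graph_attached_attributes(cfe))
```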
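import torch

def initialize(self):
    # Initialize the CFE model with the dynamic parameters.
    # torch.zeros_like returns fresh tensors with no autograd history.
    self.cfe_instance.refkdt = torch.zeros_like(self.cfe_instance.refkdt)
    self.cfe_instance.satdk = torch.zeros_like(self.cfe_instance.satdk)
    # Reset fluxes/states and volume tracking left over from the previous run
    self.cfe_instance.reset_flux_and_states()
    self.cfe_instance.reset_volume_tracking()
    # Load this epoch's MLP-predicted parameters (column 0 of each output)
    self.cfe_instance.update_params(self.refkdt[:, 0], self.satdk[:, 0])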
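The reset-then-update order in initialize() can be exercised end to end with a toy setup. This is a minimal sketch under loud assumptions: DummyCFE, Trainer, run_step, volume_out, and train are hypothetical stand-ins with toy dynamics, because the real CFE classes are not shown here. Only the sequence of calls inside initialize() mirrors the snippet. With reset=False, the state carried over from epoch 1 still references that epoch's freed graph, so the second backward raises a RuntimeError.

```python
import torch
import torch.nn as nn


class DummyCFE:
    """Hypothetical stand-in for self.cfe_instance (toy dynamics, not CFE physics)."""

    def __init__(self, n: int) -> None:
        self.refkdt = torch.zeros(n)
        self.satdk = torch.zeros(n)
        self.reset_flux_and_states()
        self.reset_volume_tracking()

    def reset_flux_and_states(self) -> None:
        self.storage = torch.zeros_like(self.refkdt)  # fresh tensor, no history

    def reset_volume_tracking(self) -> None:
        self.volume_out = torch.zeros(())

    def update_params(self, refkdt: torch.Tensor, satdk: torch.Tensor) -> None:
        self.refkdt = refkdt
        self.satdk = satdk

    def run_step(self) -> torch.Tensor:
        # Toy update: the state accumulates a flux that depends on both parameters.
        flux = torch.tanh(self.refkdt + self.satdk)
        self.storage = self.storage + flux
        self.volume_out = self.volume_out + flux.sum()
        return self.storage


class Trainer(nn.Module):
    def __init__(self, n: int = 3) -> None:
        super().__init__()
        self.mlp = nn.Linear(1, 2)
        self.register_buffer("inputs", torch.rand(n, 1))
        self.cfe_instance = DummyCFE(n)

    def mlp_forward(self) -> None:
        out = self.mlp(self.inputs)  # shape (n, 2)
        self.refkdt, self.satdk = out[:, 0:1], out[:, 1:2]

    def initialize(self, reset: bool = True) -> None:
        if reset:
            # Same reset sequence as the initialize() snippet.
            self.cfe_instance.refkdt = torch.zeros_like(self.cfe_instance.refkdt)
            self.cfe_instance.satdk = torch.zeros_like(self.cfe_instance.satdk)
            self.cfe_instance.reset_flux_and_states()
            self.cfe_instance.reset_volume_tracking()
        self.cfe_instance.update_params(self.refkdt[:, 0], self.satdk[:, 0])


def train(reset: bool, epochs: int = 3) -> list:
    torch.manual_seed(0)
    trainer = Trainer()
    opt = torch.optim.SGD(trainer.parameters(), lr=0.01)
    losses = []
    for _ in range(epochs):
        opt.zero_grad()
        trainer.mlp_forward()  # exactly once per epoch
        trainer.initialize(reset=reset)
        for _ in range(4):
            storage = trainer.cfe_instance.run_step()
        loss = storage.pow(2).mean()
        loss.backward()
        opt.step()
        losses.append(loss.item())
    return losses


print(len(train(reset=True)))  # 3
try:
    train(reset=False)
except RuntimeError as err:
    print("without reset:", type(err).__name__)  # without reset: RuntimeError
```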
SelectBackward is likely from slicing the MLP output (so it is okay); everything else is probably a remnant of model operations from the previous epoch.
Addressed in eb03b44481cba870a4601665fedc9cf84daa5bfc
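For reference, integer indexing on a dimension is recorded by autograd as a select op, which is consistent with a SelectBackward node appearing when the MLP output is indexed as self.refkdt[:, 0]. The sketch uses a random tensor as a stand-in for the MLP output (an assumption). Newer PyTorch releases append a numeric suffix to node names, so the comments only rely on the prefix.

```python
import torch

torch.manual_seed(0)
mlp_out = torch.rand(4, 2, requires_grad=True)  # stand-in for an MLP output of shape (batch, 2)

refkdt = mlp_out[:, 0]         # integer index drops the dim: recorded as a select op
refkdt_kept = mlp_out[:, 0:1]  # range slice keeps the dim: recorded as a slice op

print(type(refkdt.grad_fn).__name__)       # name starts with "SelectBackward"
print(type(refkdt_kept.grad_fn).__name__)  # name starts with "SliceBackward"

# The select node is rebuilt on every forward pass, and gradients flow only
# into the selected column, so on its own it is harmless.
refkdt.sum().backward()
print(mlp_out.grad[:, 1].abs().sum().item())  # 0.0
```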
This is basically the same issue as https://github.com/NWC-CUAHSI-Summer-Institute/LGAR-py/pull/11
Check this to debug https://github.com/NWC-CUAHSI-Summer-Institute/LGAR-py/blob/eec7bb4fb455b33bce7d19ee0e1ebfd588ddae37/dpLGAR/models/dpLGAR.py#L106