NaN values appear during training on my own counting dataset

Thank you for your interesting work; it is very useful in practical applications. I ran into a problem when training on my own counting dataset: after adding a print statement to forward in matcher.py, I found that NaN values appear. Why does this happen? Please help. Thank you very much!

def forward(self, features, patches):
    bs, c, h, w = features.shape
    features = features.flatten(2).permute(2, 0, 1)  # hw * bs * dim

    proj_feat = self.query_conv(features)
    patches_feat = self.key_conv(patches)
    patches_ca = self.activation(self.dynamic_pattern_conv(patches_feat))

    proj_feat = proj_feat.permute(1, 0, 2)
    patches_feat = (patches_feat * (patches_ca + 1)).permute(1, 2, 0)  # bs * c * exemplar_number        
    energy = torch.bmm(proj_feat, patches_feat)                        # bs * hw * exemplar_number
    print("energy.mean(), features.mean(): ", energy.mean(), features.mean())

energy.mean(), features.mean(): tensor(-14.2665, device='cuda:4', grad_fn=) tensor(0.1236, device='cuda:4', grad_fn=)
torch.Size([8, 1024]) torch.Size([8, 1024]) 0.0
energy.mean(), features.mean(): tensor(-17.7250, device='cuda:4', grad_fn=) tensor(0.1354, device='cuda:4', grad_fn=)
torch.Size([8, 1024]) torch.Size([8, 1024]) 0.0
energy.mean(), features.mean(): tensor(nan, device='cuda:4', grad_fn=) tensor(nan, device='cuda:4', grad_fn=)
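
In the last printed line, features.mean() is already NaN, and features is an input to this forward, so the NaN seems to originate before the matcher (in an earlier layer, or in weights corrupted by a previous update) rather than in the bmm itself. A minimal sketch of how this could be narrowed down is below. The helper name first_non_finite and the toy tensors are hypothetical and are not part of the repository; only standard PyTorch calls are used.

```python
import torch


def first_non_finite(**tensors):
    """Return the name of the first tensor that contains NaN or Inf, else None.

    Hypothetical debugging helper; tensors are checked in the order given.
    """
    for name, t in tensors.items():
        if not torch.isfinite(t).all():
            return name
    return None


# Toy tensors standing in for the intermediates in forward() (assumed shapes).
proj_feat = torch.randn(2, 16, 8)      # bs * hw * c
patches_feat = torch.randn(2, 8, 3)    # bs * c * exemplar_number
patches_feat[0, 0, 0] = float("nan")   # inject a NaN for illustration
energy = torch.bmm(proj_feat, patches_feat)

bad = first_non_finite(proj_feat=proj_feat, patches_feat=patches_feat, energy=energy)
print("first non-finite tensor:", bad)
```

Calling such a helper on features, patches, and each intermediate at the top of forward would show which tensor goes bad first. PyTorch's torch.autograd.set_detect_anomaly(True) may also help, since it raises an error at the backward operation that produces NaN, at the cost of slower training.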

flyinglynx / Bilinear-Matching-Network
