Open y0un0 opened 2 years ago
Hi. Yes, this happens in the KL-divergence computation, when the mu or sigma term becomes NaN. One thing to try is lowering the weight of the latent loss term. Try it and let me know if it works.
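As background on why this fails: for a diagonal Gaussian posterior, the closed-form KL term against a standard normal involves exp(logvar), which can overflow to inf and then propagate NaN through mu and sigma via the gradients. A minimal NumPy sketch of a safeguarded KL (illustrative only, not the actual UCNet code; the clamp value is an assumption):

```python
import numpy as np

def kl_diag_gaussian(mu, logvar, clamp=10.0):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ) per sample.
    # Clamping logvar is a common safeguard against exp() overflowing
    # to inf, which would turn the loss (and its gradients) into nan.
    logvar = np.clip(logvar, -clamp, clamp)
    var = np.exp(logvar)
    return 0.5 * np.sum(mu**2 + var - logvar - 1.0, axis=-1)

# An extreme logvar would overflow without the clamp; with it, KL stays finite.
kl = kl_diag_gaussian(np.zeros((10, 3)), np.full((10, 3), 1000.0))
print(np.all(np.isfinite(kl)))  # True
```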
Cheers, Jing
-- Jing Zhang, Ph.D. Student, College of Engineering and Computer Science, Australian National University.
OK, I will try. Thank you.
Do you have an idea of what value I should use for the latent weight? I tried --lat_weight=5.0, but unfortunately the error popped up again.
Try values in the range 1-5; 5 works for me.
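For reference, a hypothetical sketch of how a lat_weight in the 1-5 range scales the latent (KL) term in the total objective; the actual training code may combine the terms differently:

```python
# Hypothetical combination of the saliency loss and the latent (KL) loss;
# lat_weight is the knob discussed above.  A smaller lat_weight shrinks
# the KL term's contribution to the gradients, which can keep mu/logvar
# from diverging early in training.
def total_loss(sal_loss, latent_loss, lat_weight=5.0):
    return sal_loss + lat_weight * latent_loss

print(total_loss(1.0, 0.2, lat_weight=5.0))  # 2.0
```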
OK, I launched another experiment with --lat_weight=1.0; I'll keep you updated on the results.
It worked with lat_weight=1.0. I was wondering, can the batch size influence the value we should use for lat_weight? I'm asking because I used a batch size of 10 instead of 5.
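On the batch-size question: whether lat_weight needs rescaling depends on how each loss term is reduced over the batch. If both terms use a mean reduction, their ratio, and hence a good lat_weight, is roughly batch-size independent; with a sum reduction, doubling the batch doubles the KL term and shifts the balance. A small NumPy sketch (illustrative, not the repository's code):

```python
import numpy as np

def combined(seg_per_sample, kl_per_sample, lat_weight, reduction="mean"):
    # With 'mean' reduction the batch size cancels out of both terms,
    # so the same lat_weight balances them at batch size 5 or 10.
    # With 'sum' reduction the KL term grows with the batch, changing
    # the effective weighting.
    red = np.mean if reduction == "mean" else np.sum
    return red(seg_per_sample) + lat_weight * red(kl_per_sample)

seg, kl = np.array([0.5, 0.7]), np.array([0.1, 0.3])
small = combined(seg, kl, lat_weight=1.0)
large = combined(np.tile(seg, 5), np.tile(kl, 5), lat_weight=1.0)
print(np.isclose(small, large))  # True: mean reduction is batch-size invariant
```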
Hello,
Have you encountered this error during training ?
Traceback (most recent call last):
  File "train.py", line 136, in <module>
    pred_post, pred_prior, latent_loss, depth_pred_post, depth_pred_prior = generator.forward(images, depths, gts)
  File "/content/UCNet/model/ResNet_models.py", line 157, in forward
    self.posterior, muxy, logvarxy = self.xy_encoder(torch.cat((x, depth, y), 1))
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/UCNet/model/ResNet_models.py", line 125, in forward
    dist = Independent(Normal(loc=mu, scale=torch.exp(logvar)), 1)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/normal.py", line 50, in __init__
    super(Normal, self).__init__(batch_shape, validate_args=validate_args)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributions/distribution.py", line 56, in __init__
    f"Expected parameter {param} "
ValueError: Expected parameter loc (Tensor of shape (10, 3)) of distribution Normal(loc: torch.Size([10, 3]), scale: torch.Size([10, 3])) to satisfy the constraint Real(), but found invalid values:
tensor([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]], device='cuda:0', grad_fn=)