RezaKakooee / space_layout_gym

MIT License

Should initial losses be NaN? #4

Open jloveric opened 1 month ago

jloveric commented 1 month ago

This could be entirely due to my setup and the mods I made to get it running (I'm also posting in case anyone else runs into it), but the initial losses are NaN because the tensors are empty. During training they have a size, but later on they are empty and so return NaN as the loss (so it's not due to underflow or overflow). Maybe you've seen this issue. I'm trying to get to the bottom of it.

actor_loss tensor(nan, device='cuda:0', grad_fn=<NegBackward0>)
critic_loss tensor(nan, device='cuda:0', grad_fn=<MaximumBackward0>)
entropy_loss tensor(nan, device='cuda:0', grad_fn=<MeanBackward0>)
loss tensor(nan, device='cuda:0', grad_fn=<SubBackward0>)

Here are a few of the tensors used inside the loss:

b_log_pis tensor([], device='cuda:0')
mb_advantage tensor([], device='cuda:0')
new_mb_log_pis tensor([], device='cuda:0', grad_fn=<SqueezeBackward1>)
new_mb_values tensor([], device='cuda:0', size=(0, 1), grad_fn=<AddmmBackward0>)
mb_qvalue tensor([], device='cuda:0')
mb_values tensor([], device='cuda:0')
new_mb_entropies tensor([], device='cuda:0', grad_fn=<NegBackward0>)

Initially, these actually have a size during every training step.

jloveric commented 1 month ago

OK, I figured it out. For some reason the starting index was going beyond the end of indices. Fixed in my fork here: https://github.com/RezaKakooee/space_layout_gym/compare/SpaceLayoutGym-v0...jloveric:space_layout_gym:SpaceLayoutGym-v0#diff-af26a04a04e22cd3dd1bcf7161a6c39ac8e5347972abc6d83124d48f3d0c7437R88 I added these lines:

if start > len(indices):
    break
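
For anyone hitting the same thing, here is a minimal sketch of why the slice comes back empty and how a guard like this avoids it. It assumes the training loop steps start through a nominal batch size and slices indices[start:end]. The function name minibatch_slices and its parameters are hypothetical and not taken from the repository. The sketch uses >= rather than >, since a start equal to len(indices) also gives an empty slice.

```python
def minibatch_slices(indices, batch_size, mini_batch_size):
    """Yield only non-empty minibatch index lists.

    Hypothetical helper: batch_size is the nominal number of samples the
    loop was written for; indices may be shorter than that.
    """
    for start in range(0, batch_size, mini_batch_size):
        # Slicing past the end of a list does not raise; it returns [].
        # An empty minibatch is what later turns into a NaN mean loss.
        if start >= len(indices):
            break
        end = start + mini_batch_size
        yield indices[start:end]


indices = list(range(5))  # only 5 valid samples, fewer than batch_size
batches = list(minibatch_slices(indices, batch_size=8, mini_batch_size=2))
print(batches)       # [[0, 1], [2, 3], [4]]
print(indices[6:8])  # [] (what the unguarded loop would have used)
```

The last minibatch can still be shorter than mini_batch_size, which is fine for a mean-based loss as long as it is not empty.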