MichiganCOG / ViP

Video Platform for Action Recognition and Object Detection in Pytorch
MIT License

Fails on zero grad #21

Open lemmersj opened 5 years ago

lemmersj commented 5 years ago

In instances where a neuron doesn't factor into the loss (e.g., a component of the loss is disabled for a specific experiment, leaving a neuron or set of neurons unused), autograd leaves .grad as None for the unused parameters. This causes a crash at the line:

param.grad *= 1./float(args['psuedo_batch_loop']*args['batch_size'])

With the error:

TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

This can be remedied by inserting an if param.grad is not None: guard before the line in question, but I'm unsure of any upstream consequences.
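For illustration, a minimal sketch of that workaround; the surrounding accumulation loop (and the model and args names) is an assumption for context, not code from the repo:

    # Scale accumulated gradients, skipping parameters that autograd
    # left with grad == None because they never contributed to the loss.
    scale = 1./float(args['psuedo_batch_loop']*args['batch_size'])
    for param in model.parameters():
        if param.grad is not None:
            param.grad *= scale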

natlouis commented 5 years ago

That should've been fixed in issue #7 by the following line: https://github.com/MichiganCOG/ViP/blob/dev/train.py#L182.

Do you have this version from dev pulled?
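For context, a sketch contrasting the two cases under discussion; the dev-branch guard is assumed to key on requires_grad (as the follow-up below suggests) and is not copied from the linked line:

    # Assumed shape of the guard (not copied from train.py#L182): it skips
    # frozen parameters, which is a different case from unused ones.
    scale = 1./float(args['psuedo_batch_loop']*args['batch_size'])
    for param in model.parameters():
        if not param.requires_grad:
            continue  # frozen weights: the case the dev fix covers
        if param.grad is None:
            continue  # unused weights: the case reported in this issue
        param.grad *= scale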

lemmersj commented 5 years ago

I'm using an older version (after the initial pull from master, I immediately made train.py unmergeable). My mistake for missing that issue.

lemmersj commented 5 years ago

I came back to this --- it appears the modification in the dev branch resolves a different problem. That is, the weights causing an issue for me are not frozen; they have no gradient because they do not contribute to the loss.

Consider three regression nodes --- yaw, pitch, and roll. I modify training to regress only yaw by performing backpropagation on that node directly. The weights leading into the pitch and roll nodes are left as None by autograd after loss.backward(), and thus fail at the cited line.
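A minimal standalone repro of that failure mode, using illustrative separate heads for yaw and pitch rather than the actual model:

    import torch
    import torch.nn as nn

    # Separate heads so each output has its own parameters.
    yaw_head = nn.Linear(8, 1)
    pitch_head = nn.Linear(8, 1)

    x = torch.randn(4, 8)
    loss = yaw_head(x).mean()  # pitch_head never enters the graph
    loss.backward()

    print(yaw_head.weight.grad)    # populated tensor
    print(pitch_head.weight.grad)  # None: autograd never reached it

    # The in-place scaling then fails for the unused head:
    # pitch_head.weight.grad *= 0.5
    # TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

Note that a single shared nn.Linear(8, 3) would not trigger this: its weight does enter the graph through the yaw output, so the unused rows get zero gradients rather than None.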

ehofesmann commented 5 years ago

Can you post your code? The training script and relevant loss and model files. A GitHub link would work.