david-wb / gaze-estimation

A deep learning based gaze estimation framework implemented with PyTorch

Why use torch.no_grad() instead of eval() mode? #2

Closed Zeleni9 closed 4 years ago

Zeleni9 commented 4 years ago

Hi David,

I had the same idea for an hourglass PyTorch model trained on the UnityEyes dataset, but you executed it really well. I just have a question: why are you not using .eval() mode at inference? When I tried eval() mode I got very different results, probably because of the BatchNorm running-statistics calculation.

david-wb commented 4 years ago

@Zeleni9 Great question.

Using no_grad() without eval mode keeps batch normalization in its training behavior at inference time: each batch is normalized with its own statistics rather than the running statistics accumulated during training. This makes sense to me because the inference-time data (webcam) is very different from the training data (UnityEyes). You might try experimenting with setting track_running_stats=False in the batch norm layers and then running in eval mode, but you'll need to retrain the model. Please feel free to submit a PR if you find that this works. If you find other relevant info that sheds more light on this issue, please comment here again; I'd appreciate it.
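Roughly, the difference looks like this (a minimal sketch; the model and input below are stand-ins, not this repo's actual EyeNet):

```python
import torch
import torch.nn as nn

# Stand-in for the trained model; any module containing BatchNorm shows the effect.
eyenet = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
frames = torch.randn(4, 1, 96, 160)  # dummy webcam batch

# no_grad() only disables gradient tracking. The module stays in train mode,
# so BatchNorm normalizes each batch with that batch's own statistics.
eyenet.train()
with torch.no_grad():
    out_batch_stats = eyenet(frames)

# eval() switches BatchNorm to the running statistics accumulated during
# training (on UnityEyes), which can match webcam input poorly.
eyenet.eval()
with torch.no_grad():
    out_running_stats = eyenet(frames)

# With track_running_stats=False, no running statistics are kept, so the layer
# always uses batch statistics; this change requires retraining the model.
bn_no_stats = nn.BatchNorm2d(64, track_running_stats=False)
```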

Thanks!

Zeleni9 commented 4 years ago

Hi David,

I have trained the model with nn.BatchNorm2d(features, track_running_stats=False) and some small changes to keep all operations in float throughout the net. After training it for 10 epochs and running the evaluation, I got a mean angular error of around 12-13 degrees.

I tested the model with both torch.no_grad() and eyenet.eval(), and they yield similar results.

Results of model - 44999:

```
error [17.87877716]  mean error 12.241872422782441  side right
gaze [[0.04301077 0.11114367]]  gaze pred [[0.05413113 0.42336184]]
```

Unfortunately, the trained model is not valid when exported to ONNX. I am not sure whether the problem is track_running_stats=False or something else, since I was able to export your pretrained model to ONNX format without issues.

Update: it is track_running_stats=False that changes the nodes, which makes the graph invalid during ONNX conversion (see the PyTorch forum discussion). With your code and only track_running_stats=False changed, I got results like this:

```
error [5.87883281]  mean error 11.070940301345876  Len of mean error list: 45000
side right  gaze [[0.04301077 0.11114367]]  gaze pred [[-0.03775143 0.17444813]]
```
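For reference, the export path I am describing looks roughly like this (a sketch; the input shape, tensor names, and stand-in model are assumptions, not the repo's exact code):

```python
import torch
import torch.nn as nn

# Stand-in model; substitute the real EyeNet loaded from checkpoint.pt.
eyenet = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
eyenet.eval()

# Dummy eye-patch input; the 96x160 shape is assumed for illustration.
dummy_input = torch.randn(1, 1, 96, 160)

torch.onnx.export(
    eyenet,
    dummy_input,
    "eyenet.onnx",
    input_names=["eye_patch"],
    output_names=["output"],
    opset_version=11,
)
# With track_running_stats=False, BatchNorm has no running_mean/running_var
# buffers, so the exporter emits different nodes, which is what made the
# converted graph invalid here.
```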

david-wb commented 4 years ago

Hi, thanks for the updates and clarification. I have observed the same effects on my end using track_running_stats=False.

I can't help you with the ONNX conversion, but I am watching your PyTorch discussion.

Zeleni9 commented 4 years ago

Hey David,

I have now trained the model with the nn.BatchNorm2d() layers removed, and training went smoothly. Evaluation on MPIIGaze was around:

Trained model without batch norm (EyeNet - 44999):

```
error [8.79008233]  mean error 10.929743261465049  Len of mean error list: 45000
side right  gaze [[0.04301077 0.11114367]]  gaze pred [[0.08114091 0.260042]]
```

But the visualization is totally off: gaze vectors jump around and the landmarks are not even close to the pupil. Did you do some post-processing that makes it work on the webcam, and if so, how can I turn it off? I am a bit confused that a model with a better evaluation error behaves much worse than the one with nn.BatchNorm2d(features, track_running_stats=False).

I see that it is trained on synthetic eyes and the webcam data is different, but how can the BatchNorm layers change the behavior that much?

Thanks.

david-wb commented 4 years ago

The Residual layers use batch norm and it can't be turned off with a flag. I'm guessing that is why you see that behavior.

When you retrain, make sure you start a new model file too.

Zeleni9 commented 4 years ago

No, I didn't use the flag; I deleted the BatchNorm code completely in the Residual layers.
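Concretely, the Residual block with BatchNorm deleted looks roughly like this (an illustrative sketch with an assumed channel layout following the usual hourglass bottleneck pattern, not the exact repo code):

```python
import torch
import torch.nn as nn

class ResidualNoBN(nn.Module):
    """Hourglass-style bottleneck Residual block with the BatchNorm layers removed."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch // 2, kernel_size=1)
        self.conv2 = nn.Conv2d(out_ch // 2, out_ch // 2, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(out_ch // 2, out_ch, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 skip projection when the channel counts differ
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.conv1(x))
        out = self.relu(self.conv2(out))
        out = self.conv3(out)
        return self.relu(out + self.skip(x))
```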

What do you mean by starting a new model file? I am removing checkpoint.pt before each training run.

david-wb commented 4 years ago

With BN completely removed, I'm surprised there would be any difference at eval time. Afraid I can't answer this one.

About starting a new model file: yes, we are on the same page.

Zeleni9 commented 4 years ago

With BN completely removed, torch.no_grad() and eval() give the same result, but visually on the webcam it looks very incorrect compared with, for example, BatchNorm with nn.BatchNorm2d(features, track_running_stats=False).

Update: the results were different in .eval() because I called .eval() on the model directly after loading from checkpoint.pt; I think it is possible the BatchNorm layers had essentially random running statistics. After I ran 5-10 inferences in .train() mode and then switched to .eval(), I got the same correct results in both modes.
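The warm-up trick looks roughly like this (the model and data below are stand-ins; substitute the real EyeNet and webcam frames):

```python
import torch
import torch.nn as nn

# Stand-in model and webcam batches for illustration.
eyenet = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU())
warmup_batches = [torch.randn(4, 1, 96, 160) for _ in range(10)]

# A few forward passes in train() mode update the BatchNorm running
# statistics toward the inference-time data distribution.
eyenet.train()
with torch.no_grad():
    for frames in warmup_batches:
        eyenet(frames)

# eval() then freezes those warmed-up statistics for deterministic inference.
eyenet.eval()
with torch.no_grad():
    output = eyenet(warmup_batches[0])
```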

I have also managed to convert the model to ONNX format, and it works. Interesting behaviour, to be honest.

Well, no problem, you helped a lot. Thank you for the insightful information. Best.