ClementPinard / FlowNetTorch

Torch implementation of Fischer et al. FlowNet training code

About the use of a parallel criterion #10

ptriantd closed this issue 7 years ago

ptriantd commented 7 years ago

This question is not about the implementation itself, but about the concept behind it. Why is a parallel criterion needed? Isn't it enough to train the network using only the last loss?

ClementPinard commented 7 years ago

The question is not easy to answer. What is certain is that convergence is worse when using only the last loss. One interpretation (not a rigorous explanation) is that every flow-generating layer deals with a particular scale, and therefore a particular frequency range. The low-resolution scales are designed for large displacements, so the high-resolution layers only have to deal with high-frequency shapes and flow values. Having a flow map upscaled from a lower scale acts as a canvas to which the upper layer adds values for fine-tuning, instead of inferring the whole flow map at once.
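
To make the idea concrete, here is a minimal pure-Python sketch of a multi-scale loss. It is not the repository's Lua code. It only mirrors what a parallel criterion does in general, namely a weighted sum of one loss per output. The function names, the 2x2 average pooling used to shrink the ground truth, the choice of mean endpoint error, and the example weights are all assumptions made for illustration.

```python
import math


def avg_pool2(flow):
    """Halve a flow map (list of rows of (u, v) tuples) by averaging 2x2 blocks.

    Assumption: flow values are averaged but not rescaled for the new resolution.
    """
    pooled = []
    for i in range(0, len(flow) - 1, 2):
        row = []
        for j in range(0, len(flow[0]) - 1, 2):
            block = [flow[i][j], flow[i][j + 1], flow[i + 1][j], flow[i + 1][j + 1]]
            u = sum(p[0] for p in block) / 4.0
            v = sum(p[1] for p in block) / 4.0
            row.append((u, v))
        pooled.append(row)
    return pooled


def endpoint_error(pred, target):
    """Mean Euclidean distance between predicted and target flow vectors."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for (pu, pv), (tu, tv) in zip(pred_row, target_row):
            total += math.hypot(pu - tu, pv - tv)
            count += 1
    return total / count


def multiscale_loss(preds, target, weights):
    """Weighted sum of per-scale losses.

    preds is ordered from finest to coarsest, each map half the size of the
    previous one; the ground truth is downsampled to match each prediction.
    """
    loss = 0.0
    scaled_target = target
    for pred, weight in zip(preds, weights):
        loss += weight * endpoint_error(pred, scaled_target)
        scaled_target = avg_pool2(scaled_target)
    return loss


# Hypothetical example: 4x4 ground truth, predictions at full and half resolution.
target = [[(1.0, 2.0)] * 4 for _ in range(4)]
full_res_pred = [[(0.0, 0.0)] * 4 for _ in range(4)]   # wrong everywhere
half_res_pred = [[(1.0, 2.0)] * 2 for _ in range(2)]   # matches the pooled target
print(multiscale_loss([full_res_pred, half_res_pred], target, weights=[1.0, 0.5]))
```

Training with only the last loss corresponds to setting every weight except the first to zero; the coarser predictions then receive gradient only indirectly, through the finer layers that consume them.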

I would not be surprised if something simpler that does not require a multi-scale loss (say, GAN training that would teach the network to identify shapes in order to get a better flow map, as has been used here for semantic segmentation) could outperform this technique.