For both texture net and perceptual losses: as it stands, we feed the style image through VGG and compute its gram matrices at every training step, effectively doubling the forward-propagation work.
Since the style image never changes during training, we can instead run it through VGG once ahead of time, cache the resulting gramians, and reuse them when computing the loss.
(However, if we later flip things around and train content networks, we'd want to go back to computing the style-image gramians dynamically, since we'd be feeding in multiple styles.)
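Roughly what the cached version could look like, as a minimal PyTorch-flavored sketch. The names here (`vgg_features`, `style_image`, `style_layers`) are placeholders for illustration, not this repo's actual API:

```python
import torch

def gram_matrix(feats):
    # feats: (batch, channels, height, width) activation tensor
    b, c, h, w = feats.size()
    f = feats.view(b, c, h * w)
    # Batched gramian, normalized by the number of elements per channel map.
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

# One-time pass: run the style image through VGG and cache its gramians.
# `vgg_features` is assumed to return a list of activations at the chosen style layers.
with torch.no_grad():
    style_activations = vgg_features(style_image)
    cached_style_grams = [gram_matrix(a).detach() for a in style_activations]

# Each training step then only forwards the generated image and compares
# its gramians against the cache:
#   gen_acts = vgg_features(generated_image)
#   style_loss = sum(
#       torch.nn.functional.mse_loss(gram_matrix(a), g)
#       for a, g in zip(gen_acts, cached_style_grams)
#   )
```

The `torch.no_grad()` / `detach()` pair is just to make sure no graph is kept around for the cached targets; the cache only needs to be rebuilt if the style image changes.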