fzliu / style-transfer

An implementation of "A Neural Algorithm of Artistic Style" by L. Gatys, A. Ecker, and M. Bethge. http://arxiv.org/abs/1508.06576.

Basis for googlenet weights #38

Open OverQuantum opened 8 years ago

OverQuantum commented 8 years ago

Currently, for the GoogLeNet model, the "content" representation goes mainly to inception_3a/output (with a small 2e-4 weight on conv2/3x3), and the "style" representation goes to 5 layers, from conv1/7x7_s2 to inception_5a/output. Is there some basis for this choice?
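For reference, here is a rough sketch of what a layer-weight configuration like that might look like as a Python dict. The dict name, the intermediate style layers, and the style-weight values are my assumptions for illustration; only the content weights and the endpoint style layers come from the description above:

```python
# Hypothetical GoogLeNet layer-weight configuration mirroring the split
# described above. Only conv2/3x3 (2e-4), inception_3a/output, and the
# first/last style layers are from the issue text; the rest is assumed.
GOOGLENET_WEIGHTS = {
    "content": {
        "conv2/3x3": 2e-4,            # small contribution, as noted above
        "inception_3a/output": 1.0,   # main content layer
    },
    "style": {                        # 5 layers; uniform weights assumed here
        "conv1/7x7_s2": 0.2,
        "conv2/3x3": 0.2,
        "inception_3a/output": 0.2,
        "inception_4a/output": 0.2,
        "inception_5a/output": 0.2,
    },
}
```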

inception_3a/output is close to the input of the network: activating this layer via the "deepdream" method produces only spiral patterns and edge strokes, not even "eyes". In VGG-19, the "content" representation goes into conv4_2, which is the 10th of 16 convolution layers and sits after the 3rd max-pooling layer. In GoogLeNet the 3rd max pool is pool3, so the layers analogous to conv4_2 should be inception_4b, 4c, or 4d. In the commit history I see only one test, with inception_3b + inception_4a (50% each); it looks like no higher layers were tested. Has anyone searched for where the "content" representation should go in GoogLeNet?
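One way to run such a search would be a simple sweep over the candidate content layers. A minimal sketch follows; note that `style_transfer()` is a hypothetical placeholder for whatever entry point actually performs the optimization, not a real function in this repo:

```python
# Hypothetical sweep over candidate GoogLeNet content layers.
# style_transfer() and its signature are placeholders, not the repo's API.
CANDIDATES = ["inception_4b/output", "inception_4c/output", "inception_4d/output"]

for layer in CANDIDATES:
    weights = {
        "content": {layer: 1.0},
        "style": {  # keep the style layers fixed while varying content
            "conv1/7x7_s2": 0.2,
            "conv2/3x3": 0.2,
            "inception_3a/output": 0.2,
            "inception_4a/output": 0.2,
            "inception_5a/output": 0.2,
        },
    }
    # Hypothetical call: run the transfer and save one output per layer
    # so the results can be compared side by side.
    result = style_transfer("content.jpg", "style.jpg", weights=weights)
    result.save("result_{}.jpg".format(layer.replace("/", "_")))
```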

fzliu commented 8 years ago

These were more or less determined empirically by trying different layers and seeing which ones seemed to work the best (emphasis on the "seemed"). I think there's still plenty of room to vary the settings here.

Regardless, try this .caffemodel if you're using the Inception model: https://www.dropbox.com/s/tdaowz2au059iqi/googlenet_style.caffemodel?dl=0
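Loading it with pycaffe is the standard call; the deploy prototxt path below is an assumption, so point it at wherever your GoogLeNet deploy file actually lives:

```python
import caffe

# Paths are assumptions -- adjust to your local model directory.
net = caffe.Net("models/googlenet/deploy.prototxt",  # network definition
                "googlenet_style.caffemodel",        # downloaded weights
                caffe.TEST)

# Sanity check: list the blobs to confirm the inception layers are present.
for name, blob in net.blobs.items():
    print(name, blob.data.shape)
```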

OverQuantum commented 8 years ago

OK, thank you for the clarification and for the model. I will run experiments with GoogLeNet and the layers.