ProGamerGov / neural-style-pt

PyTorch implementation of neural style transfer algorithm
MIT License

Cannot use conv layers in vgg-19 #44

Closed qwerdbeta closed 4 years ago

qwerdbeta commented 4 years ago

Hello, I'm using something like:

python neural_style.py -style_image myStyle.png -content_image myImage.jpg -output_image profile.png -gpu 0 -backend cudnn -num_iterations 5000 -image_size 1000 -style_weight 600 -style_scale 1.2 -style_layers conv1_1

or any convX_Y layer for style or content and I get the following error:


Running optimization with L-BFGS
Traceback (most recent call last):
  File "neural_style.py", line 455, in <module>
    main()
  File "neural_style.py", line 257, in main
    optimizer.step(feval)
  File "C:\Users[username]\neural-style-pt-master\lib\site-packages\torch\optim\lbfgs.py", line 307, in step
    orig_loss = closure()
  File "neural_style.py", line 248, in feval
    loss.backward()
  File "C:\Users[username]\neural-style-pt-master\lib\site-packages\torch\tensor.py", line 150, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "C:\Users[username]\neural-style-pt-master\lib\site-packages\torch\autograd\__init__.py", line 99, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 791000]], which is output 0 of ViewBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
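If it helps, I can also turn on the anomaly detection that the hint at the end mentions; a minimal sketch of what I would add near the top of main() (my assumption about where it goes) would be:

import torch

# Assumption: enabling this before the optimization loop makes autograd report
# the exact in-place operation that broke the backward pass (iterations run slower).
torch.autograd.set_detect_anomaly(True)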


When I use the layers from your example code (-content_layers relu0,relu3,relu7,relu12 -style_layers relu0,relu3,relu7,relu12), I just get a black image as output, but at least no errors are thrown.

When I try the default which uses (-style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu5_1 -content_layers relu4_2) it works fine.

ProGamerGov commented 4 years ago

The layers relu0,relu3,relu7,relu12 are for Network in Network (NIN) models. Were you using the NIN model when you got a black output image? You need to specify that you are using the NIN model with: -model_file models/nin_imagenet.pth.
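For example, a full command with the NIN model would look something like this (a sketch based on your earlier command; the image paths are just your placeholders):

python neural_style.py -style_image myStyle.png -content_image myImage.jpg -output_image profile.png -gpu 0 -backend cudnn -model_file models/nin_imagenet.pth -style_layers relu0,relu3,relu7,relu12 -content_layers relu0,relu3,relu7,relu12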

As for using the conv layers, was that possible in the original neural-style? If it wasn't possible in the original neural-style, then it may be expected behavior unless modifications are made to neural-style-pt.

qwerdbeta commented 4 years ago

Thanks for getting back to me. I think I was not specifying the NIN model; that was the issue, so that part is resolved. But for the VGG-19 model:

I'm not sure if the original supported those layers, but it worked for someone who wrote the text at https://github.com/ProGamerGov/neural-style-pt#implementation-details:

"Implementation details

Images are initialized with white noise and optimized using L-BFGS.

We perform style reconstructions using the conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1 layers and content reconstructions using the conv4_2 layer. As in the paper, the five style reconstruction losses have equal weights."

So it seems like it worked for whoever wrote that.

I am just assuming that the more layers involved, the better the results, but is this even true? Could we use all 19 some day? I can't find good info on which layers provide what "knowledge".

qwerdbeta commented 4 years ago

PS: Are you aware of any research or code showing the same results across different model params and models? That is the next area I want to explore in detail.

ProGamerGov commented 4 years ago

https://github.com/jcjohnson/neural-style#implementation-details

Like in the original neural-style, loss modules are placed after the conv layer and its ReLU layer. Ex: conv1_1 <--> relu1_1 <--> loss_module. If you were to put the loss module before the conv layer, it would actually end up measuring the layer that comes before the conv layer you wanted to use. I may be able to figure out a better explanation if you need one.
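Roughly, the placement works like the following sketch (not the actual neural-style-pt code; the torchvision VGG-19, the DummyStyleLoss module, and the layer index are stand-ins for illustration):

import torch.nn as nn
import torchvision.models as models

class DummyStyleLoss(nn.Module):
    # Stand-in loss module: it just records the activations it sees and passes them through.
    def forward(self, x):
        self.target = x.detach()
        return x

features = models.vgg19(weights=None).features  # untrained VGG-19 feature stack (weights=None needs a recent torchvision)
chosen_relu_indices = {1}  # index of relu1_1 in torchvision's layer list (assumed)

net = nn.Sequential()
for i, layer in enumerate(features):
    net.add_module(str(i), layer)
    # The loss module goes right after the ReLU that follows the chosen conv layer.
    if isinstance(layer, nn.ReLU) and i in chosen_relu_indices:
        net.add_module("style_loss_" + str(i), DummyStyleLoss())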

I'm not entirely sure why the original neural-style allowed you to choose conv layers, but I believe it was done to make that area of the code easier to modify. My code just follows what the original did, as long as it is not detrimental.

qwerdbeta commented 4 years ago

so with vgg-19 is it possible to add any more layers than the defaults for better results?

ProGamerGov commented 4 years ago

Usable layers for the VGG-19 models are: relu1_1, relu1_2, relu2_1, relu2_2, relu3_1, relu3_2, relu3_3, relu3_4, relu4_1, relu4_2, relu4_3, relu4_4, relu5_1, relu5_2, relu5_3, relu5_4. You can use any of those layers for both the -style_layers and -content_layers parameters.
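For example, a run that tacks a couple of extra layers onto the defaults could look like this (purely illustrative, reusing the placeholder image names from your earlier command; not a recommended setting):

python neural_style.py -style_image myStyle.png -content_image myImage.jpg -style_layers relu1_1,relu2_1,relu3_1,relu4_1,relu4_3,relu5_1 -content_layers relu4_2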

You can see all of the possible layers for the VGG-16 and NIN models here; just make sure that the layers you pick have "relu" in the name.

Lower layers tend to focus more on things like texture and color, while higher layers focus on larger details like objects. You can find visualizations of each layer/neuron online, and you can also use pytorch-convis if you want to see how a particular neuron or layer views an image.

The neural-style issues page may have useful information about what people have found different layer combinations do:

ex: https://github.com/jcjohnson/neural-style/issues/250 https://github.com/jcjohnson/neural-style/issues/345

qwerdbeta commented 4 years ago

thanks!!!

so helpful!!!

qwerdbeta commented 4 years ago

OK, I understand now, except: what is the difference between, say, relu1_1 and relu1_2? Are they both more style based (not content) but trained on different image sets? If so, wouldn't using both always be more robust (being able to pick from a larger set of images), unless the item you are matching happens to be a specialty of just one or the other?

ProGamerGov commented 4 years ago

relu1_1 and relu1_2, for example, both have 64 neurons/filters/channels, while relu2_1 has 128. You can see more about the number of channels here. relu1_2 sits one convolution deeper, so its neurons will be excited by slightly more complex things than relu1_1's. The research paper on the Visual Geometry Group's VGG architecture may be able to provide you with a more in-depth explanation: https://arxiv.org/abs/1409.1556
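If you want to check the channel counts for yourself, something like this prints them straight from a torchvision VGG-19 (a small sketch; the torchvision model is only a stand-in for the converted .pth models used here):

import torch.nn as nn
import torchvision.models as models

# Print the output channel count of every conv layer in VGG-19 (conv1_1, conv1_2, conv2_1, ...).
features = models.vgg19(weights=None).features
for i, layer in enumerate(features):
    if isinstance(layer, nn.Conv2d):
        print(i, layer.out_channels)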

The default models had all of their layers trained on the same ImageNet dataset. Models that were created by fine-tuning an existing model (like VGG16_SOD_finetune) had their lower layers frozen for the most part, and only the uppermost layers were given new information. So in general, models have all of their layers trained on the same dataset, except for fine-tuned models, where only the uppermost layers were allowed to learn new data.

The chosen style layers control what the model sees in the style image(s), while the chosen content layers control what the model sees in the content image. In general, it seems like the chosen style layer(s) may be a lot more important than the chosen content layer(s). You can find other neural-style-pt compatible models here: https://github.com/ProGamerGov/neural-style-pt/wiki/Other-Models

JaledMC commented 4 years ago

@qwerdbeta, in addition to the tips given by @ProGamerGov, for your question about different combinations of models and params, there are some visual articles and examples:

But in my experience, these relations change between style images. You can use a loop to iterate over a few combinations, with a low -num_iterations, to check which matches work best, or use techniques like multires. The best tuning I have found is voltax3.

Hope it helps.

qwerdbeta commented 4 years ago

@JaledMC wow, nice model!! @ProGamerGov How do you make those multires images on Reddit? Those look great. It's the closest thing I have seen to transferring some of the 'content', not just the style, onto another content image.

ProGamerGov commented 4 years ago

@qwerdbeta To make those images, I use scripts like these: https://github.com/ProGamerGov/Multiscale-Resolution-Scripts, and I also use histogram matching with: https://github.com/ProGamerGov/Neural-Tools. I also use higher style weights, and sometimes run a multires script multiple times to "solidify" the style's content on the generated image.

ProGamerGov commented 4 years ago

I'm going to close this issue, as all the questions have been answered.

qwerdbeta commented 4 years ago

thanks, again, Gov.

qwerdbeta commented 4 years ago

The multires scripts are great. If I understand this correctly, none of these would allow a larger image to be output than what GPU memory would normally allow on a single pass/render, correct? However, they obviously allow content or style effects not possible at a single pass/scale.

The seg one uses masks and dithers/blends them into the whole. Does a masked region really only use memory for that region? What I mean is, if I divided the image into 4 quadrants and made a mask for each, could we render an image roughly 4 times larger by doing a pass (or passes) on each quadrant? Or is memory use still driven primarily by the total size of the original unmasked image?

Also, my GTX 1080 8 GB cannot handle the larger sizes in these multires scripts, so I adjusted them to never exceed about an 1100-pixel image size, which is about where I max out. I may try Google Colab next to get more.

ProGamerGov commented 4 years ago

@qwerdbeta The trick with multires is to maximize quality in terms of parameters until you run out of memory. Then you adjust your parameters (replace L-BFGS with Adam, switch models, etc...) so that you can keep pushing the image size higher. If you look closely, most multires scripts do this. Most of the major changes occur at the smaller image sizes, so it's not that much of a concern to switch out memory hungry parameters at higher image sizes. You aren't supposed to or even able to take VGG-19 and L-BFGS to the highest possible resolution.
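As a rough illustration (a hedged sketch: the file names are placeholders, and -optimizer, -init, and -init_image are used here as I understand the neural-style-pt options), a stripped-down two-pass multires run might look like:

python neural_style.py -content_image myImage.jpg -style_image myStyle.png -output_image pass1.png -image_size 512 -optimizer lbfgs
python neural_style.py -content_image myImage.jpg -style_image myStyle.png -output_image pass2.png -image_size 1024 -init image -init_image pass1.png -optimizer adam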

I've actually never tested that before, so I have no idea. The only issue I can foresee with that approach is that you would have to find a way to blend the stylized areas together seamlessly.

qwerdbeta commented 4 years ago

@ProGamerGov I have been doing many experiments in a row that are somewhat repetitive, and multires is another good example. It seems like for each run, it has to learn/create a new model from scratch. Is there a way to do multi-pass style transfer where the model is cumulative and uses the style and content associations from previous runs? Effectively adding to the VGG-19 model, for example?

ProGamerGov commented 4 years ago

@qwerdbeta The model used is completely frozen and doesn't actually learn anything when it's run by neural-style-pt. The only changes neural-style-pt makes are placing loss modules at the user-specified layers and removing the unused classification layers. The user-specified content/style layers contain a copy of the image (the target), and they calculate how different it is from the input image, so you aren't really going to gain anything by storing the targets/inputs. Continuing to use the loss calculations from a past run could result in getting stuck following a worse path (gradient descent), and things like image size vary between runs, among other things.
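To make that concrete, the setup is roughly this (a minimal sketch under my own simplifications, not the actual neural-style-pt code):

import torch
import torchvision.models as models

# The network's weights are frozen; nothing in the model is ever updated.
cnn = models.vgg19(weights=None).features.eval()
for p in cnn.parameters():
    p.requires_grad_(False)

# The only tensor the optimizer ever touches is the image itself.
img = torch.randn(1, 3, 256, 256, requires_grad=True)
optimizer = torch.optim.LBFGS([img])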

You could try using -normalize_weights if you want a similar effect to running a multires script multiple times.

qwerdbeta commented 4 years ago

@ProGamerGov Do I understand this correctly: if we use the Lua version of the scripts, we can use the .caffemodel and .prototxt files, but if we use the PyTorch (*.py) implementation, we have to use the .pth models? I don't think there are any VGG-16 SOD fine-tuned models converted to .pth out there, or the high-intensity-color ones, or the pruned ones, in this format?

ProGamerGov commented 4 years ago

@qwerdbeta Yeah, PyTorch only uses .pth files, while Torch7 uses .caffemodel and prototxt files.

I converted many of the original Lua/Torch7 models from Caffe to PyTorch a while back: https://github.com/ProGamerGov/neural-style-pt/wiki/Other-Models

You can download them directly from here: https://drive.google.com/open?id=1OGKfoIehp2MiJL2Iq_8VMTy76L6waGC8
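Once downloaded, you point neural-style-pt at them with -model_file, e.g. (the file name here is just a placeholder for whichever converted model you grab):

python neural_style.py -content_image myImage.jpg -style_image myStyle.png -model_file models/vgg16_sod_finetune.pth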

qwerdbeta commented 4 years ago

@ProGamerGov thank you!!!!!!!!!!!!!!!!!