genekogan / neural_style

PyTorch implementation of neural style transfer algorithm
MIT License

Making continuous masking work right #1

Open genekogan opened 4 years ago

genekogan commented 4 years ago

The "continuous-masking" branch of this repository overhauls the way masks are specified. In the master branch, you input a single content_seg segmentation image which is color coded according to color_codes and matched to the colors associated with each style_seg image. In this branch, we instead get rid of the color codes, and input multiple grayscale content_seg images, one for each style image, where the brightness corresponds positively to how much of that style is let through onto the content_image. The style_seg parameter lets you also create style masks for each style image which let you extract style just from the bright region in the mask, but this is optional, defaulting to white (extract style from the entire style image). The purpose of this change is to allow for arbitrary style mixture, rather than just limited to discrete non-overlapping regions like in the master branch.

For example, the following command:

python neural_style.py -content_image examples/inputs/hoovertowernight.jpg \
    -style_image examples/inputs/starry_night.jpg,examples/inputs/hokusai.jpg \
    -content_seg examples/segments/hoovertowernight1a.png,examples/segments/hoovertowernight1b.png 

produces the following output:

[output image: out]

Notice that the content_seg images (hoovertowernight1a.png and hoovertowernight1b.png) are discrete, black on one side and white on the other side. This associates one half of the image fully with starry_night.jpg and one half with hokusai.jpg.

This works fine, but we'd like to be able to use continuous masks that blend/transition between the two style images. For example, a content_seg using hoovertowernight2a.png and hoovertowernight2b.png should interpolate between the two styles along the horizontal length of the output image, starting with hokusai on the left and ending at starry_night on the right. But if we try to run it:

python neural_style.py -content_image examples/inputs/hoovertowernight.jpg \
    -style_image examples/inputs/starry_night.jpg,examples/inputs/hokusai.jpg \
    -content_seg examples/segments/hoovertowernight2a.png,examples/segments/hoovertowernight2b.png 

We get the following result, where at both extremes the style is transferred well, but in the middle, where both style images contribute roughly equal influence, there is little stylization effect; instead we get mostly a reconstruction of the content image.

[output image: out5]

This effect is especially visible if we run the same command as above but set -content_weight 0 to do a pure texture synthesis with no content reconstruction. The middle region appears muddy, with a poor transition between the two styles.

[output image: out6]

One way to fix this problem is to use a covariance matrix instead of a Gram matrix for the style statistic. By adding the line x_flat = x_flat - x_flat.mean(1).unsqueeze(1) just before the return statement return torch.mm(x_flat, x_flat.t()) in GramMatrix, and then running the same command as above with -content_weight 0, we get the following result, where the styles appear to transition horizontally as expected; however, the quality of the style reconstruction appears somewhat worse.

[output image: out7c]
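For concreteness, a minimal sketch of what that modification might look like, assuming a GramMatrix module along the lines of the usual neural-style implementations (only the mean-subtraction line is the actual change described above):

    import torch
    import torch.nn as nn

    class GramMatrix(nn.Module):
        def forward(self, input):
            B, C, H, W = input.size()
            x_flat = input.view(C, H * W)
            # Subtracting the per-channel mean turns the raw second-moment
            # Gram matrix into a covariance matrix.
            x_flat = x_flat - x_flat.mean(1).unsqueeze(1)
            return torch.mm(x_flat, x_flat.t())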

It would be desirable to find a way to do good style transitions without compromising the quality of the style reconstruction, in the same way it's possible to transition between different class optimizations in deepdream. Possible strategies that might help include using a different style statistic, for example a style feature histogram loss instead of Gram or covariance matrices, or using the masks in some way other than masking the feature activation maps.
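As one simple stand-in for the histogram-loss idea (not code from this repository, just a sketch that matches per-channel feature histograms by comparing sorted activation values):

    import torch
    import torch.nn.functional as F

    def sorted_histogram_loss(x_feat, y_feat):
        # Match the sorted per-channel activation values of the generated
        # image (x_feat) against those of the style image (y_feat); sorting
        # makes this a simple 1-D histogram/quantile match per channel.
        C = x_feat.size(1)
        x = x_feat.view(C, -1).sort(dim=1).values
        y = y_feat.view(C, -1).sort(dim=1).values
        if y.size(1) != x.size(1):
            # Resample the style quantiles if the spatial sizes differ.
            y = F.interpolate(y[None], size=x.size(1), mode='linear',
                              align_corners=False)[0]
        return F.mse_loss(x, y)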

Any insights into how to possibly improve continuous style masking are greatly appreciated...

nikjetchev commented 4 years ago

I tried to have a look and identify the problem. However, I cannot run the experiment with the command line options you gave:

-- hokusai.jpg is not in the repository; also, the masks for the hoover tower need to be manually copied

-- -style_seg option also needs to be set?

-- -color_codes black needs to be set

I figured this out myself and used

python neural_style.py -content_image examples/inputs/hoovertowernight.jpg \
    -style_image examples/inputs/starry_night.jpg,examples/inputs/cubist.jpg \
    -style_seg examples/segments/starry_night.png,examples/segments/cubist.png \
    -content_seg examples/segments/hoovertowernight2a.png,examples/segments/hoovertowernight2b.png \
    -color_codes black -backend cudnn

but then I get another error: two file names are given, but preprocess takes a single one: content_seg_caffe = preprocess(params.content_seg, params.image_size, to_normalize=False).type(dtype)

File "neural_style.py", line 91, in main content_seg_caffe = preprocess(params.content_seg, params.image_size, to_normalize=False).type(dtype) File "neural_style.py", line 396, in preprocess image = Image.open(image_name).convert('RGB') File "/usr/local/anaconda3/lib/python3.7/site-packages/PIL/Image.py", line 2766, in open fp = builtins.open(filename, "rb") FileNotFoundError: [Errno 2] No such file or directory: 'examples/segments/hoovertowernight2a.png,examples/segments/hoovertowernight2b.png'

It would be great if you could fix these minor issues so one can focus on understanding the main blending-mask problem.

Quasimondo commented 4 years ago

@nikjetchev - looks like you checked out the master branch instead of the continuous-masking one - I made the same mistake.

genekogan commented 4 years ago

@nikjetchev @Quasimondo sorry for burying the note about using a different branch -- I should have been clearer about that! Yes, you should check out the "continuous-masking" branch. I am keeping master with the old features to stay in sync with the original upstream.

However, you're right that I forgot to upload the hokusai image. I just added it. Let me know if it works for you.

alexjc commented 4 years ago

Quick summary of random experiments I posted in the Twitter thread:

1) The desaturation is a common problem in neural-style, and IMHO it's unrelated. Unfortunately, the changes required to this specific codebase are a bit too big for me to hack it quickly.

2) Gram matrices don't allow you to "cross-fade" between styles; instead they seem to make a "collage" of the styles based on the weights, i.e. a 50% / 50% canvas allocation (see attached images).

[attached images: mixv1, mixv2]

In short, I don't think an approach based on Gram matrices will work easily. Fast Style methods have a linear latent space rather than a matrix, and those work well. There are a few other ideas in the Twitter discussion, but it's too early to say.
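To illustrate the "linear latent space" point with an example that is not from this repository: in AdaIN-style fast stylization a style reduces to per-channel mean/std vectors, which can be linearly interpolated to cross-fade between styles.

    import torch

    def adain_blend(content_feat, style_feats, weights, eps=1e-5):
        # content_feat: (1, C, H, W) content activations from an encoder
        # style_feats:  list of (1, C, H, W) style activations
        # weights:      per-style blend weights summing to 1
        c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
        c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
        # Linearly interpolate the style statistics (the "latent space"),
        # then re-normalize the content features onto the blended statistics.
        s_mean = sum(w * s.mean(dim=(2, 3), keepdim=True)
                     for w, s in zip(weights, style_feats))
        s_std = sum(w * s.std(dim=(2, 3), keepdim=True)
                    for w, s in zip(weights, style_feats))
        return s_std * (content_feat - c_mean) / c_std + s_mean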