Puzer / stylegan-encoder

StyleGAN Encoder - converts real images to latent space

Using the stylegan-encoder for the LSUN bedroom dataset #7

Open mfhbree opened 5 years ago

mfhbree commented 5 years ago

@Puzer you did some great work here! I'm trying to apply your encoder to the bedroom network; however, the generated images are not of the same quality as the results of the FFHQ network. Initially, I changed the SGD optimizer to the Adam optimizer, because the loss decreased faster and the generated images looked more realistic. I also tried to tweak the hyperparameters, but only the number of iterations has a major impact on the result. Do you have any ideas for improving the results?

Results with the Adam optimizer:

| Command | Original | Results |
| --- | --- | --- |
| `--iterations 1000 --batch_size 1 --lr 0.1` | content 1 | content 1-1000itr |
| `--iterations 2000 --batch_size 4 --lr 0.1` | content 1 | content 1-2000itr-4batch |
| `--iterations 10000 --batch_size 1 --lr 0.1` | content 1 | content 1-10000itr |

| Command | Original | Results |
| --- | --- | --- |
| `--iterations 1000 --batch_size 1 --lr 0.1` | style 2 | style 2-1000itr |
| `--iterations 2000 --batch_size 4 --lr 0.1` | style 2 | style 2-2000itr-4batch |
| `--iterations 10000 --batch_size 1 --lr 0.1` | style 2 | style 2-10000itr |

The following results were generated with the SGD optimizer:

| Command | Original | Results |
| --- | --- | --- |
| `--iterations 2000 --batch_size 2 --lr 1` | bedroom2-0 | bedroom2-2000itr-2batch |
| `--iterations 3000 --batch_size 3 --lr 1 --randomize_noise` | bedroom2-0 | bedroom2-3000itr-3batch-rn |
| `--iterations 4000 --batch_size 4 --lr 1` | bedroom2-0 | bedroom2-4000itr-4batch |

Results of style transfer:

Fine style mixing:

[image]

Coarse style mixing:

[image]

Both results differ significantly from the original StyleGAN style-transfer results for bedrooms. Given the lower quality of the reconstructions, somewhat less stunning images were expected, but here the original images are no longer even recognizable in the results.

jcpeterson commented 5 years ago

@mfhbree can you share how you altered the optimizer? I get:

FailedPreconditionError (see above for traceback): Attempting to use uninitialized value beta1_power

mfhbree commented 5 years ago

> @mfhbree can you share how you altered the optimizer? I get:
>
> FailedPreconditionError (see above for traceback): Attempting to use uninitialized value beta1_power

You need to initialize the variables that the Adam optimizer adds. If you add the following line right after the optimizer is created, it will work:

```python
self.sess.run(tf.variables_initializer(optimizer.variables()))
```

This is my optimize function:

```python
def optimize(self, vars_to_optimize, iterations=500, learning_rate=.1):
    # Accept a single variable or a list of variables.
    vars_to_optimize = vars_to_optimize if isinstance(vars_to_optimize, list) else [vars_to_optimize]
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    min_op = optimizer.minimize(self.loss, var_list=vars_to_optimize)
    # Adam adds slot variables (m, v, beta1_power, beta2_power) that must
    # be initialized before the first session run.
    self.sess.run(tf.variables_initializer(optimizer.variables()))
    for _ in range(iterations):
        _, loss = self.sess.run([min_op, self.loss])
        yield loss
```
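For reference, this generator function is consumed the same way encode_images.py already drives the SGD version; a sketch with illustrative values:

```python
# Sketch of the optimization loop, mirroring encode_images.py:
op = perceptual_model.optimize(generator.dlatent_variable,
                               iterations=1000, learning_rate=0.1)
for loss in op:
    print('loss:', loss)  # or wrap `op` in tqdm for a progress bar
```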
mfhbree commented 5 years ago

For everyone interested in transferring the style of bedrooms: I fixed my issue by changing the VGG16 model, which is used by the perceptual model, to the ResNet-50 model. However, the quality of the reconstructed images is still quite low, so it is only possible to interpolate the coarse styles of a StyleGAN-generated image and a reconstructed image.

An example: transfer1
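Roughly, the change in encoder/perceptual_model.py looks like this (a sketch rather than the exact diff; which ResNet-50 layer to tap is a tunable choice):

```python
# Swap the VGG16 feature extractor for ResNet-50 (sketch):
from keras.applications.resnet50 import ResNet50, preprocess_input
from keras.models import Model

resnet = ResNet50(include_top=False,
                  input_shape=(self.img_size, self.img_size, 3))
# Tap an intermediate activation as the perceptual feature space,
# analogous to the VGG16 layer tap in the original code; layer 38 is
# just an illustrative pick.
self.perceptual_model = Model(resnet.input, resnet.layers[38].output)
# Note: the vgg16 preprocess_input used elsewhere in the repo must be
# switched to the resnet50 version as well.
```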

jcpeterson commented 5 years ago

@mfhbree thanks, that worked.

Regarding LSUN, I got similar results with faces when I didn't align them and the aspect ratio of the face was squeezed a bit. Is your preprocessing identical to the preprocessing used for training?

pender commented 5 years ago

@mfhbree Have you tried style transfer on the dlatents that you obtain by running a random qlatent vector through the mapping net, instead of a dlatent that you get from running the encode_images script? Even though both are 18x512 arrays, the mapping net produces dlatents in which all 18 layers are identical, whereas the encode_images script produces dlatents in which all 18 layers are meaningfully different (and the differences are important for recreating the image). Since style transfer consists of substituting certain dlatent layers from one image into those of another, I wouldn't expect optimized dlatents to behave the same way as dlatents generated by the mapping network.
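To make the distinction concrete, a rough sketch using the StyleGAN API (`Gs` is the loaded generator; the .npy path stands for whatever encode_images.py saved for you):

```python
import numpy as np

# dlatent from the mapping network: one 512-d vector broadcast to all 18 rows
qlatent = np.random.randn(1, 512)
dlatent_mapped = Gs.components.mapping.run(qlatent, None)  # (1, 18, 512), rows identical

# dlatent from encode_images.py: every row is meaningfully different
dlatent_encoded = np.load('latent_representations/example_01.npy')  # (18, 512)

# Coarse style mixing = substituting the low-resolution rows (here the first 4).
mixed = dlatent_encoded.copy()
mixed[:4] = dlatent_mapped[0, :4]
images = Gs.components.synthesis.run(mixed[np.newaxis], randomize_noise=False)
```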

nile649 commented 5 years ago

how did you find the direction of the change?

ppries commented 4 years ago

@mfhbree how did you get this to run exactly? I get ValueError: Dimension 1 in both shapes must be equal, but are 18 and 14. Shapes are [1,18,512] and [?,14,512] when trying to run with the StyleGAN model pretrained on LSUN Bedrooms (https://drive.google.com/uc?id=1MOSKeGF0FJcivpBI7s63V9YHloUTORiF). Do you have an idea as to what I'm doing wrong?

eps696 commented 4 years ago

@ppries The latent dimensions in progressive networks depend on the network resolution: the number of dlatent layers is 2*log2(resolution) - 2, so 18 for 1024x1024 models (such as FFHQ) and 14 for 256x256 models (such as bedrooms). You need to change that hardcoded value 18 to 14 in generator_model.py (at least) to make it fit.
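A quick way to compute the value for any model (this matches how the StyleGAN synthesis network counts its layers, two per resolution doubling starting from 4x4):

```python
import numpy as np

def num_dlatent_layers(resolution):
    # two synthesis layers per resolution doubling, from 4x4 up to `resolution`
    return int(2 * np.log2(resolution) - 2)

print(num_dlatent_layers(1024))  # 18 (e.g. FFHQ)
print(num_dlatent_layers(256))   # 14 (e.g. LSUN Bedrooms)
print(num_dlatent_layers(128))   # 12
```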

ppries commented 4 years ago

Thank you, @eps696. That works.

tg-bomze commented 4 years ago

I had the following error:

ValueError: Dimension 1 in both shapes must be equal, but are 18 and 12. Shapes are [1,18,512] and [?,12,128]

My pre-trained model's dimensions differ from the hardcoded parameters: the model for generating abstractions (https://drive.google.com/file/d/1RQ2SKnJYaCzJgkCJVsUrtkwFUpyU6_yh/view?usp=sharing) was trained on 128x128 images. I went into the generator_model.py file and changed every 18 to 12 and every 512 to 128. The previous error disappeared, but the following one appeared:

```
Traceback (most recent call last):
  File "encode_images.py", line 244, in <module>
    main()
  File "encode_images.py", line 130, in main
    perceptual_model.build_perceptual_model(generator, discriminator_network)
  File "./stylegan-encoder/encoder/perceptual_model.py", line 188, in build_perceptual_model
    self.loss += self.discriminator_loss * tf.math.reduce_mean(self.discriminator.get_output_for(tflib.convert_images_from_uint8(generated_image_tensor, nhwc_to_nchw=True), self.stub))
  File "./stylegan-encoder/dnnlib/tflib/network.py", line 222, in get_output_for
    out_expr = self._build_func(*final_inputs, **build_kwargs)
  File "<string>", line 592, in D_basic
  File "/tensorflow-1.15.2/python3.6/tensorflow_core/python/framework/ops.py", line 645, in set_shape
    raise ValueError(str(e))
ValueError: Dimension 1 in both shapes must be equal, but are 0 and 10. Shapes are [1,0] and [?,10].
```

The image I uploaded for encoding has a resolution of 128x128, so I don't understand what the problem could be.