affinelayer / pix2pix-tensorflow

Tensorflow port of Image-to-Image Translation with Conditional Adversarial Nets https://phillipi.github.io/pix2pix/
MIT License

How to change the size of output image? #56

Open knaffe opened 7 years ago

knaffe commented 7 years ago

The program outputs images at the default size of 256x256, but I want the output images to be 70x70, or really any size I choose. I reviewed the original paper and realized the model has a different layer structure depending on the output size. Should I change the structure of the model (the G and D layers)? Thank you so much.

julien2512 commented 7 years ago

Hi @knaffe !

I know of 3 parameters for that:

  1. Size;
  2. Aspect ratio (height/width);
  3. Number of layers.

You can play with these 3 before changing the model.

Regards, Julien

knaffe commented 7 years ago

@julien2512 Thanks for your answer! In this program I found the variable "CROP_SIZE", which is set to 256. I tried changing its value, but it raises an error, so I don't think I can change the output size with that variable alone. I also checked the aspect ratio; I think that variable is not for the output size, since it controls height/width rather than the size itself. Sadly, I don't know how to adapt the number of layers. Could you help me with more details? Thank you sooo much!!

karolmajek commented 7 years ago

I am working on a similar problem: I am trying to increase the output size to 1024.

I just added a few lines here: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py#L339 and here: https://github.com/affinelayer/pix2pix-tensorflow/blob/master/pix2pix.py#L357

I am now training (82M params after increasing the number of layers). If I succeed I will post a link here to a fork with 1024 size, and maybe instructions on how to change the size, but this is not easy.

@knaffe You can comment out two lines in the encoder and decoder and get 64x64. Try setting CROP_SIZE = 64 and check if it works. You can't get 70x70 this way, so use 64 or 128 and resize to 70x70.

layer_specs = [
    a.ngf * 2, # encoder_2: [batch, 128, 128, ngf] => [batch, 64, 64, ngf * 2]
    a.ngf * 4, # encoder_3: [batch, 64, 64, ngf * 2] => [batch, 32, 32, ngf * 4]
    a.ngf * 8, # encoder_4: [batch, 32, 32, ngf * 4] => [batch, 16, 16, ngf * 8]
    a.ngf * 8, # encoder_5: [batch, 16, 16, ngf * 8] => [batch, 8, 8, ngf * 8]
    a.ngf * 8, # encoder_6: [batch, 8, 8, ngf * 8] => [batch, 4, 4, ngf * 8]
    # a.ngf * 8, # encoder_7: [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
    # a.ngf * 8, # encoder_8: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]
]

layer_specs = [
    # (a.ngf * 8, 0.5),   # decoder_8: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
    # (a.ngf * 8, 0.5),   # decoder_7: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
    (a.ngf * 8, 0.5),   # decoder_6: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
    (a.ngf * 8, 0.0),   # decoder_5: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
    (a.ngf * 4, 0.0),   # decoder_4: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
    (a.ngf * 2, 0.0),   # decoder_3: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
    (a.ngf, 0.0),       # decoder_2: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]
]
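
To see why 64 works but 70 does not, here is a quick sketch (not from the repo): each encoder layer halves the width and height down to the 1x1 bottleneck, and each decoder layer doubles it back up, so CROP_SIZE must be a power of 2 and it determines how many encoder/decoder entries are needed.

    import math

    # each encoder layer halves width/height, so a crop size of 2**k needs
    # k encoder layers (encoder_1 plus k-1 layer_specs entries) to reach the
    # 1x1 bottleneck, mirrored on the decoder side
    def layers_needed(crop_size):
        assert crop_size > 1 and crop_size & (crop_size - 1) == 0, "crop size must be a power of 2"
        return int(math.log2(crop_size))

    print(layers_needed(64))    # 6  -> comment out two encoder and two decoder entries
    print(layers_needed(256))   # 8  -> the default architecture
    print(layers_needed(1024))  # 10 -> add two extra entries in each list
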
knaffe commented 7 years ago

@karolmajek You helped me A LOT! Following your suggestion, it works. Thank you again sincerely!! I have another question. I trained my model and got some output images while training, and I like those. However, when I use my model on the validation images, the outputs are worse than the ones from training. I don't know how to adjust the parameters to get validation outputs that look like the training outputs. Specifically, the discriminator loss increases while the generator loss drops. Is my epoch count or batch size unsuitable? I also don't know how to set L1_weight and gan_weight, or how ngf and ndf affect my results. Thank you for your kindness and your great knowledge!!

groot-1313 commented 7 years ago

@karolmajek, were you successful in increasing the size to 1024? If yes, can you please mention the changes you made?

karolmajek commented 7 years ago

@groot-1313 Yes! I used it for Face2face: https://github.com/karolmajek/face2face-demo Pix2Pix 1024 here: https://github.com/karolmajek/pix2pix-tensorflow

groot-1313 commented 7 years ago

Thanks! I'll take a look!

groot-1313 commented 7 years ago

To make it 512, the only lines to be commented out/changed are: https://github.com/karolmajek/pix2pix-tensorflow/blob/master/pix2pix.py#L349 https://github.com/karolmajek/pix2pix-tensorflow/blob/master/pix2pix.py#L349 Right?

karolmajek commented 7 years ago

Here is what I changed: https://github.com/karolmajek/pix2pix-tensorflow/commit/75b8b75e2435646d712d2cc13b100f4da258a223?diff=split I added 2 extra layers in two places; you need to add only one in both of them. Each layer doubles the width and height.
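
For a 1024 model that means two extra entries at the deep end of the encoder layer_specs and two matching entries at the top of the decoder layer_specs, roughly like this (a sketch extrapolated from the 256 block above; the linked commit may use different channel multipliers):

    # two extra encoder entries (sketch)
    a.ngf * 8, # encoder_9:  [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
    a.ngf * 8, # encoder_10: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]

    # two extra decoder entries (sketch)
    (a.ngf * 8, 0.5), # decoder_10: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
    (a.ngf * 8, 0.5), # decoder_9:  [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]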

kickbox commented 7 years ago

@karolmajek Your port is just wonderful. I was wondering if you could share some predicted images in 1024px, and share details on the training time.

groot-1313 commented 7 years ago

I have done this:

layer_specs = [
    a.ngf * 2, # encoder_2: [batch, 256, 256, ngf] => [batch, 128, 128, ngf * 2]
    a.ngf * 4, # encoder_3: [batch, 128, 128, ngf] => [batch, 64, 64, ngf * 2]
    a.ngf * 8, # encoder_4: [batch, 64, 64, ngf * 2] => [batch, 32, 32, ngf * 4]
    a.ngf * 8, # encoder_5: [batch, 32, 32, ngf * 4] => [batch, 16, 16, ngf * 8]
    a.ngf * 8, # encoder_6: [batch, 16, 16, ngf * 8] => [batch, 8, 8, ngf * 8]
    a.ngf * 8, # encoder_7: [batch, 8, 8, ngf * 8] => [batch, 4, 4, ngf * 8]
    a.ngf * 8, # encoder_8: [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
    a.ngf * 16, # encoder_9: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]
]

layer_specs = [
    (a.ngf * 16, 0.5),  # decoder_9: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
    (a.ngf * 8, 0.5),   # decoder_8: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
    (a.ngf * 8, 0.5),   # decoder_7: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
    (a.ngf * 8, 0.5),   # decoder_6: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
    (a.ngf * 8, 0.0),   # decoder_5: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
    (a.ngf * 4, 0.0),   # decoder_4: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
    (a.ngf * 2, 0.0),   # decoder_3: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]
    (a.ngf, 0.0),       # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]
]

Also, I have another question. Any idea regarding the patch size used for the discriminator? I am assuming it is 16x16 from the output I got on my dataset.

karolmajek commented 7 years ago

The last one should have 256, 256 instead of 512, 512 (here: `(a.ngf, 0.0), # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]`). Remember to change CROP_SIZE.
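
Put concretely, a sketch of those two edits for a 512 model (exact line placement differs between forks):

    CROP_SIZE = 512  # was 256; must match the new input/output resolution

    # last entry of the decoder layer_specs in a 512 model:
    (a.ngf, 0.0),    # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 256, 256, ngf * 2]
    # the final decoder layer outside layer_specs then handles the last upsampling step (256 => 512)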

@kickbox I trained it some time ago; it took approx. 3 days on a 980M for these results: https://medium.com/@karol_majek/high-resolution-face2face-with-pix2pix-1024x1024-37b90c1ca7e8

groot-1313 commented 7 years ago

@karolmajek, thanks a bunch! I tried it on a smaller dataset and it worked brilliantly. The only problem is that the images are a little blurry. I think the reason is that the patch size evaluated by the discriminator is 16x16. Any idea where I could change that? The paper mentions a patch size of 70x70 would be a better choice.

EDIT: Went through the code, it seems like it takes the entire image as a single patch -> No patch based discriminator. But I still obtain high frequency noise, like a tiling effect in my output. Any input regarding the reason?
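
For reference, the effective patch size corresponds to the receptive field of each discriminator output unit. A quick sketch (not from the repo) that computes it for a stack of 4x4 convolutions, assuming the strides of the paper's 70x70 PatchGAN; adding another stride-2 layer grows the patch:

    # receptive field of one output unit for a stack of convolutions
    def receptive_field(kernels=(4, 4, 4, 4, 4), strides=(2, 2, 2, 1, 1)):
        rf, jump = 1, 1
        for k, s in zip(kernels, strides):
            rf += (k - 1) * jump
            jump *= s
        return rf

    print(receptive_field())                              # 70  -> the 70x70 patches from the paper
    print(receptive_field((4,) * 6, (2, 2, 2, 2, 1, 1)))  # 142 -> with one extra stride-2 layer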

karolmajek commented 7 years ago

Look at `a.ngf * X`; increasing X helped me. Look at my code, you will need to change it in two places.

groot-1313 commented 7 years ago

Going with this. Will test and see.

layer_specs = [
    a.ngf * 2, # encoder_2: [batch, 256, 256, ngf] => [batch, 128, 128, ngf * 2]
    a.ngf * 4, # encoder_3: [batch, 128, 128, ngf] => [batch, 64, 64, ngf * 2]
    a.ngf * 8, # encoder_4: [batch, 64, 64, ngf * 2] => [batch, 32, 32, ngf * 4]
    a.ngf * 8, # encoder_5: [batch, 32, 32, ngf * 4] => [batch, 16, 16, ngf * 8]
    a.ngf * 16, # encoder_6: [batch, 16, 16, ngf * 8] => [batch, 8, 8, ngf * 8]
    a.ngf * 16, # encoder_7: [batch, 8, 8, ngf * 8] => [batch, 4, 4, ngf * 8]
    a.ngf * 32, # encoder_8: [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
    a.ngf * 32, # encoder_9: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]
]

layer_specs = [
    (a.ngf * 32, 0.5),  # decoder_9: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
    (a.ngf * 16, 0.5),  # decoder_8: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
    (a.ngf * 16, 0.5),  # decoder_7: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
    (a.ngf * 8, 0.5),   # decoder_6: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
    (a.ngf * 8, 0.0),   # decoder_5: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
    (a.ngf * 4, 0.0),   # decoder_4: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
    (a.ngf * 2, 0.0),   # decoder_3: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]
    (a.ngf, 0.0),       # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]
]
ghost commented 6 years ago

Hi @groot-1313. Like you, I would like to use pix2pix with 512x512 images. Can you please recap all the code modifications you made? Thank you very much.

alelordelo commented 5 years ago

@groot-1313 Yes! I used it for Face2face: https://github.com/karolmajek/face2face-demo Pix2Pix 1024 here: https://github.com/karolmajek/pix2pix-tensorflow

Hi @karolmajek, is it possible to use 32-bit images on your HD fork?

karolmajek commented 5 years ago

original author uses TF to list images: https://github.com/karolmajek/pix2pix-tensorflow/blob/master/pix2pix.py#L262

What do you mean by 32bit? Input is jpg, 3 channels, do you need RGBA?

alelordelo commented 5 years ago

original author uses TF to list images: https://github.com/karolmajek/pix2pix-tensorflow/blob/master/pix2pix.py#L262

What do you mean by 32bit? Input is jpg, 3 channels, do you need RGBA?

Thanks for helping @karolmajek

I need to input and output 32-bit float data as EXR or TIFF files, because the kind of output I am looking for is a displacement map (to be used in 3D graphics materials), which only works with a 32-bit dynamic range.
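
For anyone attempting this, a rough pre/post-processing sketch (assuming the tifffile package is available; the image loading in pix2pix.py itself would still need to be swapped out, since tf.image.decode_jpeg/decode_png cannot read 32-bit floats):

    import numpy as np
    import tifffile  # assumption: pip install tifffile

    def load_float_tiff(path, vmin=0.0, vmax=1.0):
        # read a 32-bit float displacement map and map it to the [-1, 1]
        # range the generator works in
        img = tifffile.imread(path).astype(np.float32)
        return (img - vmin) / (vmax - vmin) * 2.0 - 1.0

    def save_float_tiff(path, img, vmin=0.0, vmax=1.0):
        # map the generator output back to the original value range
        tifffile.imwrite(path, ((img + 1.0) / 2.0 * (vmax - vmin) + vmin).astype(np.float32))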

giltee commented 5 years ago

Guys, this is really cool stuff. I am interested to know whether anyone has achieved a setup that doesn't change the height/width ratio at all. For example, if I send in images that are 1024x720, can I get images back with the same dimensions?

karolmajek commented 5 years ago

@giltee If your input is 1024x720 you can scale it to 1024x1024 and then scale the output back down to 1024x720. The resolution is a power of 2 because every layer scales by 2 up/down. Scaling should work in your case.
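
A minimal sketch of that pre/post scaling, using Pillow as an example (any image library would do):

    from PIL import Image

    def to_square(path, size=1024):
        # stretch the 1024x720 input to the square power-of-2 size the network expects
        return Image.open(path).resize((size, size), Image.BICUBIC)

    def from_square(img, width=1024, height=720):
        # stretch the network output back to the original aspect ratio
        return img.resize((width, height), Image.BICUBIC)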

kex243 commented 5 years ago

Hi, thanks a lot karolmajek! It is a wonderful thing to play with! I'm not really a programmer, but I managed to run the code. Now I'm trying to work with pairs of original picture and depth map, and I have a question about layers and weights. If this code is responsible for the output:

    layer_specs = [
        a.ngf * 2, # encoder_2: [batch, 256, 256, ngf] => [batch, 128, 128, ngf * 2]
        a.ngf * 4, # encoder_3: [batch, 128, 128, ngf] => [batch, 64, 64, ngf * 2]
        a.ngf * 8, # encoder_4: [batch, 64, 64, ngf * 2] => [batch, 32, 32, ngf * 4]
        a.ngf * 8, # encoder_5: [batch, 32, 32, ngf * 4] => [batch, 16, 16, ngf * 8]
        a.ngf * 8, # encoder_6: [batch, 16, 16, ngf * 8] => [batch, 8, 8, ngf * 8]
        a.ngf * 8, # encoder_7: [batch, 8, 8, ngf * 8] => [batch, 4, 4, ngf * 8]
        a.ngf * 8, # encoder_8: [batch, 4, 4, ngf * 8] => [batch, 2, 2, ngf * 8]
        a.ngf * 16, # encoder_9: [batch, 2, 2, ngf * 8] => [batch, 1, 1, ngf * 8]
    ]

    layer_specs = [
        (a.ngf * 16, 0.5),  # decoder_9: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_8: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_7: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_6: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
        (a.ngf * 8, 0.0),   # decoder_5: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
        (a.ngf * 4, 0.0),   # decoder_4: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
        (a.ngf * 2, 0.0),   # decoder_3: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]
        (a.ngf, 0.0),       # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]
    ]

can I modify it somehow to use it for more complicated tasks? For example, for calculating depth maps for many types of objects instead of one or two, should I enlarge the number of layers like this:

    layer_specs = [
        (a.ngf * 16, 0.5),  # decoder_9: [batch, 1, 1, ngf * 8] => [batch, 2, 2, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_8: [batch, 2, 2, ngf * 8 * 2] => [batch, 4, 4, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_7: [batch, 4, 4, ngf * 8 * 2] => [batch, 8, 8, ngf * 8 * 2]
        (a.ngf * 8, 0.5),   # decoder_6: [batch, 8, 8, ngf * 8 * 2] => [batch, 16, 16, ngf * 8 * 2]
        (a.ngf * 8, 0.0),   # decoder_5: [batch, 16, 16, ngf * 8 * 2] => [batch, 32, 32, ngf * 4 * 2]
        (a.ngf * 4, 0.0),   # decoder_4: [batch, 32, 32, ngf * 4 * 2] => [batch, 64, 64, ngf * 2 * 2]
        (a.ngf * 2, 0.0),   # decoder_3: [batch, 64, 64, ngf * 2 * 2] => [batch, 128, 128, ngf * 2]

        # here it is:
        (a.ngf, 0.0),       # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]
        (a.ngf, 0.0),       # decoder_2: [batch, 128, 128, ngf * 2 * 2] => [batch, 512, 512, ngf * 2]
    ]

or in some other way? Also, I'm currently running the code with changed weight parameters (as I understood them), e.g. (a.ngf * 16, 0.2), (a.ngf * 8, 0.3), with and without changing them in both columns, and trying to see the difference in the output. Are these "weight" coefficients? Will changing them have an effect, or is it nonsense? To sum up: is the code flexible enough to handle hard tasks, not just facades, and how should I change it to increase or decrease the output precision while keeping 1024x1024 resolution? Assume evaluation time is not a problem.

ZhuoerLyu commented 5 years ago

Hi @karolmajek, thanks for your suggestions on the size problem. I followed your advice and made it work. But now I've encountered another problem, about training cost.

My dataset has 2700 1024x1024 images, and I found it takes forever to finish training (121 days!). Since your images were also 1024x1024, I was confused about how you solved the training cost issue.

By the way, I followed you on Medium and the project of face2face is really interesting!

skabbit commented 4 years ago

@karolmajek why don't you change the discriminator layers? I suppose this reduces the resulting quality of the generated images.