SeitaroShinagawa / chainer-partial_convolution_image_inpainting

Reproduction of Nvidia image inpainting paper "Image Inpainting for Irregular Holes Using Partial Convolutions"
MIT License

I'm sorry to bother you again, but I have some questions about fine-tuning #28

Closed erhuodaosi closed 5 years ago

erhuodaosi commented 5 years ago

Thanks a lot for your excellent work. I have another question about this code: how can we fine-tune the network in Chainer? The original paper seems to use a different learning rate (0.00005) and to freeze the batch normalization in the encoder part of the network; this reduces the color differences (i.e., the L1 loss outside the hole). I would appreciate it if you could help me. Thank you!

SeitaroShinagawa commented 5 years ago

Hi,

To start fine-tuning with a different learning rate, you can run:

python train.py -g 0 --learning_rate 0.00005 --load_model <your pre-trained model path>

I think we need to modify common/net.py to freeze the batch normalization of the encoder.
You can use chainer.using_config. For example:

# This is just an example
import chainer
import chainer.functions as F
import chainer.links as L

class Model(chainer.Chain):
    def __init__(self):
        super(Model, self).__init__()
        with self.init_scope():
            # toy encoder/decoder layers for illustration
            self.enc = L.Linear(4, 3)
            self.enc_bn = L.BatchNormalization(3)
            self.dec = L.Linear(3, 4)
            self.dec_bn = L.BatchNormalization(4)

    def __call__(self, x, enc_freeze=False, finetune=False):
        h = self.enc(x)
        if enc_freeze:
            # test mode: BN uses its stored statistics, not the mini-batch's
            with chainer.using_config('train', False):
                h = self.enc_bn(h)
        else:
            # training mode: finetune=True accumulates precise statistics
            h = self.enc_bn(h, finetune=finetune)
        h = F.relu(h)
        h = self.dec(h)
        y = self.dec_bn(h)
        return y

If enc_freeze=True, all batch normalizations of the encoder run in test mode (using their stored statistics instead of the mini-batch statistics).

Additionally, I recommend trying finetune=True in the first epoch (e.g. see the example).
Batch normalization in Chainer uses the mini-batch mean and variance during training.
To accumulate precise statistics, pass finetune=True in training mode when you call batch normalization.
After that epoch, set enc_freeze=True. That makes the encoder run with the precise statistics.
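As a rough sketch of this schedule using the toy Model above (the data and epoch counts are made-up placeholders):

    import numpy as np

    model = Model()
    data = np.random.rand(10, 8, 4).astype("f")  # 10 toy mini-batches of size 8

    for epoch in range(3):
        for x in data:
            if epoch == 0:
                # first epoch: accumulate precise BN statistics in the encoder
                y = model(x, enc_freeze=False, finetune=True)
            else:
                # later epochs: encoder BN runs in test mode with those statistics
                y = model(x, enc_freeze=True)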

Best,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Thank you very much for your constant help!

First of all, regarding python train.py -g 0 --learning_rate 0.00005 --load_model: could I resume training on my existing model this way? That is to say, 500,000 iterations for training the model without fine-tuning, and then another 500,000 iterations resuming training from model500000.npz?

Furthermore, why do we resize the picture (256x256) to (280x336) and then take a random 256x256 crop from the 280x336 picture? Could we lose some features by cropping? As shown in the original paper, we should do some augmentation; however, the provided masks and the training dataset are big enough to train the model, so what is the point of augmentations such as flip, resize, or crop? Also, during dataset preprocessing, how can we load the test dataset in order, rather than getting a random picture from the dataset, in Chainer? And in net.py, what is the meaning of add_noise(h, test, sigma=0.2)? Forgive me if these are silly questions.

Last but not least, I'm not good at programming, but I found your advice very useful, so I will give it a try.

Thank you very much!

SeitaroShinagawa commented 5 years ago

To the first question: yes, load your existing trained model (i.e. model500000.npz) and resume training with the enc_freeze=True modification in common/net.py.

To the second question: sorry for confusing you. Random cropping is just for augmentation, and it might not be essential, as you said. There is no strong reason. One weak reason is that flipping or cropping is a popular technique that usually makes results better and rarely makes them worse. Moreover, my code is based on chainer-cyclegan, as I wrote in the acknowledgement, and I had no strong reason to remove these augmentation steps when I implemented this code.

For the same reason, add_noise is also an option derived from chainer-cyclegan. You can ignore it in this task. It could be effective if we applied this PartialConv network to another task, e.g. a GAN for image generation; its motivation is just extensibility.
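(For reference, a rough sketch of what such a helper usually looks like in chainer-cyclegan-style code; this is an assumed implementation, not a quote from net.py:)

    import chainer

    def add_noise(h, test, sigma=0.2):
        # Gaussian noise on activations during training, identity at test time
        if test:
            return h
        xp = chainer.backend.get_array_module(h)  # numpy or cupy, matching h
        return h + sigma * xp.random.randn(*h.shape).astype("f")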

To load test data from the test dataset in order, rewrite datasets/place2.py L.94:

    def get_example(self, i):
        np.random.seed(None)
        #idA = self.trainAkey[np.random.randint(0,len(self.trainAkey))] # comment out
        idA = self.trainAkey[i] # add new

        #idM = self.maskkey[np.random.randint(0,len(self.maskkey))]   # comment out
        idM = self.maskkey[i] # add new
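(One caveat with this sequential version: if the number of mask files differs from the number of test images, indexing both lists with the same i can go out of range; idM = self.maskkey[i % len(self.maskkey)] would be a safe variant.)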

However, this is my fault; I forgot to correct this part. Thanks for your report! (I have already fixed it.)

erhuodaosi commented 5 years ago

@SeitaroShinagawa I'm more thankful for your help than I can express.

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

I'm sorry to bother you again, but I have some questions about I_out; I'm very confused. I think I_out should be I_out*(1-mask) + I_gt*mask rather than just model(x, m); I wonder if I am wrong. There are some differences between the known area of I_out and I_gt. On the one hand, we do not use finetune; on the other hand, I think maybe I_out should be I_out*(1-mask) + I_gt*mask. But when I modified my code, I got a worse result.

original code:

    img1 = x.get()
    img2 = batch_postprocess_images(img1, 1, 1)
    Image.fromarray(img2).save(args.eval_folder + "/generated_3_Igt.jpg")

    img3 = M.data.get()
    img4 = batch_postprocess_images(img3, 1, 1)
    Image.fromarray(img4).save(args.eval_folder + "/generated_0_mask.jpg")

    img = I_comp.data.get()
    img = batch_postprocess_images(img, 1, 1)
    Image.fromarray(img).save(args.eval_folder + "/generated_2_Icomp.jpg")

    img5 = I_out.data.get()
    img5 = batch_postprocess_images(img5, 1, 1)
    Image.fromarray(img5).save(args.eval_folder + "/generated_1_Iout.jpg")

the modified code:

    img1 = x.get()
    img2 = batch_postprocess_images(img1, 1, 1)
    Image.fromarray(img2).save(args.eval_folder + "/generated_3_Igt.jpg")

    img3 = M.data.get()
    img4 = batch_postprocess_images(img3, 1, 1)
    Image.fromarray(img4).save(args.eval_folder + "/generated_0_mask.jpg")

    img = I_comp.data.get()
    img = batch_postprocess_images(img, 1, 1)
    Image.fromarray(img).save(args.eval_folder + "/generated_2_Icomp.jpg")

    img5 = I_out.data.get()

[two screenshots of the generated results]

The model was pretrained without modifying evaluation.py. I would appreciate it if you could help me, thank you!

Best,

SeitaroShinagawa commented 5 years ago

Let me correct a misunderstanding.
Actually, I_out is the raw output of the completion network: I_out = model(x, mask) (in training, x is I_gt).
I_comp represents I_out*(1-mask) + I_gt*mask, as you mentioned (see here):

        I_out = self.model(I_gt,M)  
        I_comp = F.where(M_b,I_gt,I_out) # I_comp = M_b * I_gt + (1-M_b) * I_out

erhuodaosi commented 5 years ago

@SeitaroShinagawa Wow! Thank you for your timely response! Please forgive my mistake! I have changed the code as follows:

    I_comp = F.where(M_b, I_gt, Variable(xp.ones((batchsize, 3, image_size, image_size)).astype("f")))

I had misunderstood the meaning of I_comp from the beginning and taken it for granted all along. It was my fault! Thank you very much!

SeitaroShinagawa commented 5 years ago

Yes, the final output is I_comp. Sorry for confusing you.
I_comp, I_out, and I_gt are taken from the original paper; you can check them in section 3.3.

Best,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Thank you for pointing me to section 3.3 of the original paper. Thanks for your constant help!

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

I'm sorry to bother you again, but I have some questions about train_iter and val_iter in train.py. In chainer-cyclegan there is: [screenshot of the chainer-cyclegan iterator code]. In your code there is: [screenshot of this repository's iterator code]. Should val_iter take the same form as in chainer-cyclegan?

In net.py, 256x256 eventually becomes 2x2. Why can't we add one more layer, as in the original paper (although the original paper uses 512x512 and 8 layers), to let it become 1x1?

I would appreciate if you could help me,thank you!

Best,

SeitaroShinagawa commented 5 years ago

Should val_iter take the same form as in chainer-cyclegan?

No. chainer-cyclegan makes its test_iter from the training dataset, but my val_iter is made from the validation split of Places2.
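For instance, a minimal sketch of the two iterators (the datasets here are placeholder lists, not the real loaders):

    from chainer import iterators

    train_data = list(range(100))  # stands in for the Places2 training set
    val_data = list(range(20))     # stands in for the Places2 validation set

    train_iter = iterators.SerialIterator(train_data, batch_size=4)
    val_iter = iterators.SerialIterator(val_data, batch_size=4,
                                        repeat=False, shuffle=False)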

Why can't we add one more layer as in the original paper?

First, higher resolution takes much more time and requires a large amount of GPU memory. Second, Places2 provides the dataset at 256x256 resolution.

Therefore, it is the easiest setting to try as a first step.
Of course, you can try a higher resolution by modifying net.py. Unfortunately, I don't have enough time to try that for the next few months, so I would appreciate it if you could share your results!
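As quick arithmetic on the layer count (a toy check, not code from net.py): each stride-2 encoder layer halves the spatial size, so 256x256 reaches 2x2 after 7 layers and would need an 8th layer to reach 1x1.

    size = 256
    for layer in range(1, 9):
        size //= 2
        print(layer, size)  # 1: 128, 2: 64, ..., 7: 2, 8: 1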

Thanks,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Thank you for your timely reply! As for the second question, I meant: could we add one more layer (for 8 layers in total) to inpaint 256x256 images, as in the original paper? Why do we use one layer fewer than the original paper (is it because its 8 layers are for 512x512 images)? I am willing to try the 512x512 resolution dataset in my leisure time and will show the results here.

During training, I ran into the following error; how can I solve it? [two screenshots of the error message, ending with "No space left on device"] I would appreciate it if you could help me, thank you!

Thanks,

SeitaroShinagawa commented 5 years ago

Oh, this seems to be a hardware/OS error rather than a bug in the code.
You should check your free disk space and free inodes with df -h and df -i (for details, try searching for the error message, "No space left on device").

Best,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Thank you for your timely reply! Thanks a lot! I will try your suggestion, or train on a server instead.

Thanks,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Thank you! I have solved this problem by cleaning up the recycle bin; the error was probably caused by the hundreds of generated .npz model files. The server went wrong a few days ago, and now it's back to normal. I am planning to retrain, and to run the code with the fine-tuned network as in the original paper.

Thank you sincerely for your constant help! Please forgive me for asking so many basic questions. I have not found a good way to improve the network so far, and I am a bit embarrassed to ask, but I would like to get some inspiration from you.

I would very much appreciate any ideas you have on how to improve the network. To be honest, whether you have ideas or not, I am very grateful; you have helped me a lot after all.

Thanks,

SeitaroShinagawa commented 5 years ago

Congrats! I'm pleased that you are interested in my code and putting so much effort into trying it, even though my code has a lot of confusing parts.

To improve the network, I think one of the most promising ways is hyper-parameter tuning, for example with hyperopt or optuna.
Second, you can combine this network with Generative Adversarial Networks. It may improve the results, but it will also make training more difficult.
Third, you can replace Partial Convolution with Gated Convolution, which is an extension of Partial Convolution:

Free-Form Image Inpainting with Gated Convolution
Jiahui Yu, Zhe Lin, Jimei Yang, Xiaohui Shen, Xin Lu, Thomas S. Huang
https://arxiv.org/abs/1806.03589

When I tried Gated Convolution privately, it outperformed Partial Convolution in all of my experiments.
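For reference, here is a minimal sketch of a gated convolution layer in Chainer (my own toy illustration of the idea in Yu et al.; the layer names and the ELU activation are assumptions, not code from any repository):

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class GatedConv(chainer.Chain):
        def __init__(self, in_ch, out_ch, ksize=3, stride=1, pad=1):
            super(GatedConv, self).__init__()
            with self.init_scope():
                self.feature = L.Convolution2D(in_ch, out_ch, ksize, stride, pad)
                self.gate = L.Convolution2D(in_ch, out_ch, ksize, stride, pad)

        def __call__(self, x):
            # the gate branch learns a soft, per-pixel mask in place of
            # Partial Convolution's rule-based mask update
            return F.elu(self.feature(x)) * F.sigmoid(self.gate(x))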

I hope it would help you.
Thanks,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Thank you for your timely response! The advice you put forward for improving the network is wonderful; I think it's worth trying.

Thank you very much for your constant help, really! Best wishes to you!

Thanks,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Long time no see! I'm sorry to bother you again, but I have some questions about how to get edge sketch files from the Places2 dataset. I wonder how to use Canny or HED to generate such edge files. Does it take a long time to generate them?

I would appreciate if you could help me,thank you!

Thanks,

SeitaroShinagawa commented 5 years ago

Hi, @erhuodaosi

I have never tried Canny or HED, so I'm not sure how long it takes to process the whole Places2 dataset.
However, both Canny and HED are promising methods for edge detection.

These resources may help you: Canny edge detection

HED

I think the Python implementation of Canny or the PyTorch implementation of HED should be easy to run.
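For example, a minimal Canny sketch with OpenCV (the thresholds and file names are placeholders):

    import cv2

    img = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)  # low/high hysteresis thresholds
    cv2.imwrite("image_edges.png", edges)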

I hope your project goes well.
Best,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

Thank you for your timely reply!Thanks for your advice!

Have you tested gated conv with sketch input? It seems to use the incomplete picture, the mask, and a sketch as input. Due to the lack of those edge flists, I replaced the edge input with a zero matrix. I tested gated conv without edge input, and it indeed outperformed Partial Convolution. I also tried the HED Caffe implementation, but something went wrong during training.

I would try to get the sketch in your advice,thank you very much!

Thanks,

erhuodaosi commented 5 years ago

@SeitaroShinagawa Hi,

I'm sorry to bother you again, but I have some questions about deepfillv2. Here is the code for generating a mask using free_form_mask. But how can I save batch_mask? I want to build a mask dataset generated by the free-form mask function and feed the masks as input.

    import random
    import cv2
    import numpy as np

    def free_form_mask(xp, batchsize, size=(256, 256), maxVertex=20, minLength=50,
                       maxLength=200, minBrushWidth=10, maxBrushWidth=40, maxAngle=20):
        """Generate free-form mask tensor"""
        imageHeight, imageWidth = size
        mask = np.zeros((imageHeight, imageWidth), dtype="float32")
        numVertex = int(random.uniform(2, maxVertex))
        startX = int(random.uniform(0, imageWidth - 1))
        startY = int(random.uniform(0, imageHeight - 1))
        for i in range(numVertex):
            angle = random.uniform(-maxAngle, maxAngle)
            if i % 2 == 0:
                angle = 180 - angle
            length = random.uniform(minLength, maxLength)
            brushWidth = int(random.uniform(minBrushWidth, maxBrushWidth))
            endX = np.clip(startX + int(length * np.sin(np.deg2rad(angle))), 0, imageWidth)
            endY = np.clip(startY + int(length * np.cos(np.deg2rad(angle))), 0, imageHeight)
            cv2.line(mask, (startX, startY), (endX, endY), 255, brushWidth)
            startX = endX
            startY = endY
        mask = mask.reshape(1, 1, imageHeight, imageWidth)
        mask = np.tile(mask, (batchsize, 1, 1, 1))  # same masks for all images
        return xp.array(mask.reshape(batchsize, 1, imageHeight, imageWidth), dtype="float32") / 255.
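What I am thinking of, as a sketch (using free_form_mask above; the batch size and file names are placeholders I made up):

    # generate one batch of masks and dump each one as an 8-bit PNG
    masks = free_form_mask(np, batchsize=4)  # shape (4, 1, 256, 256), values in [0, 1]
    for i, m in enumerate(masks):
        cv2.imwrite("mask_%05d.png" % i, (m[0] * 255).astype("uint8"))

Would this be a reasonable way to build a mask dataset?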

I would very much appreciate it if you could help me! Thank you very much!

Thanks,