Closed: justmaulik closed this issue 3 years ago
I ran the source image you linked above through the toonified model we uploaded and I got a similar result to what you got using your trained model. Therefore, I don't think you're doing anything wrong.
I will say that our toonified model is not trained on pairs of (real, toon) and therefore we run for a small number of steps. If you have paired data, you may find that running for more iterations will improve the results.
Regarding making the eyes bigger, this all depends on the data you use for training. If your target toons have larger eyes, you are more likely to generate toons with larger eyes. Our toonify model tends not to produce smaller eyes since we did not use paired data.
I hope this helps.
I tried running it for longer, but the result did not improve after 27k steps (the best checkpoint); you can check the training run here. Also, since you mentioned that your toon model is not trained on pairs, how does the training work? Can I train the model by just providing target images? My path config looks like this:
'ffhq_train': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Train/Train_IN',
'ffhq_target': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Train/Target_IN',
'toon_train': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Test/Test_IN',
'toon_target': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Test/Target_IN',
In the data config file we provided details like this:
'ffhq_encode': {
'transforms': transforms_config.EncodeTransforms,
'train_source_root': dataset_paths['ffhq_train'],
'train_target_root': dataset_paths['ffhq_target'],
'test_source_root': dataset_paths['toon_train'],
'test_target_root': dataset_paths['toon_target'],
},
Ah. I think I see the problem now. 1) Based on the logs you linked, it appears that you are trying to encode toon images. That is, you're training using pairs of (toon, toon). However, if I understand you correctly, you want to train using pairs of (real, toon). That is, the training source data should be real face images while the training target data should be the corresponding toon images. 2) The data config you specified above seems a bit strange if I understood it correctly. I believe what you want is something like:
'ffhq_encode': {
'transforms': transforms_config.EncodeTransforms,
'train_source_root': dataset_paths['ffhq_train'],
'train_target_root': dataset_paths['ffhq_toons_train'],
'test_source_root': dataset_paths['real_test'],
'test_target_root': dataset_paths['toons_test'],
},
where ffhq_toons_train is the path to the toon images corresponding to the FFHQ data, real_test is the path to the test set containing real face images, and toons_test is the path to the test set containing the corresponding toon images.
As for how we trained our toons model with no paired data: we train it exactly like the ffhq_encode task, but replace the FFHQ StyleGAN generator with the toon StyleGAN generator. That is, we train using only real face images.
I think the structure you described in point 2 is the same as what we used for the training above. What I specified in the data config file is the structure I posted earlier, which I'm pretty sure should be correct, or am I doing something wrong?
Ok.
First, notice that you are using transforms_config.FFHQEncodeTransforms, which defines transform_source as None. Now, refer to ImagesDataset:
https://github.com/eladrich/pixel2style2pixel/blob/89935c49f9187e632148dc62a902ab3dfbe4205d/datasets/images_dataset.py#L28-L31
You will notice that if self.source_transform is None, we set from_im = to_im in line 31. And since transform_source is defined as None, we fall exactly into that case.
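The effect of that branch can be shown with a small stand-alone sketch. This is simplified from the logic in ImagesDataset.__getitem__ linked above; the helper name get_pair is mine, not the repo's:

```python
# Simplified sketch of the branch in ImagesDataset.__getitem__ that
# silently replaces the source image with the target image whenever
# source_transform is None. (get_pair is a hypothetical helper,
# not code from the repo.)
def get_pair(from_im, to_im, source_transform=None, target_transform=None):
    if target_transform is not None:
        to_im = target_transform(to_im)
    if source_transform is not None:
        from_im = source_transform(from_im)
    else:
        # This is the line that discards your paired source data.
        from_im = to_im
    return from_im, to_im

# With no source transform, the "source" you get back is the target:
print(get_pair('real.png', 'toon.png'))                                # ('toon.png', 'toon.png')
# With any non-None source transform, the pair is preserved:
print(get_pair('real.png', 'toon.png', source_transform=lambda x: x))  # ('real.png', 'toon.png')
```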
Therefore, you're actually not using the paired data you defined! That is, you're overwriting the source data with the target data, which is why your logs show only the toon images (since those are your target images).
A couple of questions you may have:
1) Why set from_im = to_im at all? In ffhq_encode, the task at hand is to "reconstruct" the images, so there it makes sense to set from_im = to_im.
2) Why did we use ffhq_encode for the toonify task? Because, as I mentioned above, we don't use paired toon data. We simply use the real FFHQ data, so it was fine to set the source images equal to the target images.
However, this is not what you want to do. What you need to do is define your own transform class where your transforms_dict is something like:
transforms_dict = {
'transform_gt_train': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_source': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_test': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_inference': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
}
This way, transform_source is not None and you do not set from_im = to_im in ImagesDataset.
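To plug a dict like this into the repo's config machinery, you would wrap it in a transform class inside transforms_config.py. A structural sketch follows; build_transform is a placeholder I introduced to stand in for the transforms.Compose([...]) pipelines shown above, and in the real file you should subclass the repo's transforms base class like the existing classes do:

```python
# Structural sketch of a ToonifyTransforms class for transforms_config.py.
# build_transform is a placeholder for the torchvision
# transforms.Compose([Resize, ToTensor, Normalize]) pipelines above;
# in the actual repo file, follow the pattern of the existing
# transform classes (including their base class).
def build_transform():
    return lambda im: im  # placeholder; use transforms.Compose(...) in practice

class ToonifyTransforms:
    def __init__(self, opts):
        self.opts = opts

    def get_transforms(self):
        # Crucially, 'transform_source' is NOT None, so ImagesDataset
        # keeps the paired source images instead of overwriting them.
        return {
            'transform_gt_train': build_transform(),
            'transform_source': build_transform(),
            'transform_test': build_transform(),
            'transform_inference': build_transform(),
        }
```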
Then, you can make a new dataset type in data_configs.py
. Something like:
'toonify': {
'transforms': transforms_config.ToonifyTransforms,
'train_source_root': dataset_paths['real_train'],
'train_target_root': dataset_paths['toons_train'],
'test_source_root': dataset_paths['real_test'],
'test_target_root': dataset_paths['toons_test'],
},
And call your training script using --dataset_type=toonify.
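Putting it together, a full training invocation might look like the following. The flag names are the repo's standard training options; the experiment path and lambda values here are illustrative, so check them against your local setup:

```shell
python scripts/train.py \
  --dataset_type=toonify \
  --exp_dir=experiments/toonify_paired \
  --workers=8 \
  --batch_size=8 \
  --test_batch_size=8 \
  --test_workers=8 \
  --val_interval=2500 \
  --save_interval=5000 \
  --encoder_type=GradualStyleEncoder \
  --start_from_latent_avg \
  --lpips_lambda=0.8 \
  --l2_lambda=1 \
  --id_lambda=0.1
```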
To make sure everything works as expected, you should see in the logs the real source image next to the toons target image, followed by the output toons image.
Very long answer, but I hope it helps clear up any confusion. Let me know if you have any further questions. 😄
Thanks @yuval-alaluf. After changing the transform class and the toonify dict as you described, the logs now look as expected. The other thing I was wondering: how can I train with just a StyleGAN weights file (.pt) and without paired images, as you did for your toon model?
Would be very interested to see how the results of your training run turn out!
@justmaulik ,
Not sure what you mean by "just stylegan weights file".
But if you want to reproduce the toons model we trained with no paired data, you need to specify only a few things:
- set --dataset_type to ffhq_encode
- set --stylegan_weights to the file containing the toonified StyleGAN generator that we linked in the repo (here is the link to download it)
Basically, the main difference is that instead of using the FFHQ StyleGAN generator, you will use the Toons StyleGAN generator.
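In other words, a run reproducing the unpaired setup might look like the sketch below. The paths are illustrative placeholders; the key points are the ffhq_encode dataset type and pointing --stylegan_weights at the toonified generator:

```shell
python scripts/train.py \
  --dataset_type=ffhq_encode \
  --exp_dir=experiments/toonify_unpaired \
  --stylegan_weights=/path/to/toonify_stylegan.pt \
  --start_from_latent_avg \
  --lpips_lambda=0.8 \
  --l2_lambda=1 \
  --id_lambda=0.1
```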
Thanks @yuval-alaluf. I actually have a toon StyleGAN model that was created by following @justinpinkney's blog post; it turned out really good but not perfect (about 10% bad images). I was curious to see how the results change with pixel2style2pixel.
I did the training and the logs look better than in the previous runs, but it's not learning anymore: the best loss was achieved at the start of training and then it just stayed flat. The current dataset count is ~920 pairs in train and 100 pairs in test.
Waiting for the follow-up post on @justinpinkney's blog :) Or, if possible, could you explain a bit about how you managed to achieve your results, so I can give that method a try? 😊
The train results look much better! I think the problem you're seeing on the test set is that the real image data is not aligned. Other than that, everything looks good. I believe that if you align the data you should see much better results. Let me know if this solves the problem; I'm interested in seeing your results!
All the images used were tagged as aligned in Nvidia's repo. I think it's just missing the blur effect. Still, I will start a new run after aligning all the images and will share the results. The output in the train folder is perfect, but in the test folder it's still a bit far from perfect. If I add a greater variety of data pairs (5k-10k pairs), will it improve significantly, or are 900 pairs more than enough for this type of training?
I'm pretty sure the test data is not aligned. There are a lot of images that are very rotated with respect to the corresponding toon image.
I would run realignment again on the test to make sure. The train data seems fine.
Notice that you don't need to retrain the model. Simply take the trained model and run inference on the test data after realignment.
Regarding adding more data, I wouldn't run to add more data before you know that alignment isn't the problem.
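For reference, running inference with the already-trained checkpoint on the realigned test set could look something like the following. The flags are the repo's standard inference options; the paths here are illustrative placeholders:

```shell
python scripts/inference.py \
  --exp_dir=experiments/toonify_inference \
  --checkpoint_path=experiments/toonify_paired/checkpoints/best_model.pt \
  --data_path=/path/to/realigned_test \
  --test_batch_size=4 \
  --test_workers=4
```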
I tried aligned images, but it did not improve the results as expected. It's more random: some images give good results, but most are just like this. The output is good but far from the input. I will try with a larger dataset and share the results.
Sounds good. Since you also have paired data, I invite you to play around with the loss lambdas. You may be able to get better results with a different combination. Specifically, I would recommend starting by decreasing w_norm_lambda a bit (maybe to 0.01?).
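For example, a re-run with a smaller w-norm weight could be as simple as overriding that one flag, keeping the rest of the flags from your original run. This assumes the custom toonify dataset type discussed earlier in the thread, and the value is just a starting point:

```shell
python scripts/train.py \
  --dataset_type=toonify \
  --exp_dir=experiments/toonify_wnorm_0.01 \
  --w_norm_lambda=0.01
```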
In any case, I feel like we can close this issue as it seems like we've solved the issue you had. Feel free to reopen the issue if needed and I am looking forward to seeing your results!
@justmaulik Were you able to solve this? I am facing the same issue: I trained with 3k pairs, but the output still does not exactly match the input.
No, I tried a few things but did not achieve the results I was aiming for. At some point I trained with ~20k pairs (40k images), but the results did not improve much.
@yuval-alaluf Can we reopen this issue? I am facing the same problem. The results look good on train logs, but with test, it seems to have no connection with the input.
Hi @Nerdyvedi ,
I'll be happy to help, but since your question is not specifically about the ffhq_encode task, I think the best thing to do is to open a new issue where you can provide some more details about what you're trying to do, the training setup, and the results you're seeing on train/test.
I will try to help out as much as possible.
@yuval-alaluf Would it be possible for you to share your dataset? Yours looks much better than ours. How did you create it? I am close to getting good results, but my dataset is not good enough.
Hi @cvmlddev,
As I mentioned in your issue (#83), I think part of the reason you're getting unsatisfactory results is the use of paired data.
Regarding the data, for our toonify model we actually didn't use any paired data at all. The training was done using only real images from FFHQ using the toonify StyleGAN. This allowed for more flexible results during inference.
Oops sorry, meant to tag @justmaulik . @justmaulik Would it be possible for you to share your paired toon dataset?
@cvmlddev Sure, I had 20k pairs (40k images) but can't find them now; I will share them if I find them somewhere. Here is the 1k set I just found.
@justmaulik Thanks a lot! How did you generate it? Also, I wonder why you are not getting good results; I was able to get decent results with the default parameters.
I trained an FFHQ model with a small dataset, then used it to create paired data to train pixel2style2pixel. The problem I faced was that the results were not close to the input: the results were good, but far from the input. Here, I found a toon-only folder :) maybe it can help you somehow.
Thanks a lot!
@justmaulik again, one year later: did you get better results in the end, or did you end up using a completely different approach?
How can I generate cartoon images like those in https://drive.google.com/drive/folders/1-6gZXiSDwT8hJxJcwqdGd2qErm-608rL ? Thanks!
@cvmlddev Sure, I had 20k pairs (40k images) but can't find them now; I will share them if I find them somewhere. Here is the 1k set I just found.
Hi, can you share your dataset again? I can't find it here. Thanks!
I tried to train a toon model with source/target images as below (Source / Target / Result). I trained it for about 6000 iterations, as mentioned in the thread, with the same settings; the above is just an example, and I trained with about 1000 images. It does not really give the expected output; it actually changes the structure of the whole face. Can you give me a rough idea of what I could be doing wrong, and how I can preserve the input face while making the eyes a bit larger?