Closed: justmaulik closed this issue 3 years ago
I ran the source image you linked above through the toonified model we uploaded and I got a similar result to what you got using your trained model. Therefore, I don't think you're doing anything wrong.
I will say that our toonified model is not trained on pairs of (real, toon) and therefore we run for a small number of steps. If you have paired data, you may find that running for more iterations will improve the results.
Regarding making the eyes bigger, this all depends on the data you use for training. If your target toons have larger eyes, you are more likely to generate toons with larger eyes. Our toonify model tends not to produce smaller eyes since we did not use paired data.
I hope this helps.
I tried running it for longer, but the result did not improve after 27k steps (the best checkpoint); you can check the training run here. Also, since you mentioned that your toon model is not trained on pairs, how does the training work? Can I train the model by just providing target images? My path config looks like this:
'ffhq_train': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Train/Train_IN',
'ffhq_target': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Train/Target_IN',
'toon_train': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Test/Test_IN',
'toon_target': '/content/drive/My Drive/Style/pixel2style2pixel/Data/Test/Target_IN',
In the data config file we provided details like this:
'ffhq_encode': {
'transforms': transforms_config.EncodeTransforms,
'train_source_root': dataset_paths['ffhq_train'],
'train_target_root': dataset_paths['ffhq_target'],
'test_source_root': dataset_paths['toon_train'],
'test_target_root': dataset_paths['toon_target'],
},
Ah. I think I see the problem now. 1) Based on the logs you linked, it appears that you are trying to encode toon images. That is, you're training using pairs of (toon, toon). However, if I understand you correctly, you want to train using pairs of (real, toon). That is, the training source data should be real face images while the training target data should be the corresponding toon images. 2) The data config you specified above seems a bit strange if I understood it correctly. I believe what you want is something like:
'ffhq_encode': {
'transforms': transforms_config.EncodeTransforms,
'train_source_root': dataset_paths['ffhq_train'],
'train_target_root': dataset_paths['ffhq_toons_train'],
'test_source_root': dataset_paths['real_test'],
'test_target_root': dataset_paths['toons_test'],
},
where ffhq_toons_train is the path to the toon images corresponding to the FFHQ data, real_test is the path to the test set containing real face images, and toons_test is the path to the test set containing the corresponding toon images.
As for how we trained our toons model with no paired data: we train it exactly like the ffhq_encode task, but replace the FFHQ StyleGAN generator with the toon StyleGAN generator. That is, we train using only real face images.
I think the structure you described in point 2 is the same as what we used for the training above. What I specified in the data config file is the structure I posted earlier, which I'm pretty sure should be correct, or am I doing something wrong?
Ok.
First, notice that you are using transforms_config.FFHQEncodeTransforms, which defines transform_source as None. Now, refer to ImagesDataset:
https://github.com/eladrich/pixel2style2pixel/blob/89935c49f9187e632148dc62a902ab3dfbe4205d/datasets/images_dataset.py#L28-L31
You will notice that if self.source_transform is None, we set from_im = to_im in line 31. And since transform_source is defined as None, we fall exactly into that case.
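The effect of that branch can be shown with a small stand-alone sketch. This is simplified from the logic in ImagesDataset.__getitem__ linked above; the helper name get_pair is mine, not the repo's:

```python
# Simplified sketch of the branch in ImagesDataset.__getitem__ that
# silently replaces the source image with the target image whenever
# source_transform is None. (get_pair is a hypothetical helper,
# not code from the repo.)
def get_pair(from_im, to_im, source_transform=None, target_transform=None):
    if target_transform is not None:
        to_im = target_transform(to_im)
    if source_transform is not None:
        from_im = source_transform(from_im)
    else:
        # This is the line that discards your paired source data.
        from_im = to_im
    return from_im, to_im

# With no source transform, the "source" you get back is the target:
print(get_pair('real.png', 'toon.png'))                                # ('toon.png', 'toon.png')
# With any non-None source transform, the pair is preserved:
print(get_pair('real.png', 'toon.png', source_transform=lambda x: x))  # ('real.png', 'toon.png')
```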
Therefore, you're actually not using the paired data you defined! That is, you're overwriting the source data with the target data, which is why your logs show only the toon images (since those are your target images).
A couple of questions you may have:
1) Why set from_im = to_im at all? In ffhq_encode, the task at hand is to "reconstruct" the images, so there it makes sense to set from_im = to_im.
2) Why did we use ffhq_encode for the toonify task? Because, as I mentioned above, we don't use paired toon data. We simply use the real FFHQ data, so it was fine to set the source images equal to the target images.
However, this is not what you want to do. What you need to do is define your own transform class where your transforms_dict is something like:
transforms_dict = {
'transform_gt_train': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_source': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_test': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
'transform_inference': transforms.Compose([
transforms.Resize((256, 256)),
transforms.ToTensor(),
transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
}
This way, transform_source is not None and you do not set from_im = to_im in ImagesDataset.
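To plug a dict like this into the repo's config machinery, you would wrap it in a transform class inside transforms_config.py. A structural sketch follows; build_transform is a placeholder I introduced to stand in for the transforms.Compose([...]) pipelines shown above, and in the real file you should subclass the repo's transforms base class like the existing classes do:

```python
# Structural sketch of a ToonifyTransforms class for transforms_config.py.
# build_transform is a placeholder for the torchvision
# transforms.Compose([Resize, ToTensor, Normalize]) pipelines above;
# in the actual repo file, follow the pattern of the existing
# transform classes (including their base class).
def build_transform():
    return lambda im: im  # placeholder; use transforms.Compose(...) in practice

class ToonifyTransforms:
    def __init__(self, opts):
        self.opts = opts

    def get_transforms(self):
        # Crucially, 'transform_source' is NOT None, so ImagesDataset
        # keeps the paired source images instead of overwriting them.
        return {
            'transform_gt_train': build_transform(),
            'transform_source': build_transform(),
            'transform_test': build_transform(),
            'transform_inference': build_transform(),
        }
```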
Then, you can make a new dataset type in data_configs.py
. Something like:
'toonify': {
'transforms': transforms_config.ToonifyTransforms,
'train_source_root': dataset_paths['real_train'],
'train_target_root': dataset_paths['toons_train'],
'test_source_root': dataset_paths['real_test'],
'test_target_root': dataset_paths['toons_test'],
},
And call your training script using --dataset_type=toonify.
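Putting it together, a full training invocation might look like the following. The flag names are the repo's standard training options; the experiment path and lambda values here are illustrative, so check them against your local setup:

```shell
python scripts/train.py \
  --dataset_type=toonify \
  --exp_dir=experiments/toonify_paired \
  --workers=8 \
  --batch_size=8 \
  --test_batch_size=8 \
  --test_workers=8 \
  --val_interval=2500 \
  --save_interval=5000 \
  --encoder_type=GradualStyleEncoder \
  --start_from_latent_avg \
  --lpips_lambda=0.8 \
  --l2_lambda=1 \
  --id_lambda=0.1
```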
To make sure everything works as expected, you should see in the logs the real source image next to the toons target image, followed by the output toons image.
Very long answer, but I hope it helps clear up any confusion. Let me know if you have any further questions. 😄
Thanks @yuval-alaluf. After changing the transform class and the toonify dict as you described, the logs now look as expected. The other thing I was wondering: how can I train with just a StyleGAN weights file (.pt) and without paired images, as you did for your toon model?
Would be very interested to see how the results of your training run turn out!
@justmaulik ,
Not sure what you mean by "just stylegan weights file".
But if you want to reproduce the toons model we trained with no paired data, you need to specify only a few things:
- set --dataset_type to ffhq_encode
- set --stylegan_weights to the file containing the toonified StyleGAN generator that we linked in the repo (here is the link to download it)
Basically, the main difference is that instead of using the FFHQ StyleGAN generator, you will use the Toons StyleGAN generator.
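In other words, a run reproducing the unpaired setup might look like the sketch below. The paths are illustrative placeholders; the key points are the ffhq_encode dataset type and pointing --stylegan_weights at the toonified generator:

```shell
python scripts/train.py \
  --dataset_type=ffhq_encode \
  --exp_dir=experiments/toonify_unpaired \
  --stylegan_weights=/path/to/toonify_stylegan.pt \
  --start_from_latent_avg \
  --lpips_lambda=0.8 \
  --l2_lambda=1 \
  --id_lambda=0.1
```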
Thanks @yuval-alaluf. I actually have a toon StyleGAN model that was created by following @justinpinkney's blog post; it turned out really good but not perfect (about 10% bad images). I was curious to see how the results change with pixel2style2pixel.
I did the training and the logs look better than in the previous runs, but it's not learning anymore: the best loss was achieved at the start of training and then it just stayed flat. The current dataset count is ~920 pairs in train and 100 pairs in test.
Waiting for the follow-up post on @justinpinkney's blog :) Or, if possible, could you explain a bit about how you managed to achieve your results, so I can give that method a try? 😊
The train results look much better! I think the problem you're seeing on the test set is that the real image data is not aligned. Other than that, everything looks good. I believe that if you align the data you should see much better results. Let me know if this solves the problem; I'm interested in seeing your results!
All the images used were tagged as aligned in Nvidia's repo. I think it's just missing the blur effect. Still, I will start a new run after aligning all the images and will share the results. The output in the train folder is perfect, but in the test folder it's still a bit far from perfect. If I add a greater variety of data pairs (5k-10k pairs), will it improve significantly, or are 900 pairs more than enough for this type of training?
I'm pretty sure the test data is not aligned. There are a lot of images that are very rotated with respect to the corresponding toon image.
I would run realignment again on the test to make sure. The train data seems fine.
Notice that you don't need to retrain the model. Simply take the trained model and run inference on the test data after realignment.
Regarding adding more data, I wouldn't run to add more data before you know that alignment isn't the problem.
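For reference, running inference with the already-trained checkpoint on the realigned test set could look something like the following. The flags are the repo's standard inference options; the paths here are illustrative placeholders:

```shell
python scripts/inference.py \
  --exp_dir=experiments/toonify_inference \
  --checkpoint_path=experiments/toonify_paired/checkpoints/best_model.pt \
  --data_path=/path/to/realigned_test \
  --test_batch_size=4 \
  --test_workers=4
```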
I tried aligned images, but it did not improve the results as expected. It's more random: some images give good results, but most are just like this. The output is good but far from the input. I will try with a larger dataset and share the results.
Sounds good. Since you also have paired data, I invite you to play around with the loss lambdas. You may be able to get better results with a different combination. Specifically, I would recommend starting by decreasing w_norm_lambda a bit (maybe to 0.01?).
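For example, a re-run with a smaller w-norm weight could be as simple as overriding that one flag, keeping the rest of the flags from your original run. This assumes the custom toonify dataset type discussed earlier in the thread, and the value is just a starting point:

```shell
python scripts/train.py \
  --dataset_type=toonify \
  --exp_dir=experiments/toonify_wnorm_0.01 \
  --w_norm_lambda=0.01
```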
In any case, I feel like we can close this issue as it seems like we've solved the issue you had. Feel free to reopen the issue if needed and I am looking forward to seeing your results!
@justmaulik Were you able to solve this? I am facing the same issue: I trained with 3k pairs, but the output still does not exactly match the input.
No, I tried a few things but did not achieve the results I was aiming for. At some point I trained with ~20k pairs (40k images), but the results did not improve much.
@yuval-alaluf Can we reopen this issue? I am facing the same problem. The results look good on train logs, but with test, it seems to have no connection with the input.
Hi @Nerdyvedi ,
I'll be happy to help, but since your question is not specifically about the ffhq_encode task, I think the best thing to do is to open a new issue where you can provide some more details about what you're trying to do, the training setup, and the results you're seeing on train/test.
I will try to help out as much as possible.
@yuval-alaluf Would it be possible for you to share your dataset? Yours looks much better than ours. How did you create it? I am close to getting good results, but my dataset is not good enough.
Hi @cvmlddev,
As I mentioned in your issue (#83), I think part of the reason you're getting unsatisfactory results is the use of paired data.
Regarding the data, for our toonify model we actually didn't use any paired data at all. The training was done using only real images from FFHQ using the toonify StyleGAN. This allowed for more flexible results during inference.
Oops sorry, meant to tag @justmaulik . @justmaulik Would it be possible for you to share your paired toon dataset?
@cvmlddev Sure, I had 20k pairs (40k images) but can't find them now; I will share them if I find them somewhere. Here is the 1k set I just found.
@justmaulik Thanks a lot! How did you generate it? Also, I wonder why you are not getting good results; I was able to get decent results with the default parameters.
I trained an FFHQ model with a small dataset, then used it to create paired data to train pixel2style2pixel. The problem I faced was that the results were not close to the input: the results were good, but far from the input. Here, I found a toon-only folder :) maybe it can help you somehow.
Thanks a lot!
@justmaulik again, one year later: did you get better results in the end, or did you end up using a completely different approach?
How can I generate cartoon images like those in https://drive.google.com/drive/folders/1-6gZXiSDwT8hJxJcwqdGd2qErm-608rL ? Thanks!
@cvmlddev Sure, I had 20k pairs (40k images) but can't find them now; I will share them if I find them somewhere. Here is the 1k set I just found.
Hi, can you share your dataset again? I can't find it here. Thanks!
I tried to train a toon model with source/target images as below (Source / Target / Result). I trained it for about 6000 iterations, as mentioned in the thread, with the same settings; the above is just an example, and I trained with about 1000 images. It does not really give the expected output; it actually changes the structure of the whole face. Can you give me a rough idea of what I could be doing wrong, and how I can preserve the input face while making the eyes a bit larger?