Closed kimyanna closed 1 year ago
You understood correctly :) When defining the data, you can set your source to the path of the real images and the target to the path of the cartoon images (with corresponding names). Then inference is performed as you mentioned: you can pass new aligned face images and get their cartoon versions. A small note: training should converge much faster. You can examine the images outputted during training to see when the test results look good. Regarding the data splitting, you can try leaving, say, 10% of the pairs as a test set just to verify that the model is able to generalize to unseen faces.
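A minimal sketch of the 10% hold-out split suggested above, assuming each real image and its cartoon counterpart share a filename; the directory layout and the `_test` suffix here are hypothetical, not part of the repo:

```python
import os
import random
import shutil

def split_pairs(source_dir, target_dir, test_dir_suffix="_test",
                test_frac=0.1, seed=0):
    """Move ~test_frac of the image pairs into held-out test directories.

    Assumes every file in source_dir has a counterpart with the same
    name in target_dir, so pairs stay aligned after the split.
    """
    names = sorted(os.listdir(source_dir))
    random.Random(seed).shuffle(names)
    n_test = int(len(names) * test_frac)
    test_names = names[:n_test]

    # Create sibling test directories, e.g. real_test / cartoon_test.
    for base in (source_dir, target_dir):
        os.makedirs(base + test_dir_suffix, exist_ok=True)
    # Move each held-out pair out of the training set.
    for name in test_names:
        shutil.move(os.path.join(source_dir, name),
                    os.path.join(source_dir + test_dir_suffix, name))
        shutil.move(os.path.join(target_dir, name),
                    os.path.join(target_dir + test_dir_suffix, name))
    return test_names
```

Splitting by filename (rather than splitting the two folders independently) keeps each real/cartoon pair together on the same side of the split.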
Perfect, thank you for the great answer! It seems like there are two approaches for converting real faces into cartoons:

1. Train directly on the paired data, mapping each real face to its cartoon counterpart.
2. Train with real face images only, without using the cartoon targets.
It seems like both of these approaches would achieve a similar outcome; are there pros/cons to each of them?
Thank you!
You're correct, those are the two main approaches that I am familiar with. If you have paired data, the first approach is almost always preferred, since you're explicitly training the model to learn a mapping between real and cartoon pairs. If you don't have paired data, then you must use the second approach and train only with real face images. The downside is that the supervision there is very weak, so the outputted images are not very "cartoon-like". If you train for too long, you may start getting images that look more like real faces than cartoon faces. Since you have paired data, I think the first option is the way to go.
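To make the supervision difference concrete, here is a toy sketch (NumPy stand-in for the actual tensor losses; real pSp training also adds perceptual/LPIPS and identity terms): with pairs, the output is pulled directly toward its cartoon target; without pairs, the only pixel-level anchor is the input face itself, which is why outputs drift toward realism rather than cartoon style.

```python
import numpy as np

def paired_loss(generated, cartoon_target):
    """Strong supervision: compare the output directly to its cartoon pair."""
    return float(np.mean((generated - cartoon_target) ** 2))

def unpaired_reconstruction_loss(generated, real_face):
    """Weak supervision sketch: with no cartoon target available, a
    reconstruction term can only anchor the output to the real input face."""
    return float(np.mean((generated - real_face) ** 2))
```

Minimizing the second loss alone rewards outputs that resemble the real face, which matches the observation above that over-training the unpaired setup yields images that look more like real faces than cartoons.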
Perfect, thank you for the great explanation! Closing the issue.
Hello, my goal is to convert an image of any face into its cartoon version. I have a set of ~6k image pairs (real face and its cartoon version) for training. To be able to convert a never-seen face into its cartoon version, I need to do the following:

1. Run `scripts/train.py` (this will likely take several hundreds of thousands of iterations). I'm saving `best_model.pt` and watching the loss go down while training.
2. Once `best_model.pt` starts producing satisfactory results, I can use `best_model.pt` as the `--checkpoint_path` parameter of `scripts/inference.py` and feed a previously unseen aligned image of a face into `scripts/inference.py`.

Is my understanding of the process correct? Would this process produce a model able to turn an image of a face into its cartoon version? Also, do you have a recommendation on how to split train/test data in `dataset_psp`?

Thank you!!