fine_tuned model fuse faces

drboog / ProFusion

Code for Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach

Apache License 2.0


fine_tuned model fuse faces #11

Open Laidawang opened 1 year ago

Laidawang commented 1 year ago

When I test with multiple images, I first apply random data augmentation to these two images and then fine-tune my model on the augmented data. But when I try different cfg combinations for the two pictures, I get pictures with almost the same face (a combination of the faces from both pictures). It seems that the cfg only changes the clothing and background. I noticed that you provided an identity_small model in your paper, which seems to change the face (more like A or more like B) according to the cfg. I also noticed that this model is fine-tuned on some celebrity images. Can you provide this data? Or simply tell me how to generate my fine-tuning dataset so that the model does not fuse faces?

drboog commented 1 year ago

Hi, when you fine-tune with multiple ground-truth images, be careful about the mapping during training. Images should first be divided into groups, X = {x_0, x_1, ..., x_n}, Y = {y_0, y_1, ..., y_m}, Z = {z_0, z_1, ..., z_k}, ..., where each group contains different views of the same person. Images from one group should then only be mapped to (correspond to) images from the same group. This is not implemented in test.ipynb, but it is implemented in train.py, where you can find some details.
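To make the grouping rule concrete, here is a minimal sketch that lists which (target, reference) pairs are allowed. The function name `allowed_pairs` and the toy group contents are hypothetical and are not taken from train.py. It also assumes an image is never paired with itself, which the comment above does not specify.

```python
from itertools import permutations

def allowed_pairs(groups):
    """List every (target, reference) pair that stays inside one group."""
    pairs = []
    for images in groups.values():
        # permutations(..., 2) yields ordered pairs of two different views,
        # so a pair never crosses from one person's group into another's.
        pairs.extend(permutations(images, 2))
    return pairs

# Toy groups: each key is one person, each value is that person's views.
groups = {
    "X": ["x_0", "x_1", "x_2"],
    "Y": ["y_0", "y_1"],
}
pairs = allowed_pairs(groups)
```

A pair such as (x_0, x_1) is in the result, while a cross-person pair such as (x_0, y_0) is not. Training on cross-person pairs is the kind of mapping that the reply warns against.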

For the celebrity images, you can find them here. They are images that I collected manually, so the dataset is not very large.

Laidawang commented 1 year ago

Thank you very much for your help. As you said, I need to make sure that each person has their own subfolder, such as A/01.jpg, A/02.jpg and B/01.jpg, B/02.jpg (A and B are two people), and then call train.py for fine-tuning?
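As an illustration of the layout described in this question, the sketch below scans a root folder and builds one group per person subfolder. The helper `collect_groups` and the accepted file suffixes are assumptions for illustration. It covers only the folder layout and does not produce whatever metadata train.py expects.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def collect_groups(root):
    """Map each subfolder name (one person) to its sorted image paths."""
    groups = {}
    for person_dir in sorted(Path(root).iterdir()):
        if not person_dir.is_dir():
            continue  # skip stray files sitting next to the subfolders
        images = sorted(
            p for p in person_dir.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if images:
            groups[person_dir.name] = images
    return groups
```

With the layout A/01.jpg, A/02.jpg, B/01.jpg, B/02.jpg, this returns a dictionary with keys "A" and "B", each holding that person's two image paths.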

drboog commented 1 year ago

You can either use train.py or test.ipynb, but both need to be revised.

If you want to use train.py for this experiment:

The UNet is not fine-tuned in train.py, so you need to add it to the optimization for better results.
Besides putting images in the correct subfolders, you also need to construct the metadata (see this). Without correctly constructed metadata, it does NOT work.

If you want to use test.ipynb:

You need to revise how the data is loaded. Currently, all images from a folder are pre-loaded and pre-processed into latents and embeddings, which are then randomly selected during training. What you can do instead is, for example, randomly select (image, reference image from the same subfolder) pairs and process them during training. Similar to this, you need to get the latents and embeddings of different images, so you only need to revise the corresponding lines in test.ipynb and make sure the latents and embeddings are fed into PromptNet and UNet correctly.
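The change described above could be sketched as follows. This is not code from test.ipynb: `iter_training_pairs`, `encode_latent`, and `encode_embedding` are hypothetical names, with the two encoders standing in for whatever the notebook uses to compute latents and embeddings. The sketch assumes the selected image supplies the latent and the reference supplies the embedding; swap the two calls if the notebook does it the other way around.

```python
import random

def iter_training_pairs(groups, encode_latent, encode_embedding, steps, rng=random):
    """Yield (latent, embedding) pairs computed on the fly, one per step.

    `groups` maps a subfolder name to the image paths of one person.
    `encode_latent` and `encode_embedding` are caller-supplied stand-ins.
    """
    names = [name for name, paths in groups.items() if paths]
    for _ in range(steps):
        paths = groups[rng.choice(names)]
        if len(paths) >= 2:
            # Two different views of the same person.
            image, reference = rng.sample(paths, 2)
        else:
            # Only one view available: it serves as its own reference.
            image = reference = paths[0]
        # Processing happens here, per step, instead of in a pre-loading pass.
        yield encode_latent(image), encode_embedding(reference)
```

Because both paths are drawn from the same subfolder before anything is encoded, the latent and the embedding in each step always belong to the same person.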

wangm-word commented 2 weeks ago

Hello, sorry for the interruption. What is the structure of your training data?