MFaceTech / InstantID

Apache License 2.0

Dataset format and GPU Memory consumption #4

Closed: Oguzhanercan closed this issue 3 months ago

Oguzhanercan commented 3 months ago

Hi, thanks for the code release. I have two questions. Is it possible to train the model with 24 GB of GPU memory (with bfloat16)? And should the dataset contain only face images, or is it okay to include whole-body shots, unrelated objects, etc.? Since you trained on your custom closed dataset, which contains only portraits, could you share something about the training results?

Thanks.

MFaceTech commented 3 months ago

1. When training on images at a resolution of 1024x1024, a 24 GB GPU is insufficient, even with bfloat16.
2. The training dataset primarily consists of portraits, including both individual and group photos, and full-body shots. Labels are generated using BLIP, and training is conducted in the same manner as with standard diffusion models.
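For readers who want to try the BLIP labeling step mentioned above, here is a minimal sketch using the Hugging Face `transformers` BLIP classes. The checkpoint name, the `*.jpg` glob, the helper names, and the convention of writing each caption to a `.txt` file next to its image are all assumptions for illustration; the thread does not say which BLIP checkpoint or caption file layout was actually used.

```python
from pathlib import Path

# Assumed checkpoint; the thread does not name the BLIP model that was used.
BLIP_CHECKPOINT = "Salesforce/blip-image-captioning-base"


def caption_file_for(image_path):
    """Hypothetical convention: keep the caption beside the image as a .txt file."""
    return Path(image_path).with_suffix(".txt")


def caption_images(image_dir, max_new_tokens=30):
    """Caption every .jpg in image_dir with BLIP and write sidecar .txt files."""
    # Heavy imports are deferred so caption_file_for works without them installed.
    from PIL import Image
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(BLIP_CHECKPOINT)
    model = BlipForConditionalGeneration.from_pretrained(BLIP_CHECKPOINT)
    model.eval()

    for image_path in sorted(Path(image_dir).glob("*.jpg")):
        image = Image.open(image_path).convert("RGB")
        inputs = processor(images=image, return_tensors="pt")
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        caption = processor.decode(output_ids[0], skip_special_tokens=True)
        caption_file_for(image_path).write_text(caption.strip() + "\n", encoding="utf-8")
```

Calling `caption_images("path/to/portraits")` would download the checkpoint on first use and write one caption file per image; the directory name here is only a placeholder.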

min-star commented 3 months ago

Great work! I want to train InstantID based on SD1.5. Can you give some suggestions?

MFaceTech commented 3 months ago

The SD1.5 training code needs several modifications, mainly in the dataset preprocessing methods. Furthermore, new inference code must be developed since the original author provided inference only for SDXL. These changes demand considerable effort to guarantee satisfactory training outcomes.

min-star commented 3 months ago

I got it. Thanks a lot for your reply!

Oguzhanercan commented 3 months ago

I am planning to train at 256x256 resolution. Does that work on a 24 GB GPU? And what do you think about your results: are they better than the original InstantID?

MFaceTech commented 3 months ago

Based on my experiments, training at an image resolution of 256x256 yields poor results. Additionally, the final results depend on the quality of your training dataset.

mnotgod96 commented 2 months ago

Just found it is possible to use multiple GPUs with small VRAM to train InstantID at the resolution of 1024 with the help of DeepSpeed. I trained InstantID using four 40GB A100 GPUs and the peak VRAM utilization per GPU was about 26 GB.
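As a rough illustration of the kind of DeepSpeed setup described above, the sketch below builds a config with bf16 and ZeRO enabled. The comment does not say which ZeRO stage, batch size, or other settings were used, so every value here (stage 2, micro-batch size 1, four accumulation steps, gradient clipping at 1.0) and the helper name `make_deepspeed_config` are assumptions, not the commenter's actual script.

```python
import json


def make_deepspeed_config(micro_batch_size=1, grad_accum_steps=4, zero_stage=2):
    """Build a DeepSpeed config dict; all default values are illustrative guesses."""
    return {
        # Per-GPU batch size; small values keep peak VRAM down.
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # bfloat16 mixed precision, as discussed earlier in the thread.
        "bf16": {"enabled": True},
        # ZeRO partitions optimizer state (and gradients at stage 2) across GPUs.
        "zero_optimization": {
            "stage": zero_stage,
            "overlap_comm": True,
            "contiguous_gradients": True,
        },
        "gradient_clipping": 1.0,
    }


config = make_deepspeed_config()
config_json = json.dumps(config, indent=2)
print(config_json)
```

The printed JSON can be saved to a file and handed to whichever launcher the training script uses. Actual memory use will depend on the model, resolution, and batch size, so the 26 GB figure reported above should not be expected from these settings alone.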

chenxinhua commented 2 months ago

Very impressive.

Oguzhanercan commented 2 months ago

Hi, I trained the model with the same approach. But when I train with DeepSpeed on more than one GPU, a deadlock occurs. Also, the training results were not good. Could you please share your script?