Open zyyyz opened 2 months ago
@zyyyz have you been able to successfully train a model using the code from this repo?
Is there any new progress? I trained a pose ControlNet on 50,000 images, but at inference, even with the strength set to 1, the output shows no guidance effect at all. Can anyone help?
Hi, I’d like to commend you all on this fantastic project—it's truly impressive. I have a few questions and would appreciate any guidance:
Could you provide some details regarding the computational cost of training? Specifically, how much data was used, what type of GPUs were utilized, and how long the training process took?
When following the Accelerate Configuration Example, I encountered an issue when training on a setup with 2 H100 GPUs. The error message I received was:
RuntimeError: mat1 and mat2 must have the same dtype, but got Half and BFloat16.
To resolve this, I had to change the line
dit.to(accelerator.device)
(line 108 in train_flux_deepspeed_controlnet.py) to
dit.to(accelerator.device, dtype=weight_dtype)
after which training proceeded normally. I'm not entirely sure what caused this discrepancy. Any insight into the root of the issue?
I'm training ControlNet on a small dataset of around 3,500 images. Throughout training, the loss seems to stay in the range 0.5-0.6 after 10k steps. Is this behavior typical, or should I be concerned that something might be off?
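A minimal sketch of the kind of dtype mismatch described above, and of how passing dtype to .to() removes it. This is not the repo's code: the single nn.Linear layer is a hypothetical stand-in for dit, and the example uses Float vs BFloat16 on CPU rather than Half vs BFloat16 on GPU so that it runs anywhere. It only illustrates the mechanism, not the root cause in the training script.

```python
import torch
from torch import nn

# Stand-in for the script's weight_dtype (assumed here to be bfloat16).
weight_dtype = torch.bfloat16

# Hypothetical stand-in for `dit`: one linear layer, created in float32.
layer = nn.Linear(8, 4)

# Input already in weight_dtype, so it no longer matches the layer's parameters.
x = torch.randn(2, 8, dtype=weight_dtype)

try:
    layer(x)
    mismatch_raised = False
except RuntimeError as err:
    # PyTorch refuses to multiply matrices with different dtypes.
    mismatch_raised = True
    print("mismatch:", err)

# Same shape as the changed line: move the module and cast its parameters
# in one call, so parameters and input share a dtype.
layer.to(torch.device("cpu"), dtype=weight_dtype)
out = layer(x)
print("output dtype after cast:", out.dtype)
```

If this picture applies, the original dit.to(accelerator.device) moved the model without changing its dtype, so any parameters that were not already in weight_dtype would clash with inputs that were.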
I really appreciate any help or advice you can offer. Thanks again for the amazing work you're doing!