ShivamShrirao / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

Training not taking effect #168

Open SpaceCowboy850 opened 1 year ago

SpaceCowboy850 commented 1 year ago

Describe the bug

I'm likely doing something wrong, but thought I'd post here in case someone else hits this.

I built up a new conda environment on a new computer. I tried to train the dog example, and it would not work. Training proceeded as normal and saved to the output directory, but when I went to run inference, the results did not remotely resemble the dog example. For example, "marble bust of sks alvandog" was just a marble bust of an old dude.

I then uninstalled Shivam's diffusers fork, installed the huggingface diffusers from conda, turned down my resolution (to avoid OOM), and it worked fine. My dog wasn't in marble, but it was at least a picture of the training dog. I put him on a skateboard, and the dog on the skateboard looked roughly like the training dog. I then uninstalled the huggingface diffusers, reinstalled Shivam's fork, and again my dog on a skateboard became human legs pushing a skateboard, not a dog in sight.

I used the same training command in both cases:

accelerate launch .\diffusers\examples\dreambooth\train_dreambooth.py --train_text_encoder --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --instance_data_dir="mydog" --output_dir="output" --instance_prompt="a photo of sks alvandog" --resolution=256 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=3e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=800 --with_prior_preservation --prior_loss_weight=1.0 --class_prompt="a photo of dog" --class_data_dir="regularization/dog" --num_class_images=300

The inference code used the DDIM scheduler (the same one used for training). The learning rate does not seem to matter; I tried 1e-6 and 5e-6 while it was broken.
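For reference, my inference was roughly along these lines (a minimal sketch, not my exact script; the output path, prompt, and sampler settings are placeholders):

import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

# Load the DreamBooth output directory written by train_dreambooth.py
pipe = StableDiffusionPipeline.from_pretrained("output", torch_dtype=torch.float16).to("cuda")
# Swap in the DDIM scheduler to match the scheduler used during training
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

image = pipe("marble bust of sks alvandog", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("bust.png")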

In both cases I do get this warning:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'

The huggingface diffusers was on 0.11.1; Shivam's fork is on 0.9.0.
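(As a quick sanity check when switching between the two installs, something like this confirms which diffusers is actually active in the environment; it is not part of the repro:)

import diffusers
print(diffusers.__version__)  # 0.11.1 for the PyPI release vs. 0.9.0 for the fork in my setup
print(diffusers.__file__)     # shows which installation is actually being imported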

Has anyone seen this behavior?

Reproduction

No response

Logs

No response

System Info

huggingface diffusers 0.11.1; Shivam's diffusers fork 0.9.0

Windows 11, Python 3.9, CUDA 11.6, PyTorch 1.13

Ehsan1997 commented 1 year ago

I think I am facing a similar issue. One thing I noticed was that if I use the xformers library, it ruins the results.

More details: GPU: A10G (24 GB) on an AWS-hosted machine. I first trained the model without prior preservation and everything worked fine. Then I decided to use prior preservation and got an OOM error. I installed the xformers library (tried both the prebuilt package and building it on my system), and the results were completely random entities (dogs or people, nothing to do with the instance images I provided). I then trained again without prior preservation, and once again the results were completely random.

Uninstalling xformers returns the model to normal, but I still wasn't able to do prior preservation.

Edit: The Colab notebook is working perfectly fine; not sure why there's a problem on the local instance.
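(To help isolate whether xformers also changes results at inference time, as opposed to only during training, comparing the same seed with memory-efficient attention on and off is a quick check. This is only a sketch, assuming your diffusers version exposes these pipeline methods; the model path and prompt are placeholders:)

import torch
from diffusers import StableDiffusionPipeline

# Placeholder path to a DreamBooth output directory
pipe = StableDiffusionPipeline.from_pretrained("output", torch_dtype=torch.float16).to("cuda")
prompt = "a photo of sks alvandog"

# Use the same seed for both runs so only the attention backend differs
pipe.enable_xformers_memory_efficient_attention()
with_xf = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]

pipe.disable_xformers_memory_efficient_attention()
without_xf = pipe(prompt, generator=torch.Generator("cuda").manual_seed(0)).images[0]

with_xf.save("with_xformers.png")
without_xf.save("without_xformers.png")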

AhmedAbdellaa commented 1 year ago

Edit: The Colab notebook is working perfectly fine; not sure why there's a problem on the local instance.

Same problem here. I used an RTX 3060 with the same libraries as Colab, and after training completed I got totally random images with no relation to my images. If anyone finds a solution, please share.


dizhenx commented 1 year ago

installed the huggingface diffusers from conda

How do you install the huggingface diffusers with conda?