bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.
GNU Affero General Public License v3.0

Getting blank images #211

Closed: suhaneshivam closed this issue 9 months ago

suhaneshivam commented 11 months ago

Hi, I have trained the model for 5 epochs, and when I run inference using the saved checkpoints, all I get is blank images. I am using this Python code:

from accelerate import Accelerator
from diffusers import (
    DiffusionPipeline,
    UNet2DConditionModel,
    DDPMScheduler,
    DDIMScheduler,
    AutoencoderKL
)
from transformers import CLIPTextModel
from helpers.prompts import prompts
from compel import Compel

import torch, os, logging

logger = logging.getLogger("SimpleTuner-inference")
logger.setLevel(logging.INFO)
torch.cuda.empty_cache()
torch_seed = 4202420420
negative = "deep fried watermark cropped out-of-frame low quality low res oorly drawn bad anatomy wrong anatomy extra limb missing limb floating limbs (mutated hands and fingers)1.4 disconnected limbs mutation mutated ugly disgusting blurry amputation synthetic rendering"
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
checkpoint = "checkpoint-3150"

model_path = "/home/paperspace/SimpleTuner/models"

unet = UNet2DConditionModel.from_pretrained(f"{model_path}/{checkpoint}/unet")

pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet)
compel = Compel(
    tokenizer=[pipeline.tokenizer, pipeline.tokenizer_2],
    text_encoder=[pipeline.text_encoder, pipeline.text_encoder_2],
    requires_pooled=[False, True],
)

negative_embed, negative_pooled = compel(negative)
negative_embed = negative_embed.to("cuda")
negative_pooled = negative_pooled.to("cuda")
pipeline.scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    rescale_betas_zero_snr=True,
    timestep_spacing="trailing",
)

pipeline.to("cuda")

prompt = "Image of villagers joyfully celebrating Diwali++ with traditional lamps and colorful decorations in a rural setting"

conditioning, pooled = compel(prompt)
conditioning = conditioning.to("cuda")
pooled = pooled.to("cuda")
generator = torch.Generator(device="cuda").manual_seed(torch_seed)

output = pipeline(
    generator=generator,
    negative_prompt_embeds=negative_embed,
    negative_pooled_prompt_embeds=negative_pooled,
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    guidance_scale=7.5,
    guidance_rescale=0.0,
    width=1024,
    height=1024,
    num_inference_steps=50,
).images[0]

output.save(f"{checkpoint}.png")
del output

I am also getting this warning:

RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8") 

at the end of the run. Here is the env file I used for training:

#!/bin/bash
# Configure these values.

# Restart where we left off. Change this to "checkpoint-1234" to start from a specific checkpoint.
export RESUME_CHECKPOINT="latest"

# How often to checkpoint. Depending on your learning rate, you may wish to change this.
# For the default settings with 10 gradient accumulations, more frequent checkpoints might be preferable at first.
export CHECKPOINTING_STEPS=150
# This is how many checkpoints we will keep. Two is safe, but three is safer.
export CHECKPOINTING_LIMIT=2

# This is decided as a relatively conservative 'constant' learning rate.
# Adjust higher or lower depending on how burnt your model becomes.
export LEARNING_RATE=8e-7 #@param {type:"number"}

# Using a Huggingface Hub model:
export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
# Using a local path to a huggingface hub model or saved checkpoint:
#export MODEL_NAME="/datasets/models/pipeline"

# Make DEBUG_EXTRA_ARGS empty to disable wandb.
export DEBUG_EXTRA_ARGS="--report_to=wandb"
export TRACKER_PROJECT_NAME="sdxl-training"
export TRACKER_RUN_NAME="simpletuner-sdxl"

# Use this to append an instance prompt to each caption, used for adding trigger words.
# This has not been tested in SDXL.
#export INSTANCE_PROMPT="lotr style "
# If you also supply a user prompt library or `--use_prompt_library`, this will be added to those lists.
export VALIDATION_PROMPT="photograph of Indian people celebrating Diwali"
export VALIDATION_GUIDANCE=7.5
# You'll want to set this to 0.7 if you are training a terminal SNR model.
export VALIDATION_GUIDANCE_RESCALE=0.0
# How frequently we will save and run a pipeline for validations.
export VALIDATION_STEPS=100
# Either a max number of steps or epochs can be used, but we default to epochs.
export MAX_NUM_STEPS=0
# Will likely overtrain, but that's fine.
export NUM_EPOCHS=25

# Location of training data.
export BASE_DIR="/home/paperspace/SimpleTuner"
export INSTANCE_DIR="${BASE_DIR}/processed_dataset"
export OUTPUT_DIR="${BASE_DIR}/models"
# By default, images will be resized so their SMALLER EDGE is 1024 pixels, maintaining aspect ratio.
# Setting this value to 768px might result in more reasonable training data sizes for SDXL.
export RESOLUTION=1024
# Minimum resolution and validation resolution are measured in pixels, as they represent the image edge length.
export MINIMUM_RESOLUTION=$RESOLUTION
export VALIDATION_RESOLUTION=$RESOLUTION
# If you want to have the training data resized by pixel area (Megapixels) rather than edge length,
#  set this value to "area" instead of "pixel", and uncomment the next RESOLUTION declaration.
export RESOLUTION_TYPE="pixel"
#export RESOLUTION=1.0          # 1.0 Megapixel training sizes

# Adjust this for your GPU memory size. This, and resolution, are the biggest VRAM killers.
export TRAIN_BATCH_SIZE=10
# Accumulate your update gradient over many steps, to save VRAM while still having higher effective batch size:
# effective batch size = ($TRAIN_BATCH_SIZE * $GRADIENT_ACCUMULATION_STEPS).
export GRADIENT_ACCUMULATION_STEPS=4
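# With the values above, the effective batch size works out to 10 * 4 = 40.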

# Some data that we generate will be cached here. Training state is baked into the checkpoints themselves.
export STATE_PATH="${BASE_DIR}/training_state.json"
# Store whether we've seen an image or not, to prevent repeats.
export SEEN_STATE_PATH="${BASE_DIR}/training_images_seen.json"

# Use any standard scheduler type: constant, polynomial, constant_with_warmup
export LR_SCHEDULE="constant"
# A warmup period allows the model, and more importantly the EMA weights, to adapt gradually to the current data.
export LR_WARMUP_STEPS=$((MAX_NUM_STEPS / 10))
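# Note: with MAX_NUM_STEPS=0 set above, this arithmetic evaluates to 0 warmup steps.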

# Caption dropout probability. Set to 0.1 for 10% of captions dropped out. Set to 0 to disable.
# You may wish to disable dropout if you want to limit your changes strictly to the prompts you show the model.
# You may wish to increase the rate of dropout if you want to more broadly adopt your changes across the model.
export CAPTION_DROPOUT_PROBABILITY=0.1

# TF32 is great on Ampere or Ada, not sure about earlier generations.
export TRAINER_EXTRA_ARGS="--allow_tf32 --use_8bit_adam --use_ema"

## For offset noise training:
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --offset_noise --noise_offset=0.02"

## For noise input perturbation - adds extra noise, randomly. This is separate from offset noise, but can help stabilize it and reduce overfitting.
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --input_pertubation=0.01"

## For terminal SNR training:
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"
## You may benefit from directing training toward a specific weighted subset of timesteps.
# In this example, we train the final 25% of the timestep schedule with a 3x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=later --timestep_bias_portion=0.25 --timestep_bias_multiplier=3"
# In this example, we train the earliest 25% of the timestep schedule with a 5x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=earlier --timestep_bias_portion=0.25 --timestep_bias_multiplier=5"
# Here, we designate that timesteps 200 to 500, specifically, should be prioritised.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=range --timestep_bias_begin=200 --timestep_bias_end=500 --timestep_bias_multiplier=3"

## For experimental min-SNR weighted loss training (5 is suggested value by the original researchers):
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --snr_gamma=5.0"

# For Wasabi S3 filesystem backend (experimental)
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --data_backend=aws --aws_bucket_name=test123"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_endpoint_url=https://s3.wasabisys.com"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_access_key=1234567890"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_secret_access_key=0987654321"

# Reproducible training. Set to -1 to disable.
export TRAINING_SEED=420420420

# Below here, these are pretty sketchy to change. --use_original_images can be removed to enable image cropping. Not tested for SDXL.
# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
export MIXED_PRECISION="bf16"                # Might not be supported on all GPUs. fp32 will be needed for others.
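# Note: bf16 requires Ampere (e.g. the A100 used here) or newer NVIDIA GPUs.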

# This has to be changed if you're training with multiple GPUs.
export TRAINING_NUM_PROCESSES=1
export TRAINING_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS=""                          # --multi_gpu or other similar flags for huggingface accelerate
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --enable_xformers_memory_efficient_attention --use_original_images=true --set_grads_to_none"
export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --gradient_checkpointing --gradient_accumulation_steps=${GRADIENT_ACCUMULATION_STEPS}"

# With Pytorch 2.1, you might have pretty good luck here.
# If you're using aspect bucketing however, each resolution change will recompile. Seriously, just don't do it.
export TRAINING_DYNAMO_BACKEND='no'          # or 'inductor' if you want to brave PyTorch 2 compile issues

My requirements.txt

accelerate==0.23.0
aiohttp==3.8.6
aiosignal==1.3.1
appdirs==1.4.4
async-timeout==4.0.3
attrs==23.1.0
bitsandbytes-cuda117==0.26.0.post2
boto3==1.28.64
botocore==1.31.64
build==0.10.0
CacheControl==0.13.1
certifi==2022.12.7
cffi==1.16.0
charset-normalizer==2.1.1
cleo==2.0.1
click==8.1.7
clip-interrogator==0.6.0
cmake==3.25.0
colorama==0.4.6
compel==2.0.2
crashtest==0.4.1
cryptography==41.0.4
datasets==2.14.5
diffusers @ git+https://github.com/huggingface/diffusers@93df5bb67016a176cab4b58405e4daf5bd1828d9
dill==0.3.7
distlib==0.3.7
docker-pycreds==0.4.0
dulwich==0.21.6
filelock==3.12.4
frozenlist==1.4.0
fsspec==2023.6.0
ftfy==6.1.1
gitdb==4.0.10
GitPython==3.1.38
huggingface-hub==0.17.3
idna==3.4
importlib-metadata==6.8.0
installer==0.7.0
jaraco.classes==3.3.0
jeepney==0.8.0
Jinja2==3.1.2
jmespath==1.0.1
jsonschema==4.17.3
keyring==24.2.0
lit==15.0.7
MarkupSafe==2.1.2
more-itertools==10.1.0
mpmath==1.3.0
msgpack==1.0.7
multidict==6.0.4
multiprocess==0.70.15
networkx==3.0
numpy==1.24.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu11==8.5.0.96
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu11==10.9.0.58
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu11==10.2.10.91
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu11==11.7.4.91
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu11==2.14.3
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.2.140
nvidia-nvtx-cu11==11.7.91
nvidia-nvtx-cu12==12.1.105
open-clip-torch==2.22.0
packaging==23.2
pandas==2.1.1
pathtools==0.1.2
pexpect==4.8.0
Pillow==9.3.0
pkginfo==1.9.6
platformdirs==3.11.0
poetry==1.6.1
poetry-core==1.7.0
poetry-plugin-export==1.5.0
protobuf==3.20.3
psutil==5.9.6
ptyprocess==0.7.0
pyarrow==13.0.0
pycparser==2.21
pyparsing==3.1.1
pyproject_hooks==1.0.0
pyrsistent==0.19.3
python-dateutil==2.8.2
pytz==2023.3.post1
PyYAML==6.0.1
rapidfuzz==2.15.2
regex==2023.10.3
requests==2.28.1
requests-toolbelt==1.0.0
s3transfer==0.7.0
safetensors==0.4.0
SecretStorage==3.3.3
sentencepiece==0.1.99
sentry-sdk==1.32.0
setproctitle==1.3.3
shellingham==1.5.3
six==1.16.0
smmap==5.0.1
sympy==1.12
timm==0.9.7
tokenizers==0.14.1
tomli==2.0.1
tomlkit==0.12.1
torch==2.0.1
torchaudio==2.0.2+cu117
torchvision==0.15.2
tqdm==4.66.1
transformers==4.34.0
triton==2.0.0
trove-classifiers==2023.9.19
typing_extensions==4.4.0
tzdata==2023.3
urllib3==1.26.13
virtualenv==20.24.5
wandb==0.15.12
wcwidth==0.2.8
xformers==0.0.21
xxhash==3.4.1
yarl==1.9.2
zipp==3.17.0

nvidia-smi output

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:05.0 Off |                    0 |
| N/A   27C    P0    54W / 400W |    121MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1319      G   /usr/lib/xorg/Xorg                 74MiB |
|    0   N/A  N/A      2303      G   /usr/bin/gnome-shell               39MiB |
|    0   N/A  N/A      3236      G   ...bexec/gnome-initial-setup        6MiB |
+-----------------------------------------------------------------------------+
bghira commented 11 months ago

hi! do you use the release branch, or main? things are in flux on the main branch right now, but i still wouldn't have expected this...

black images happen when loss goes to infinity during training. i don't think you changed the vae that is in use, right?

bghira commented 11 months ago

oh, you must set rescale_betas_zero_snr to False for vanilla SDXL

suhaneshivam commented 11 months ago

I am not sure where exactly I need to set this parameter to False. Do I need to set it to False in the env file and then retrain, or do I have to pass it during inference? I also tried changing the VAE but am still getting the same result.
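
For reference, swapping the VAE at inference time looks roughly like this; the madebyollin/sdxl-vae-fp16-fix repository named below is just one commonly used replacement, mentioned here as an illustration rather than something prescribed in this thread:

from diffusers import AutoencoderKL

# Load a replacement VAE and attach it to the already-constructed pipeline.
vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix")
pipeline.vae = vae.to("cuda")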

bghira commented 11 months ago
pipeline.scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    rescale_betas_zero_snr=True,
    timestep_spacing="trailing",
)

here in this scheduler invocation you are setting rescale_betas_zero_snr=True

the env file seems to have it disabled:

## For terminal SNR training:
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"

i would recommend keeping it disabled for SDXL unless you want to go through about 50,000 steps of training on a few million images to overhaul the whole noise schedule :D

i think you just have to set that value to False at inference time; i don't think retraining is necessary.
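
A minimal sketch of the corrected scheduler setup for vanilla SDXL, reusing the model_id from the script above:

# Vanilla SDXL was not trained with zero terminal SNR, so leave the
# beta rescale off and keep the default timestep spacing.
pipeline.scheduler = DDIMScheduler.from_pretrained(
    model_id,
    subfolder="scheduler",
    rescale_betas_zero_snr=False,
)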

suhaneshivam commented 11 months ago

I set the parameter rescale_betas_zero_snr to False inside the scheduler but am still getting black images with all-NaN values. However, I do get the expected images when I train the model again with --mixed_precision=no, keeping the other settings unchanged. I think this has to do with the warning

RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8") 

which I get every time I run inference.
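
One way to confirm that the warning stems from NaN pixel values is to request raw numpy output from the pipeline and inspect it before the uint8 cast; this sketch reuses the pipeline, conditioning, and pooled objects from the script above:

import numpy as np

# Return float arrays instead of PIL images, then check for NaNs,
# which become black pixels after the (images * 255) cast.
raw = pipeline(
    prompt_embeds=conditioning,
    pooled_prompt_embeds=pooled,
    num_inference_steps=50,
    output_type="np",
).images
print("contains NaN:", np.isnan(raw).any())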

bghira commented 11 months ago

what were your loss values during training?

bghira commented 11 months ago

you could also greatly simplify the example and not use Compel for prompt handling, instead passing the prompt and negative_prompt inputs directly.

just initialise the pipeline with pipeline = ...from_pretrained('/path/to/model') and allow it to fully pick up the model config and its default scheduler. that will likely use Euler and be more reliable.
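
A minimal sketch of that simplified flow, with the model path and prompts as placeholders:

from diffusers import DiffusionPipeline

# Load the saved pipeline as-is, letting it pick up its own config
# and default scheduler.
pipeline = DiffusionPipeline.from_pretrained("/path/to/model").to("cuda")

image = pipeline(
    prompt="photograph of Indian people celebrating Diwali",
    negative_prompt="low quality, blurry",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("simplified-test.png")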

suhaneshivam commented 11 months ago

I initially set the training epochs to 25 but checked the result after 5 epochs. When I realised it was generating blank images, I terminated the training job. I then re-ran training for another single epoch to save the pipeline, and ran inference using pipeline = ...from_pretrained('/path/to/model') without Compel. I am still getting the same result.

suhaneshivam commented 11 months ago

I checked the loss, and it turned out that it was NaN throughout the training.
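
As an aside, a training loop can catch this early with a simple finiteness check; the helper below is an illustrative sketch, not part of SimpleTuner:

import torch

def check_finite(loss: torch.Tensor, step: int) -> None:
    # Illustrative guard: abort as soon as the loss stops being finite,
    # rather than silently writing unusable checkpoints.
    if not torch.isfinite(loss).all():
        raise RuntimeError(f"Loss became non-finite at step {step}")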

bghira commented 11 months ago

:cry: that is never fun. do you have any debug logs for that session?

suhaneshivam commented 11 months ago

Sure! These are not debug logs, though: https://drive.google.com/file/d/1eC4qhH2V2lfFm-y3ZLVkbMNvm6k32NTX/view?usp=drive_link

bghira commented 11 months ago

those state that I need access. if you can recreate the issue easily, please do so with

SIMPLETUNER_LOG_LEVEL=DEBUG

in your env file.

bash train_sdxl.sh > train.log 2>&1

and then provide train.log here, with whatever info redacted as needed.

suhaneshivam commented 11 months ago

I will provide you with logs soon.

suhaneshivam commented 11 months ago

Here are the debug logs: https://drive.google.com/file/d/1dzynYWKaA1J5wGzavKz16yrYA5xQf-gV/view?usp=drive_link I was able to get the expected results when I trained with the --mixed_precision=no option.

suhaneshivam commented 11 months ago

Also, it did not save the VAE cache in the cache_vae directory, so I had to tweak the script to generate it at runtime.

bghira commented 9 months ago

please try reproducing this on v0.8.0