bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.
GNU Affero General Public License v3.0

Unable to detect images in the dataset #545

Closed OliviaOliveiira closed 3 months ago

OliviaOliveiira commented 3 months ago

Hey! It's me again. Finally got a working LoRA, yet after changing the dataset (and most likely git pulling) it can no longer find the image files, even though everything is as it was before and the paths are correct.

(.venv) miya123@Miya123:~/Рабочий стол/Huita/SimpleTuner$ bash train_sdxl.sh 
DEBUG_EXTRA_ARGS not set, defaulting to empty.
Disabling Xformers for Stable Diffusion 3 (https://github.com/huggingface/diffusers/issues/8535)
2024-06-28 22:14:04,175 [WARNING] (ArgsParser) Stable Diffusion 3 requires a pixel alignment interval of 64px. Updating value.
2024-06-28 22:14:04,175 [WARNING] (ArgsParser) Disabling Compel long-prompt weighting for SD3 inference, as it does not support Stable Diffusion 3.
2024-06-28 22:14:04,175 [WARNING] (ArgsParser) Stable Diffusion 3 requires --max_grad_norm=0.01 to prevent model collapse. Overriding value. Set this value manually to disable this warning.
2024-06-28 22:14:04,193 [WARNING] (__main__) If using an Ada or Ampere NVIDIA device, --allow_tf32 could add a bit more performance.
2024-06-28 22:14:04,193 [INFO] (__main__) Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-06-28 22:14:04,427 [INFO] (__main__) Load OpenAI CLIP-L/14 text encoder..
2024-06-28 22:14:04,431 [INFO] (__main__) Loading T5-XXL v1.1 text encoder from /media/miya123/UUI1/Users/USER01/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3-medium-diffusers/snapshots/ea42f8cef0f178587cf766dc8129abd379c90671/text_encoder..
2024-06-28 22:14:04,905 [INFO] (__main__) Loading LAION OpenCLIP-G/14 text encoder..
2024-06-28 22:14:07,428 [INFO] (__main__) Loading T5-XXL v1.1 text encoder..
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:14<00:00,  7.05s/it]
2024-06-28 22:14:22,142 [INFO] (__main__) Load VAE..
2024-06-28 22:14:22,623 [INFO] (__main__) Moving models to GPU. Almost there.
2024-06-28 22:14:28,887 [INFO] (__main__) Loading Stable Diffusion 3 diffusion transformer..
2024-06-28 22:14:36,635 [INFO] (__main__) Using LoRA training mode.
2024-06-28 22:14:36,996 [INFO] (__main__) Moving the diffusion transformer to GPU in torch.bfloat16 precision.
2024-06-28 22:14:38,377 [INFO] (__main__) Initialising VAE in bf16 precision, you may specify a different value if preferred: bf16, fp32, default
2024-06-28 22:14:38,426 [INFO] (__main__) Loaded VAE into VRAM.
2024-06-28 22:14:38,426 [INFO] (DataBackendFactory) Loading data backend config from /home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/multidatabackend.json
2024-06-28 22:14:38,427 [INFO] (DataBackendFactory) Configuring text embed backend: alt-embed-cache
2024-06-28 22:14:38,428 [INFO] (TextEmbeddingCache) (Rank: 0) (id=alt-embed-cache) Listing all text embed cache entries
2024-06-28 22:14:38,428 [INFO] (DataBackendFactory) Pre-computing null embedding
2024-06-28 22:14:43,429 [INFO] (DataBackendFactory) Completed loading text embed services.
2024-06-28 22:14:43,429 [INFO] (DataBackendFactory) Configuring data backend: Altushka
2024-06-28 22:14:43,429 [INFO] (DataBackendFactory) Configured backend: {'id': 'Altushka', 'config': {'vae_cache_clear_each_epoch': False, 'probability': 1.0, 'repeats': 0, 'crop': 'false', 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'instanceprompt', 'instance_data_root': '/home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/Huita', 'maximum_image_size': None, 'target_downsample_size': None}, 'dataset_type': 'image'}
2024-06-28 22:14:43,429 [INFO] (DataBackendFactory) (id=Altushka) Loading bucket manager.
2024-06-28 22:14:43,433 [INFO] (JsonMetadataBackend) Checking for cache file: /home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/Huita/aspect_ratio_bucket_indices.json
2024-06-28 22:14:43,433 [WARNING] (JsonMetadataBackend) No cache file found, creating new one.
2024-06-28 22:14:43,433 [INFO] (DataBackendFactory) Configured backend: {'id': 'Altushka', 'config': {'vae_cache_clear_each_epoch': False, 'probability': 1.0, 'repeats': 0, 'crop': 'false', 'crop_aspect': 'square', 'crop_style': 'random', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'instanceprompt', 'instance_data_root': '/home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/Huita', 'maximum_image_size': None, 'target_downsample_size': None}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7e7131105580>, 'instance_data_root': '/home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/Huita', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7e7131106750>}
(Rank: 0)  | Bucket     | Image Count 
------------------------------
2024-06-28 22:14:43,435 [ERROR] (__main__) No images were discovered by the bucket manager in the dataset: Altushka., traceback: Traceback (most recent call last):
  File "/home/miya123/Рабочий стол/Huita/SimpleTuner/train_sdxl.py", line 699, in main
    configure_multi_databackend(
  File "/home/miya123/Рабочий стол/Huita/SimpleTuner/helpers/data_backend/factory.py", line 595, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: Altushka.
pvp-by commented 3 months ago

confirm. got the same thing, although I didn't notice at what point it broke. For several runs it calmly reported that there were images, and then the problem turned up in a completely different place.

(py311) ubuntu@Lightest:~/SimpleTuner$ ./train_sdxl.sh
DEBUG_EXTRA_ARGS not set, defaulting to empty.
Disabling Xformers for Stable Diffusion 3 (https://github.com/huggingface/diffusers/issues/8535)
2024-06-29 13:46:07,250 [WARNING] (ArgsParser) Stable Diffusion 3 requires a pixel alignment interval of 64px. Updating value.
2024-06-29 13:46:07,250 [WARNING] (ArgsParser) Disabling Compel long-prompt weighting for SD3 inference, as it does not support Stable Diffusion 3.
2024-06-29 13:46:07,251 [WARNING] (ArgsParser) Stable Diffusion 3 requires --max_grad_norm=0.01 to prevent model collapse. Overriding value. Set this value manually to disable this warning.
2024-06-29 13:46:07,285 [INFO] (__main__) Enabling tf32 precision boost for NVIDIA devices due to --allow_tf32.
2024-06-29 13:46:07,285 [INFO] (__main__) Load tokenizers
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
2024-06-29 13:46:07,499 [INFO] (__main__) Load OpenAI CLIP-L/14 text encoder..
2024-06-29 13:46:07,511 [INFO] (__main__) Loading T5-XXL v1.1 text encoder from /home/ubuntu/SimpleTuner/datasets/stable-diffusion-3-medium-diffusers/text_encoder..
2024-06-29 13:46:07,781 [INFO] (__main__) Loading LAION OpenCLIP-G/14 text encoder..
2024-06-29 13:46:09,208 [INFO] (__main__) Loading T5-XXL v1.1 text encoder..
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00,  3.90s/it]
2024-06-29 13:46:17,500 [INFO] (__main__) Load VAE..
2024-06-29 13:46:17,706 [INFO] (__main__) Moving models to GPU. Almost there.
2024-06-29 13:46:21,520 [INFO] (__main__) Loading Stable Diffusion 3 diffusion transformer..
2024-06-29 13:46:25,248 [INFO] (__main__) Using LoRA training mode.
2024-06-29 13:46:25,355 [INFO] (__main__) Moving the diffusion transformer to GPU in torch.bfloat16 precision.
2024-06-29 13:46:27,473 [INFO] (__main__) Initialising VAE in bf16 precision, you may specify a different value if preferred: bf16, fp32, default
2024-06-29 13:46:27,576 [INFO] (__main__) Loaded VAE into VRAM.
2024-06-29 13:46:27,577 [INFO] (DataBackendFactory) Loading data backend config from /home/ubuntu/SimpleTuner/datasets/multidatabackend.json
2024-06-29 13:46:27,577 [INFO] (DataBackendFactory) Configuring text embed backend: text-embeds
2024-06-29 13:46:27,578 [INFO] (TextEmbeddingCache) (Rank: 0) (id=text-embeds) Listing all text embed cache entries
2024-06-29 13:46:27,580 [INFO] (DataBackendFactory) Pre-computing null embedding
2024-06-29 13:46:33,039 [INFO] (DataBackendFactory) Completed loading text embed services.
2024-06-29 13:46:33,039 [INFO] (DataBackendFactory) Configuring data backend: sd3test
2024-06-29 13:46:33,040 [INFO] (DataBackendFactory) Configured backend: {'id': 'sd3test', 'config': {'vae_cache_clear_each_epoch': False, 'crop': False, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'textfile', 'instance_data_root': '/home/ubuntu/SimpleTuner/datasets/data/train', 'maximum_image_size': 1024, 'target_downsample_size': 1024}, 'dataset_type': 'image'}
2024-06-29 13:46:33,040 [INFO] (DataBackendFactory) (id=sd3test) Loading bucket manager.
2024-06-29 13:46:33,042 [INFO] (JsonMetadataBackend) Checking for cache file: /home/ubuntu/SimpleTuner/datasets/data/train/aspect_ratio_bucket_indices.json
2024-06-29 13:46:33,042 [WARNING] (JsonMetadataBackend) No cache file found, creating new one.
2024-06-29 13:46:33,042 [INFO] (DataBackendFactory) (id=sd3test) Refreshing aspect buckets on main process.
2024-06-29 13:46:33,042 [INFO] (BaseMetadataBackend) Discovering new files...
2024-06-29 13:46:33,043 [INFO] (BaseMetadataBackend) Compressed 0 existing files from 0.
2024-06-29 13:46:33,188 [INFO] (BaseMetadataBackend) Image processing statistics: {'total_processed': 10, 'skipped': {'already_exists': 0, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-06-29 13:46:33,189 [INFO] (BaseMetadataBackend) Enforcing minimum image size of 1024. This could take a while for very-large datasets.
2024-06-29 13:46:33,189 [INFO] (BaseMetadataBackend) Completed aspect bucket update.
2024-06-29 13:46:33,190 [INFO] (DataBackendFactory) Configured backend: {'id': 'sd3test', 'config': {'vae_cache_clear_each_epoch': False, 'crop': False, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 1024, 'resolution_type': 'pixel', 'caption_strategy': 'textfile', 'instance_data_root': '/home/ubuntu/SimpleTuner/datasets/data/train', 'maximum_image_size': 1024, 'target_downsample_size': 1024}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7fa6cbef08d0>, 'instance_data_root': '/home/ubuntu/SimpleTuner/datasets/data/train', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7fa6cc20f5d0>}
(Rank: 0)  | Bucket     | Image Count
------------------------------
2024-06-29 13:46:33,191 [ERROR] (__main__) No images were discovered by the bucket manager in the dataset: sd3test., traceback: Traceback (most recent call last):
  File "/home/ubuntu/SimpleTuner/train_sdxl.py", line 699, in main
    configure_multi_databackend(
  File "/home/ubuntu/SimpleTuner/helpers/data_backend/factory.py", line 597, in configure_multi_databackend
    raise Exception(
Exception: No images were discovered by the bucket manager in the dataset: sd3test.

Files it creates: d41d8cd98f00b204e9800998ecf8427e-sd3.pt, aspect_ratio_bucket_indices.json, aspect_ratio_bucket_metadata.json

If necessary, I can share their contents, the folder structure with the images, and the configs.

bghira commented 3 months ago

can you share the first bit of the output for:

jq '.' < aspect_ratio_bucket_indices.json

e.g. the config block and the first bucket, just a few entries. i'm curious what the paths in there look like.

and if you could show the file structure on disk

and the dataloader config

OliviaOliveiira commented 3 months ago

Here's the file structure

miya123@Miya123:~/Рабочий стол/Huita/SimpleTuner$ tree -L 3 -d -a
.
├── Base_dir
│   ├── Cache_Huita_Kakayata
│   ├── Huita
│   ├── models
│   ├── vaecache
│   └── VAECache
├── documentation
│   ├── data_presets
│   └── quickstart
├── .git
│   ├── branches
│   ├── hooks
│   ├── info
│   ├── logs
│   │   └── refs
│   ├── objects
│   │   ├── 17
│   │   ├── 18
│   │   ├── 3e
│   │   ├── e7
│   │   ├── ed
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       ├── remotes
│       └── tags
├── .github
│   └── workflows
├── helpers
│   ├── caching
│   │   └── __pycache__
│   ├── data_backend
│   │   └── __pycache__
│   ├── image_manipulation
│   │   └── __pycache__
│   ├── legacy
│   │   └── __pycache__
│   ├── metadata
│   │   └── backends
│   ├── multiaspect
│   │   └── __pycache__
│   ├── pixart
│   ├── publishing
│   ├── __pycache__
│   ├── sd3
│   │   └── __pycache__
│   ├── sdxl
│   │   └── __pycache__
│   ├── training
│   │   ├── adam_bfloat16
│   │   └── __pycache__
│   └── webhooks
├── install
│   ├── apple
│   └── rocm
├── tests
│   └── helpers
├── toolkit
│   ├── captioning
│   │   └── classes
│   ├── datasets
│   │   └── controlnet
│   └── inference
│       └── sigma
├── .venv
│   ├── bin
│   ├── include
│   │   └── python3.12
│   ├── lib
│   │   └── python3.12
│   ├── lib64 -> lib
│   ├── share
│   │   └── man
│   └── src
│       └── diffusers
├── venv
│   ├── bin
│   ├── include
│   │   └── python3.12
│   ├── lib
│   │   └── python3.12
│   └── lib64 -> lib
└── wandb
    ├── latest-run -> run-20240628_201546-143a1bb161fe784897c4385515c6ed26
    ├── offline-run-20240626_234421-0a8074681594c4c1b45493c4b3e103f4
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── offline-run-20240626_234937-a9ee619e4e82cc542427555267a5e18c
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── offline-run-20240626_235528-39f5071a388a988f4f4088378d4cf092
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── offline-run-20240626_235646-7fb4227537cf7f51bcf5d1c31cb4f2af
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── offline-run-20240627_000515-a9ee619e4e82cc542427555267a5e18c
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── offline-run-20240628_165712-0f98f8b2656816ca07f157a4a37fb382
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_170219-0f98f8b2656816ca07f157a4a37fb382
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_172221-c3900026eff382a602f7e9c1b4b94d3d
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_172810-a142850e7ec58389e1552e159f075291
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_173355-a212801780ee848a431bb7127d8349c2
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_180517-0cd4e860e0d8970a3254bc263137034a
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_195753-0be2db492fdbfc4fc357ea3c1f05ae8e
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_200517-790a66e402e30b0c3d9d988436e2bbe7
    │   ├── files
    │   ├── logs
    │   └── tmp
    ├── run-20240628_201057-9ed8b6068326d3cb0315f5613cc9ede5
    │   ├── files
    │   ├── logs
    │   └── tmp
    └── run-20240628_201546-143a1bb161fe784897c4385515c6ed26
        ├── files
        ├── logs
        └── tmp

As for aspect_ratio_bucket_indices.json, I'm not sure it creates one for me; I can't find it. My multidatabackend.json is attached.

here's my sdxl_env.sh

# Configure these values.

# 'lora' or 'full'
# lora - train a small network for a character or style, or both. quite versatile.
# full - requires lots of vram, trains very slowly, needs a lot of data and concepts.
export MODEL_TYPE='lora'

# Set this to 'true' if you are training a Stable Diffusion 3 checkpoint.
# Use MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
export STABLE_DIFFUSION_3=true
# Similarly, this is to train PixArt Sigma (1K or 2K) models.
# Use MODEL_NAME="PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
export PIXART_SIGMA=false

# ControlNet model training is only supported when MODEL_TYPE='full'
# See this document for more information: https://github.com/bghira/SimpleTuner/blob/main/documentation/CONTROLNET.md
# DeepFloyd, PixArt, and SD3 do not currently support ControlNet model training.
export CONTROLNET=false

# DoRA enhances the training style of LoRA, but it will run more slowly at the same rank.
# See: https://arxiv.org/abs/2402.09353
# See: https://github.com/huggingface/peft/pull/1474
export USE_DORA=false

# BitFit freeze strategy for the u-net causes everything but the biases to be frozen.
# This may help retain the full model's underlying capabilities. LoRA is currently not tested/known to work.
#if [[ "$MODEL_TYPE" == "full" ]]; then
#    # When training a full model, we will rely on BitFit to keep the u-net intact.
#    export USE_BITFIT=true
#elif [[ "$MODEL_TYPE" == "lora" ]]; then
#    # LoRA can not use BitFit.
#    export USE_BITFIT=false
#elif [[ "$MODEL_TYPE" == "deepfloyd-full" ]]; then
#    export USE_BITFIT=true
#fi

# Restart where we left off. Change this to "checkpoint-1234" to start from a specific checkpoint.
export RESUME_CHECKPOINT="latest"

# How often to checkpoint. Depending on your learning rate, you may wish to change this.
# For the default settings with 10 gradient accumulations, more frequent checkpoints might be preferable at first.
export CHECKPOINTING_STEPS=500
# This is how many checkpoints we will keep. Two is safe, but three is safer.
export CHECKPOINTING_LIMIT=6

# This is decided as a relatively conservative 'constant' learning rate.
# Adjust higher or lower depending on how burnt your model becomes.
export LEARNING_RATE=0.0001 #@param {type:"number"}

# Using a Huggingface Hub model:
export MODEL_NAME="/media/miya123/UUI/Users/USER01/.cache/huggingface/hub/models--stabilityai--stable-diffusion-3-medium-diffusers/snapshots/ea42f8cef0f178587cf766dc8129abd379c90671"
# Using a local path to a huggingface hub model or saved checkpoint:
#export MODEL_NAME="/datasets/models/pipeline"

# Make DEBUG_EXTRA_ARGS empty to disable wandb.
export DEBUG_EXTRA_ARGS=""
export TRACKER_PROJECT_NAME="sd3-training"
export TRACKER_RUN_NAME="sd3-lora-ALTUSHKA"

# Max number of steps OR epochs can be used. Not both.
export MAX_NUM_STEPS=3500
# Will likely overtrain, but that's fine.
export NUM_EPOCHS=0

# A convenient prefix for all of your training paths.
export BASE_DIR="/home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir"
export DATALOADER_CONFIG="${BASE_DIR}/multidatabackend.json"
export OUTPUT_DIR="${BASE_DIR}/models"
# Set this to "true" to push your model to Hugging Face Hub.
export PUSH_TO_HUB="false"
# If PUSH_TO_HUB and PUSH_CHECKPOINTS are both enabled, every saved checkpoint will be pushed to Hugging Face Hub.
export PUSH_CHECKPOINTS="false"
# This will be the model name for your final hub upload, eg. "yourusername/yourmodelname"
# It defaults to the wandb project name, but you can override this here.
export HUB_MODEL_NAME=$TRACKER_PROJECT_NAME

# By default, images will be resized so their SMALLER EDGE is 1024 pixels, maintaining aspect ratio.
# Setting this value to 768px might result in more reasonable training data sizes for SDXL.
export RESOLUTION=1024
# If you want to have the training data resized by pixel area (Megapixels) rather than edge length,
#  set this value to "area" instead of "pixel", and uncomment the next RESOLUTION declaration.
export RESOLUTION_TYPE="pixel"
#export RESOLUTION=1          # 1.0 Megapixel training sizes
# If RESOLUTION_TYPE="pixel", the minimum resolution specifies the smaller edge length, measured in pixels. Recommended: 1024.
# If RESOLUTION_TYPE="area", the minimum resolution specifies the total image area, measured in megapixels. Recommended: 1.
export MINIMUM_RESOLUTION=$RESOLUTION

# How many decimals to round aspect buckets to.
#export ASPECT_BUCKET_ROUNDING=2

# Use this to append an instance prompt to each caption, used for adding trigger words.
# This has not been tested in SDXL.
#export INSTANCE_PROMPT="lotr style "
# If you also supply a user prompt library or `--use_prompt_library`, this will be added to those lists.
export VALIDATION_PROMPT="a photo of Rybina woman"
export VALIDATION_GUIDANCE=5
# You'll want to set this to 0.7 if you are training a terminal SNR model.
export VALIDATION_GUIDANCE_RESCALE=0.2
# How frequently we will save and run a pipeline for validations.
export VALIDATION_STEPS=100
export VALIDATION_NUM_INFERENCE_STEPS=50
export VALIDATION_NEGATIVE_PROMPT="blurry, cropped, ugly"
export VALIDATION_SEED=2
export VALIDATION_RESOLUTION=$RESOLUTION

# Adjust this for your GPU memory size. This, and resolution, are the biggest VRAM killers.
export TRAIN_BATCH_SIZE=5
# Accumulate your update gradient over many steps, to save VRAM while still having higher effective batch size:
# effective batch size = ($TRAIN_BATCH_SIZE * $GRADIENT_ACCUMULATION_STEPS).
export GRADIENT_ACCUMULATION_STEPS=4

# Use any standard scheduler type. constant, polynomial, constant_with_warmup
export LR_SCHEDULE="sine"
# A warmup period allows the model and the EMA weights more importantly to familiarise itself with the current quanta.
# For the cosine or sine type schedules, the warmup period defines the interval between peaks or valleys.
# Use a sine schedule to simulate a warmup period, or a Cosine period to simulate a polynomial start.
export LR_WARMUP_STEPS=$((MAX_NUM_STEPS / 10))

# Caption dropout probability. Set to 0.1 for 10% of captions dropped out. Set to 0 to disable.
# You may wish to disable dropout if you want to limit your changes strictly to the prompts you show the model.
# You may wish to increase the rate of dropout if you want to more broadly adopt your changes across the model.
export CAPTION_DROPOUT_PROBABILITY=0.1

export METADATA_UPDATE_INTERVAL=65
export VAE_BATCH_SIZE=12

# If this is set, any images that fail to open will be DELETED to avoid re-checking them every time.
export DELETE_ERRORED_IMAGES=0
# If this is set, any images that are too small for the minimum resolution size will be DELETED.
export DELETE_SMALL_IMAGES=0

# Bytedance recommends these be set to "trailing" so that inference and training behave in a more congruent manner.
# To follow the original SDXL training strategy, use "leading" instead, though results are generally worse.
export TRAINING_SCHEDULER_TIMESTEP_SPACING="trailing"
export INFERENCE_SCHEDULER_TIMESTEP_SPACING="trailing"

# Removing this option or unsetting it uses vanilla training. Setting it reweights the loss by the position of the timestep in the noise schedule.
# A value "5" is recommended by the researchers. A value of "20" is the least impact, and "1" is the most impact.
export MIN_SNR_GAMMA=5

# Set this to an explicit value of "false" to disable Xformers. Probably required for AMD users.
export USE_XFORMERS=false

# There's basically no reason to unset this. However, to disable it, use an explicit value of "false".
# This will save a lot of memory consumption when enabled.
export USE_GRADIENT_CHECKPOINTING=false

##
# Options below here may require a bit more complicated configuration, so they are not simple variables.
##

# TF32 is great on Ampere or Ada, not sure about earlier generations.
export ALLOW_TF32=false
# AdamW 8Bit is a robust and lightweight choice. Adafactor might reduce memory consumption, and Dadaptation is slow and experimental.
# AdamW is the default optimizer, but it uses a lot of memory and is slower than AdamW8Bit or Adafactor.
# Choices: adamw, adamw8bit, adafactor, dadaptation
export OPTIMIZER="adamw_bf16"

# EMA is a strong regularisation method that uses a lot of extra VRAM to hold two copies of the weights.
# This is worthwhile on large training runs, but not so much for smaller training runs.
export USE_EMA=false
export EMA_DECAY=0.999

export TRAINER_EXTRA_ARGS="--lora_rank=16 --lora_alpha=16 --prediction_type=v_prediction --rescale_betas_zero_snr"
## For offset noise training:
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --offset_noise --noise_offset=0.02"

## For terminal SNR training:
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"
## You may benefit from directing training toward a specific weighted subset of timesteps.
# In this example, we train the final 25% of the timestep schedule with a 3x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=later --timestep_bias_portion=0.25 --timestep_bias_multiplier=3"
# In this example, we train the earliest 25% of the timestep schedule with a 5x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=earlier --timestep_bias_portion=0.25 --timestep_bias_multiplier=5"
# Here, we designate that specifically, timesteps 200 to 500 should be prioritised.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=range --timestep_bias_begin=200 --timestep_bias_end=500 --timestep_bias_multiplier=3"

## For experimental min-SNR weighted loss training (5 is suggested value by the original researchers):
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --snr_gamma=5.0"

# For Wasabi S3 filesystem backend (experimental)
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --data_backend=aws --aws_bucket_name=test123"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_endpoint_url=https://s3.wasabisys.com"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_access_key=1234567890"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_secret_access_key=0987654321"

# Reproducible training. Set to -1 to disable.
export TRAINING_SEED=-1

# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
export MIXED_PRECISION="bf16"                # Might not be supported on all GPUs. fp32 will be needed for others.
export PURE_BF16=True

# This has to be changed if you're training with multiple GPUs.
export TRAINING_NUM_PROCESSES=1
export TRAINING_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS=""                          # --multi_gpu or other similar flags for huggingface accelerate

# With Pytorch 2.1, you might have pretty good luck here.
# If you're using aspect bucketing however, each resolution change will recompile. Seriously, just don't do it.
# Well, then again... Pytorch 2.2 has support for dynamic shapes. Why not?
export TRAINING_DYNAMO_BACKEND='no'                # or 'no' if you want to disable torch compile in case of performance issues or lack of support (eg. AMD)

export TOKENIZERS_PARALLELISM=false

Here's what my dataset folder looks like in the attached image (tried without spaces etc., still the same)

bghira commented 3 months ago

make sure you're not using python 3.12, which it looks like you are

bghira commented 3 months ago

aspect_ratio_bucket_indices.json ends up in the data dir, whatever your value for instance_data_dir is
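
if jq isn't handy, here is a minimal Python sketch that reads the same cache (the path is only an example; point it at your own instance_data_dir):

import json

# the cache lives at the top of your instance_data_dir
cache_path = "/home/ubuntu/SimpleTuner/datasets/data/aspect_ratio_bucket_indices.json"
with open(cache_path) as f:
    cache = json.load(f)

print(cache["config"]["instance_data_root"])
for ratio, files in cache["aspect_ratio_bucket_indices"].items():
    print(ratio, len(files), files[:2])  # bucket, image count, first couple of paths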

bghira commented 3 months ago
    "skip_file_discovery": "vae,aspect,text,metadata",

blank this out or remove it from your config file and then try again
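
as a convenience, a small sketch of that edit, assuming a multidatabackend.json shaped like the ones in this thread:

import json

cfg_path = "multidatabackend.json"  # example path
with open(cfg_path) as f:
    backends = json.load(f)

for backend in backends:
    # removing the key re-enables file discovery for vae/aspect/text/metadata
    backend.pop("skip_file_discovery", None)

with open(cfg_path, "w") as f:
    json.dump(backends, f, indent=4)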

pvp-by commented 3 months ago

(py311) ubuntu@Lightest:~/SimpleTuner/datasets/data$ python -V
Python 3.11.9

can you share the first bit of the output for:

jq '.' < aspect_ratio_bucket_indices.json

(py311) ubuntu@Lightest:~/SimpleTuner/datasets/data$ jq . aspect_ratio_bucket_indices.json

{
  "config": {
    "vae_cache_clear_each_epoch": false,
    "probability": 1,
    "repeats": 2,
    "crop": false,
    "crop_aspect": "square",
    "crop_style": "random",
    "disable_validation": false,
    "resolution": 1024,
    "resolution_type": "pixel",
    "caption_strategy": "instanceprompt",
    "instance_data_root": "/home/ubuntu/SimpleTuner/datasets/data",
    "maximum_image_size": null,
    "target_downsample_size": null
  },
  "aspect_ratio_bucket_indices": {
    "1.0": [
      "/home/ubuntu/SimpleTuner/datasets/data/train/6.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/2.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/5.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/10.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/9.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/1.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/8.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/7.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/3.png",
      "/home/ubuntu/SimpleTuner/datasets/data/train/4.png"
    ]
  }
}

multidatabackend.json:

[
    {
        "id": "sd3test",
        "type": "local",
        "probability": 1.0,
        "dataset_type" : "image",
        "repeats": 2,
        "crop": false,
        "resolution": 1024,
        "resolution_type": "pixel",
        "instance_root_dir": "/home/ubuntu/SimpleTuner/datasets/data",
        "instance_data_dir": "/home/ubuntu/SimpleTuner/datasets/data",
        "cache_dir_vae": "/home/ubuntu/SimpleTuner/datasets/cache_image",
        "vae_cache_clear_each_epoch": false,
        "instance_prompt": "sd3test",
        "caption_strategy": "instanceprompt",
        "skip_file_discovery": "",
        "text_embeds": "alt-text-embeds",
        "metadata_backend": "json",
        "preserve_data_backend_cache": true
    },
    {
        "id": "alt-text-embeds",
        "type": "local",
        "dataset_type": "text_embeds",
        "default": true,
        "cache_dir": "/home/ubuntu/SimpleTuner/datasets/cache_text"
  }
]

folder tree:

.
├── .git
├── .github
├── .venv
│   ...
├── data_lora
├── datasets
│   ├── cache_text
│   └── data
│       ├── caption
│       └── train
├── documentation
│   ...
├── helpers
│   ├── __pycache__
....

details of the datasets folder:

(py311) ubuntu@Lightest:~/SimpleTuner/datasets$ tree -L 3 -a
.
├── cache_text
│   └── d41d8cd98f00b204e9800998ecf8427e-sd3.pt
├── data
│   ├── aspect_ratio_bucket_indices.json
│   ├── aspect_ratio_bucket_metadata.json
│   ├── caption
│   │   ├── 1.txt
│   │   ├── 10.txt
│   │   ├── 2.txt
│   │   ├── 3.txt
│   │   ├── 4.txt
│   │   ├── 5.txt
│   │   ├── 6.txt
│   │   ├── 7.txt
│   │   ├── 8.txt
│   │   └── 9.txt
│   └── train
│       ├── 1.png
│       ├── 10.png
│       ├── 2.png
│       ├── 3.png
│       ├── 4.png
│       ├── 5.png
│       ├── 6.png
│       ├── 7.png
│       ├── 8.png
│       └── 9.png
└── multidatabackend.json

bghira commented 3 months ago

for you, please try the bugfix/bucket-search branch

or update the instance_data_dir to include the /train subdirectory
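
a quick sanity check of what the loader should see under the updated path (example path, adjust to yours):

from pathlib import Path

data_dir = Path("/home/ubuntu/SimpleTuner/datasets/data/train")  # updated instance_data_dir
images = [p for p in data_dir.rglob("*") if p.suffix.lower() in {".png", ".jpg", ".jpeg"}]
print(len(images), "images found")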

pvp-by commented 3 months ago

for you, please try the bugfix/bucket-search branch

or update the instance_data_dir to include the /train subdirectory

nothing, full zero:

.
├── cache_text
│   └── d41d8cd98f00b204e9800998ecf8427e-sd3.pt
├── data
│   ├── caption
│   │   ├── 1.txt
│   │   ├── 10.txt
│   │   ├── 2.txt
│   │   ├── 3.txt
│   │   ├── 4.txt
│   │   ├── 5.txt
│   │   ├── 6.txt
│   │   ├── 7.txt
│   │   ├── 8.txt
│   │   └── 9.txt
│   └── train
│       ├── 1.png
│       ├── 10.png
│       ├── 2.png
│       ├── 3.png
│       ├── 4.png
│       ├── 5.png
│       ├── 6.png
│       ├── 7.png
│       ├── 8.png
│       ├── 9.png
│       ├── aspect_ratio_bucket_indices.json
│       └── aspect_ratio_bucket_metadata.json
└── multidatabackend.json

the files are now generated in the new place; I changed instance_data_dir and am using:

(py311) ubuntu@Lightest:~/SimpleTuner$ git branch
* bugfix/bucket-search

attached: multidatabackend.json, aspect_ratio_bucket_indices.json, aspect_ratio_bucket_metadata.json

OliviaOliveiira commented 3 months ago

make sure you're not using python 3.12, which it looks like you are

I've made a conda environment with python 3.10

    "skip_file_discovery": "vae,aspect,text,metadata",

blank this out or remove it from your config file and then try again

I've removed this line from my multidatabackend.json

aspect_ratio_bucket_indices.json ends up in the data dir, whatever your value for instance_data_dir is

I've launched training once again and still get the same issue, with all the fixes applied

UPD:

{"config": {"vae_cache_clear_each_epoch": false, "probability": 1.0, "repeats": 0, "crop": "false", "crop_aspect": "square", "crop_style": "random", "disable_validation": false, "resolution": 1024, "resolution_type": "pixel", "caption_strategy": "instanceprompt", "instance_data_root": "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita", "maximum_image_size": null, "target_downsample_size": null}, "aspect_ratio_bucket_indices": {"1.0": ["/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (8).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (10).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (4).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (2).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (1).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (3).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (6).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (7).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (5).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (11).jpg", "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 \u0441\u0442\u043e\u043b/Huita/SimpleTuner/Base_dir/Huita/huita (9).jpg"]}}

bghira commented 3 months ago

ok, i have a script here that will create an anonymised version of your dataset - the same number of images and filenames, but with black frames instead of the real images:

import os
from PIL import Image

def create_black_images(src_folder, dst_folder):
    if not os.path.exists(dst_folder):
        os.makedirs(dst_folder)

    for filename in os.listdir(src_folder):
        file_path = os.path.join(src_folder, filename)

        if os.path.isfile(file_path):
            try:
                with Image.open(file_path) as img:
                    black_img = Image.new('RGB', img.size, (0, 0, 0))
                    black_img.save(os.path.join(dst_folder, filename))
                    print(f"Processed {filename}")
            except IOError:
                print(f"Cannot process {filename}. It may not be an image file.")

if __name__ == "__main__":
    src_folder = 'path_to_source_folder'  # Replace with the path to your source folder
    dst_folder = 'path_to_destination_folder'  # Replace with the path to your destination folder

    create_black_images(src_folder, dst_folder)

you'll have to modify the arguments src_folder and dst_folder and then provide a zip or tar copy of the folder, which will help me recreate your environment exactly.

bghira commented 3 months ago

@pvp-by you'll probably have to open a separate issue because yours is rather differently configured.

but in the meantime, please note that your txt files and img data have to be in the same directory.
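
a throwaway sketch for colocating them, assuming matching basenames like 1.txt / 1.png (adjust paths and extension to your layout):

import shutil
from pathlib import Path

captions = Path("/home/ubuntu/SimpleTuner/datasets/data/caption")
images = Path("/home/ubuntu/SimpleTuner/datasets/data/train")

for txt in captions.glob("*.txt"):
    # only move 1.txt if a matching 1.png sits in the image directory
    if (images / txt.name).with_suffix(".png").exists():
        shutil.move(str(txt), str(images / txt.name))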

bghira commented 3 months ago

@OliviaOliveiira this utf8 business in the json files is very suspicious. i hate to ask since it's obviously a big change to your system layout, but can you possibly try putting the data in a directory path without the heavy utf8 prefix?
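
for reference, the \u0420... sequences are just Python's default JSON escaping of non-ASCII characters, not corruption by themselves; a minimal demonstration:

import json

path = "/home/miya123/Рабочий стол/Huita/SimpleTuner/Base_dir/Huita"
print(json.dumps(path))                      # "/home/miya123/\u0420\u0430\u0431\u043e\u0447\u0438\u0439 ..."
print(json.dumps(path, ensure_ascii=False))  # keeps the Cyrillic verbatim

both forms round-trip through json.loads to the same string, so the escaping alone shouldn't break path lookups; the open question is how the rest of the pipeline handles the non-ASCII path itself.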

pvp-by commented 3 months ago

created #548

but in the meantime, please note that your txt files and img data have to be in the same directory.

zero change.

OliviaOliveiira commented 3 months ago

@OliviaOliveiira this utf8 business in the json files is very suspicious. i hate to ask since it's obviously a big change to your system layout, but can you possibly try putting the data in a directory path without the heavy utf8 prefix?

nah, I should've done that from the very beginning, so I've set my Linux to English and moved everything so there's nothing left to interfere.

Still, no result.

so my aspect_ratio_bucket_indices.json looks like this now

{"config": {"vae_cache_clear_each_epoch": false, "probability": 1.0, "repeats": 0, "crop": "false", "crop_aspect": "square", "crop_style": "random", "disable_validation": false, "resolution": 1024, "resolution_type": "pixel", "caption_strategy": "instanceprompt", "instance_data_root": "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita", "maximum_image_size": null, "target_downsample_size": null}, "aspect_ratio_bucket_indices": {"1.0": ["/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (8).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (3).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (10).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (1).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (2).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (6).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (11).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (7).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (5).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (4).jpg", "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita/huita (9).jpg"]}}

and my dataset config:

[
    {
        "id": "Altushka",
        "type": "local",
        "vae_cache_clear_each_epoch": false,
        "probability": 1.0,
        "repeats": 0,
        "crop": "false",
        "resolution": 1024,
        "resolution_type": "pixel",
        "instance_prompt": "Altushka",
        "caption_strategy": "instanceprompt",
        "instance_root_dir": "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita",
        "instance_data_dir": "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Huita",
        "cache_dir_vae": "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/VAECache",
        "text_embeds": "alt-embed-cache",
        "preserve_data_backend_cache": true
    },
    {
        "id": "alt-embed-cache",
        "dataset_type": "text_embeds",
        "default": true,
        "type": "local",
        "cache_dir": "/home/miya123/Desktop/Huita/SimpleTuner/Base_dir/Cache_Huita_Kakayata"
    }
]

and my sdxl_env.sh is unchanged from the version posted above, apart from BASE_DIR pointing at the new path:

export BASE_DIR="/home/miya123/Desktop/Huita/SimpleTuner/Base_dir"

OliviaOliveiira commented 3 months ago

ok, i have a script here that will create an anonymised version of your dataset [...] you'll have to modify the arguments src_folder and dst_folder and then provide a zip or tar copy of the folder, which will help me recreate your environment exactly.

test.zip

OliviaOliveiira commented 3 months ago

Moreover, I'm on the bugfix/bucket-search branch as well now

bghira commented 3 months ago

same issue for both of you: TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS = 20, while you have <= 10 images in the set.
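
in other words, a sketch of the arithmetic (not SimpleTuner's actual code): no bucket can fill even one effective batch, so the buckets presumably get pruned and the dataset looks empty:

TRAIN_BATCH_SIZE = 5
GRADIENT_ACCUMULATION_STEPS = 4
images_in_bucket = 10  # both datasets here hold roughly ten square images

effective_batch = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS  # 5 * 4 = 20
if images_in_bucket < effective_batch:
    print(f"bucket of {images_in_bucket} images < effective batch of {effective_batch}: dropped")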

OliviaOliveiira commented 3 months ago

same issue for both of you: TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS = 20, while you have <= 10 images in the set.

Fixed it, and now the LoRA works just as well as it does on XL