Closed: MohamedAliRashad closed this issue 5 months ago
what are you doing that led to this condition?
@bghira Nothing. This is my multidatabackend.json:
[
  {
    "id": "pseudo-camera-10k-sd3",
    "type": "local",
    "crop": true,
    "crop_aspect": "square",
    "crop_style": "center",
    "resolution": 0.5,
    "minimum_image_size": 0.25,
    "maximum_image_size": 1.0,
    "target_downsample_size": 1.0,
    "resolution_type": "area",
    "cache_dir_vae": "cache/vae/sd3/pseudo-camera-10k",
    "instance_data_dir": "outputs/datasets/pseudo-camera-10k",
    "disabled": false,
    "skip_file_discovery": "",
    "caption_strategy": "filename",
    "metadata_backend": "json"
  },
  {
    "id": "text-embeds",
    "type": "local",
    "dataset_type": "text_embeds",
    "default": true,
    "cache_dir": "cache/text/sd3/pseudo-camera-10k",
    "disabled": false,
    "write_batch_size": 128
  }
]
And this is my sdxl-env.sh:
# Configure these values.
# 'lora' or 'full'
# lora - train a small network for a character or style, or both. quite versatile.
# full - requires lots of vram, trains very slowly, needs a lot of data and concepts.
export MODEL_TYPE='full'
# Set this to 'true' if you are training a Stable Diffusion 3 checkpoint.
# Use MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
export STABLE_DIFFUSION_3=true
# Similarly, this is to train PixArt Sigma (1K or 2K) models.
# Use MODEL_NAME="PixArt-alpha/PixArt-Sigma-XL-2-1024-MS"
export PIXART_SIGMA=false
# ControlNet model training is only supported when MODEL_TYPE='full'
# See this document for more information: https://github.com/bghira/SimpleTuner/blob/main/documentation/CONTROLNET.md
# DeepFloyd, PixArt, and SD3 do not currently support ControlNet model training.
export CONTROLNET=false
# DoRA enhances the training style of LoRA, but it will run more slowly at the same rank.
# See: https://arxiv.org/abs/2402.09353
# See: https://github.com/huggingface/peft/pull/1474
export USE_DORA=false
# BitFit freeze strategy for the u-net causes everything but the biases to be frozen.
# This may help retain the full model's underlying capabilities. LoRA is currently not tested/known to work.
if [[ "$MODEL_TYPE" == "full" ]]; then
# When training a full model, we will rely on BitFit to keep the u-net intact.
export USE_BITFIT=true
elif [[ "$MODEL_TYPE" == "lora" ]]; then
# As of v0.9.2 of SimpleTuner, LoRA can not use BitFit.
export USE_BITFIT=false
elif [[ "$MODEL_TYPE" == "deepfloyd-full" ]]; then
export USE_BITFIT=true
fi
# Restart where we left off. Change this to "checkpoint-1234" to start from a specific checkpoint.
export RESUME_CHECKPOINT="latest"
# How often to checkpoint. Depending on your learning rate, you may wish to change this.
# For the default settings with 10 gradient accumulations, more frequent checkpoints might be preferable at first.
export CHECKPOINTING_STEPS=150
# This is how many checkpoints we will keep. Two is safe, but three is safer.
export CHECKPOINTING_LIMIT=2
# This is decided as a relatively conservative 'constant' learning rate.
# Adjust higher or lower depending on how burnt your model becomes.
export LEARNING_RATE=8e-7 #@param {type:"number"}
# Using a Huggingface Hub model:
export MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
# Using a local path to a huggingface hub model or saved checkpoint:
#export MODEL_NAME="/datasets/models/pipeline"
# Make DEBUG_EXTRA_ARGS empty to disable wandb.
export DEBUG_EXTRA_ARGS="--report_to=tensorboard"
export TRACKER_PROJECT_NAME="sd3-dummy-training"
export TRACKER_RUN_NAME="simpletuner-sdxl"
# Max number of steps OR epochs can be used. Not both.
export MAX_NUM_STEPS=30000
# Will likely overtrain, but that's fine.
export NUM_EPOCHS=0
# A convenient prefix for all of your training paths.
export BASE_DIR="/shared_volume/development/text_to_image/SimpleTuner/outputs"
export DATALOADER_CONFIG="${BASE_DIR}/multidatabackend.json"
export OUTPUT_DIR="${BASE_DIR}/models"
# Set this to "true" to push your model to Hugging Face Hub.
export PUSH_TO_HUB="false"
# If PUSH_TO_HUB and PUSH_CHECKPOINTS are both enabled, every saved checkpoint will be pushed to Hugging Face Hub.
export PUSH_CHECKPOINTS="false"
# This will be the model name for your final hub upload, eg. "yourusername/yourmodelname"
# It defaults to the wandb project name, but you can override this here.
export HUB_MODEL_NAME=$TRACKER_PROJECT_NAME
# By default, images will be resized so their SMALLER EDGE is 1024 pixels, maintaining aspect ratio.
# Setting this value to 768px might result in more reasonable training data sizes for SDXL.
# export RESOLUTION=1024
# export RESOLUTION_TYPE="pixel"
# If you want to have the training data resized by pixel area (Megapixels) rather than edge length,
# set this value to "area" instead of "pixel", and uncomment the next RESOLUTION declaration.
export RESOLUTION=1.0 # 1.0 Megapixel training sizes
export RESOLUTION_TYPE="area"
# If RESOLUTION_TYPE="pixel", the minimum resolution specifies the smaller edge length, measured in pixels. Recommended: 1024.
# If RESOLUTION_TYPE="area", the minimum resolution specifies the total image area, measured in megapixels. Recommended: 1.
export MINIMUM_RESOLUTION=$RESOLUTION
# How many decimals to round aspect buckets to.
#export ASPECT_BUCKET_ROUNDING=2
# Use this to append an instance prompt to each caption, used for adding trigger words.
# This has not been tested in SDXL.
#export INSTANCE_PROMPT="lotr style "
# If you also supply a user prompt library or `--use_prompt_library`, this will be added to those lists.
export VALIDATION_PROMPT="ethnographic photography of teddy bear at a picnic"
export VALIDATION_GUIDANCE=7.5
# You'll want to set this to 0.7 if you are training a terminal SNR model.
export VALIDATION_GUIDANCE_RESCALE=0.0
# How frequently we will save and run a pipeline for validations.
export VALIDATION_STEPS=100
export VALIDATION_NUM_INFERENCE_STEPS=30
export VALIDATION_NEGATIVE_PROMPT="blurry, cropped, ugly"
export VALIDATION_SEED=42
export VALIDATION_RESOLUTION=$RESOLUTION
# Adjust this for your GPU memory size. This, and resolution, are the biggest VRAM killers.
export TRAIN_BATCH_SIZE=8
# Accumulate your update gradient over many steps, to save VRAM while still having higher effective batch size:
# effective batch size = ($TRAIN_BATCH_SIZE * $GRADIENT_ACCUMULATION_STEPS).
export GRADIENT_ACCUMULATION_STEPS=4
# Use any standard scheduler type. constant, polynomial, constant_with_warmup
export LR_SCHEDULE="sine"
# A warmup period allows the model and the EMA weights more importantly to familiarise itself with the current quanta.
# For the cosine or sine type schedules, the warmup period defines the interval between peaks or valleys.
# Use a sine schedule to simulate a warmup period, or a Cosine period to simulate a polynomial start.
#export LR_WARMUP_STEPS=$((MAX_NUM_STEPS / 10))
export LR_WARMUP_STEPS=1000
# Caption dropout probability. Set to 0.1 for 10% of captions dropped out. Set to 0 to disable.
# You may wish to disable dropout if you want to limit your changes strictly to the prompts you show the model.
# You may wish to increase the rate of dropout if you want to more broadly adopt your changes across the model.
export CAPTION_DROPOUT_PROBABILITY=0.1
export METADATA_UPDATE_INTERVAL=65
export VAE_BATCH_SIZE=12
# If this is set, any images that fail to open will be DELETED to avoid re-checking them every time.
export DELETE_ERRORED_IMAGES=0
# If this is set, any images that are too small for the minimum resolution size will be DELETED.
export DELETE_SMALL_IMAGES=0
# Bytedance recommends these be set to "trailing" so that inference and training behave in a more congruent manner.
# To follow the original SDXL training strategy, use "leading" instead, though results are generally worse.
export TRAINING_SCHEDULER_TIMESTEP_SPACING="trailing"
export INFERENCE_SCHEDULER_TIMESTEP_SPACING="trailing"
# Removing this option or unsetting it uses vanilla training. Setting it reweights the loss by the position of the timestep in the noise schedule.
# A value "5" is recommended by the researchers. A value of "20" is the least impact, and "1" is the most impact.
export MIN_SNR_GAMMA=5
# Set this to an explicit value of "false" to disable Xformers. Probably required for AMD users.
export USE_XFORMERS=false
# There's basically no reason to unset this. However, to disable it, use an explicit value of "false".
# This will save a lot of memory consumption when enabled.
export USE_GRADIENT_CHECKPOINTING=false
##
# Options below here may require a bit more complicated configuration, so they are not simple variables.
##
# TF32 is great on Ampere or Ada, not sure about earlier generations.
export ALLOW_TF32=true
# AdamW 8Bit is a robust and lightweight choice. Adafactor might reduce memory consumption, and Dadaptation is slow and experimental.
# AdamW is the default optimizer, but it uses a lot of memory and is slower than AdamW8Bit or Adafactor.
# Choices: adamw, adamw8bit, adafactor, dadaptation
export OPTIMIZER="adamw_bf16"
# EMA is a strong regularisation method that uses a lot of extra VRAM to hold two copies of the weights.
# This is worthwhile on large training runs, but not so much for smaller training runs.
export USE_EMA=false
export EMA_DECAY=0.999
export TRAINER_EXTRA_ARGS=""
## For offset noise training:
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --offset_noise --noise_offset=0.02"
## For terminal SNR training:
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --prediction_type=v_prediction --rescale_betas_zero_snr"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --training_scheduler_timestep_spacing=trailing --inference_scheduler_timestep_spacing=trailing"
## You may benefit from directing training toward a specific weighted subset of timesteps.
# In this example, we train the final 25% of the timestep schedule with a 3x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=later --timestep_bias_portion=0.25 --timestep_bias_multiplier=3"
# In this example, we train the earliest 25% of the timestep schedule with a 5x bias.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=earlier --timestep_bias_portion=0.25 --timestep_bias_multiplier=5"
# Here, we designate that specifically, timesteps 200 to 500 should be prioritised.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --timestep_bias_strategy=range --timestep_bias_begin=200 --timestep_bias_end=500 --timestep_bias_multiplier=3"
## For experimental min-SNR weighted loss training (5 is suggested value by the original researchers):
# Not recommended for terminal SNR models.
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --snr_gamma=5.0"
# For Wasabi S3 filesystem backend (experimental)
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --data_backend=aws --aws_bucket_name=test123"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_endpoint_url=https://s3.wasabisys.com"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_access_key=1234567890"
#export TRAINER_EXTRA_ARGS="${TRAINER_EXTRA_ARGS} --aws_secret_access_key=0987654321"
# Reproducible training. Set to -1 to disable.
export TRAINING_SEED=42
# Mixed precision is the best. You honestly might need to YOLO it in fp16 mode for Google Colab type setups.
export MIXED_PRECISION="bf16" # Might not be supported on all GPUs. fp32 will be needed for others.
export PURE_BF16=true
# This has to be changed if you're training with multiple GPUs.
export TRAINING_NUM_PROCESSES=2
export TRAINING_NUM_MACHINES=1
export ACCELERATE_EXTRA_ARGS="--multi_gpu" # --multi_gpu or other similar flags for huggingface accelerate
# With Pytorch 2.1, you might have pretty good luck here.
# If you're using aspect bucketing however, each resolution change will recompile. Seriously, just don't do it.
# Well, then again... Pytorch 2.2 has support for dynamic shapes. Why not?
export TRAINING_DYNAMO_BACKEND='no' # or 'no' if you want to disable torch compile in case of performance issues or lack of support (eg. AMD)
export TOKENIZERS_PARALLELISM=false
Also, while you are here, I am getting this error when trying to use jsonl:
2024-06-18 04:03:44,825 [ERROR] (__main__) Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file., traceback: Traceback (most recent call last):
File "/shared_volume/development/text_to_image/SimpleTuner/train_sdxl.py", line 697, in main
configure_multi_databackend(
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/data_backend/factory.py", line 629, in configure_multi_databackend
configure_parquet_database(backend, args, init_backend["data_backend"])
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/data_backend/factory.py", line 195, in configure_parquet_database
df = pd.read_parquet(pq)
^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/pandas/io/parquet.py", line 667, in read_parquet
return impl.read(
^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/pandas/io/parquet.py", line 274, in read
pa_table = self.api.parquet.read_table(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 1776, in read_table
dataset = ParquetDataset(
^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 1343, in __init__
[fragment], schema=schema or fragment.physical_schema,
^^^^^^^^^^^^^^^^^^^^^^^^
File "pyarrow/_dataset.pyx", line 1367, in pyarrow._dataset.Fragment.physical_schema.__get__
File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
If you can help me with it ^^
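(For reference: pandas only treats a file as Parquet if it carries the Parquet footer magic bytes, so pointing the parquet metadata backend at a .jsonl file fails exactly like this. A minimal illustration with pandas, using placeholder file names:)

import pandas as pd

# JSON-Lines caption/metadata files are newline-delimited JSON, not Parquet,
# so read_parquet() raises "Parquet magic bytes not found in footer" on them.
df = pd.read_json("captions.jsonl", lines=True)  # correct reader for a .jsonl file
# df = pd.read_parquet("captions.parquet")       # only valid for real Parquet files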
@bghira I think I know where the problem is.
In helpers/training/validation.py, you expect the validation resolution to be an integer (e.g. 1024, 768), not a float, when resolution_type="area" ... that is what caused the issue.
You may want to make this function support the area resolution_type:
https://github.com/bghira/SimpleTuner/blob/239c1ce0da496126bfd56de6576103fc37022266/helpers/training/validation.py#L69
@bghira Regarding the other problem (the title of the issue): https://github.com/bghira/SimpleTuner/blob/239c1ce0da496126bfd56de6576103fc37022266/helpers/training/validation.py#L371
Here, you use self.validation_negative_prompt_mask, and it only gets set here:
https://github.com/bghira/SimpleTuner/blob/239c1ce0da496126bfd56de6576103fc37022266/helpers/training/validation.py#L309
which means it only gets a value if pixart_sigma is enabled and the inner if condition is met.
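(A rough sketch of the pattern being described, with hypothetical names rather than the actual SimpleTuner code: the mask attribute is only assigned inside the PixArt Sigma branch, but is read unconditionally when the pipeline kwargs are built:)

class Validation:
    def __init__(self, pixart_sigma: bool):
        self.pixart_sigma = pixart_sigma
        # in the buggy version this attribute is never initialised for non-PixArt models
        self.validation_negative_prompt_mask = None

    def precompute_negative_embed(self, mask):
        if self.pixart_sigma:
            # only this branch ever assigns the mask
            self.validation_negative_prompt_mask = mask

    def build_pipeline_kwargs(self, negative_embed):
        kwargs = {"negative_prompt_embeds": negative_embed}
        # reading the mask unconditionally here either uses an unset value
        # or forwards a mask kwarg that non-PixArt pipelines reject
        kwargs["negative_mask"] = self.validation_negative_prompt_mask
        return kwargs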
oh, you are saying that it returns no resolution when VALIDATION_RESOLUTION=1.0 ?
@bghira Yes
in helpers/arguments.py
we see the following:
if (
    args.validation_resolution.isdigit()
    and int(args.validation_resolution) < 128
    and "deepfloyd" not in args.model_type
):
    # Convert from megapixels to pixels:
    log_msg = f"It seems that --validation_resolution was given in megapixels ({args.validation_resolution}). Converting to pixel measurement:"
    if int(args.validation_resolution) == 1:
        args.validation_resolution = 1024
    else:
        args.validation_resolution = int(int(args.validation_resolution) * 1e3)
    # Make it divisible by 8:
    args.validation_resolution = int(int(args.validation_resolution) / 8) * 8
    logger.info(f"{log_msg} {int(args.validation_resolution)}px")
do you see that message in your console output at all? you can check debug.log
which will contain the logs from your previous run
@bghira I can't find it. The trainer keeps continuing from my last checkpoint, and I don't think my change of resolution and resolution_type is making any difference.
i guess when i wrote that initially, i lazily assumed that floats would work with isdigit() - but i've now added a correction for this handling so it detects either an integer or a float value and handles the conversion to the correct value automatically. can you check that?
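(Context: str.isdigit() returns False for a string like "1.0", so the megapixel-conversion branch above was skipped entirely for float-valued resolutions. A minimal sketch of accepting either form, not necessarily the exact correction that was pushed:)

def normalize_validation_resolution(value: str) -> int:
    """Convert --validation_resolution given in megapixels ("1", "1.0")
    or pixels ("768") into a pixel edge length divisible by 8."""
    # "1".isdigit() is True, but "1.0".isdigit() is False, which is why the
    # original integer-only check ignored float inputs.
    number = float(value)
    if number < 128:  # small values are treated as megapixels
        pixels = 1024 if number == 1.0 else int(number * 1e3)
    else:
        pixels = int(number)
    return (pixels // 8) * 8

print(normalize_validation_resolution("1.0"))  # 1024
print(normalize_validation_resolution("768"))  # 768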
because the vae cache is such a volatile creature when you change fundamental things like resolution / resolution_type, these are prevented from being changed after the dataset's initial processing.
see the log output during startup where it mentions --override_dataset_config.
of course, if you change these, you'll have to remove the vae cache objects and the aspect bucket json files from the dataset.
i am not sure if you'll have to remove the aspect ratio mapping files from the output dir; every time i think about it, i come to the same conclusion that it's a good thing to just leave that consistent between runs.
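(For what it's worth, with the config posted above that would mean clearing roughly these paths before restarting; a hypothetical cleanup sketch, verify against your own setup before deleting anything:)

import shutil
from pathlib import Path

# Paths taken from the multidatabackend.json and startup log above; adjust to your config.
shutil.rmtree("cache/vae/sd3/pseudo-camera-10k", ignore_errors=True)  # VAE latent cache
Path("outputs/datasets/pseudo-camera-10k/aspect_ratio_bucket_indices.json").unlink(missing_ok=True)  # aspect bucket metadata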
@bghira I agree with you on this.
I get this message when I continued from the last checkpoint but with the resolution changed to 1.0:
Validation resolution is not supported for this model type.
I will try your fix now.
@bghira Your fix worked. What is left now is the validation_negative_prompt_mask bug.
i've pushed one for that too now
@bghira The old error disappeared, but a new error emerged:
TypeError: StableDiffusion3Pipeline.__call__() got an unexpected keyword argument 'prompt_mask'
ah. okay. some of the pipelines are so much more sensitive to extra kwargs 🤦
that is now fixed.
@bghira New error:
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/training/validation.py", line 719, in validate_prompt
validation_image_results = self.pipeline(**pipeline_kwargs).images
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/sd3/pipeline.py", line 928, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/models/transformers/transformer_sd3.py", line 292, in forward
hidden_states = self.pos_embed(hidden_states) # takes care of adding positional embeddings too.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 208, in forward
latent = self.proj(latent)
^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
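(For context, this RuntimeError means the transformer's conv weights are on CUDA while the incoming latent tensor is still on the CPU; a tiny reproduction of the mismatch, assuming a CUDA machine:)

import torch

proj = torch.nn.Conv2d(16, 1536, kernel_size=2).to("cuda", torch.bfloat16)
latent = torch.randn(1, 16, 128, 128, dtype=torch.bfloat16)  # still a CPU tensor
# proj(latent)  # RuntimeError: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) ...
out = proj(latent.to("cuda"))  # works once the input is moved to the same device as the weights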
do you have the debug logs for that section?
This is the full debug.log file:
2024-06-18 17:38:02,148 [WARNING] (ArgsParser) Stable Diffusion 3 requires a pixel alignment interval of 64px. Updating value.
2024-06-18 17:38:02,149 [INFO] (ArgsParser) It seems that --validation_resolution was given in megapixels (1.0). Converting to pixel measurement: 1024px
2024-06-18 17:38:02,149 [WARNING] (ArgsParser) Disabling Compel long-prompt weighting for SD3 inference, as it does not support Stable Diffusion 3.
2024-06-18 17:38:02,186 [INFO] (__main__) Enabling tf32 precision boost for NVIDIA devices due to --allow_tf32.
2024-06-18 17:38:02,187 [INFO] (__main__) Load tokenizers
2024-06-18 17:38:03,277 [INFO] (__main__) Load OpenAI CLIP-L/14 text encoder..
2024-06-18 17:38:03,795 [INFO] (__main__) Loading T5-XXL v1.1 text encoder from stabilityai/stable-diffusion-3-medium-diffusers/text_encoder..
2024-06-18 17:38:04,362 [INFO] (__main__) Loading LAION OpenCLIP-G/14 text encoder..
2024-06-18 17:38:06,202 [INFO] (__main__) Loading T5-XXL v1.1 text encoder..
2024-06-18 17:38:16,776 [INFO] (__main__) Load VAE..
2024-06-18 17:38:17,204 [INFO] (__main__) Moving models to GPU. Almost there.
2024-06-18 17:38:27,819 [INFO] (__main__) Loading Stable Diffusion 3 diffusion transformer..
2024-06-18 17:38:33,875 [INFO] (__main__) Moving the diffusion transformer to GPU in torch.bfloat16 precision.
2024-06-18 17:38:37,026 [INFO] (__main__) Initialising VAE in bf16 precision, you may specify a different value if preferred: bf16, fp32, default
2024-06-18 17:38:37,114 [INFO] (__main__) Loaded VAE into VRAM.
2024-06-18 17:38:37,114 [INFO] (DataBackendFactory) Loading data backend config from /shared_volume/development/text_to_image/SimpleTuner/outputs/multidatabackend.json
2024-06-18 17:38:37,115 [INFO] (DataBackendFactory) Configuring text embed backend: text-embeds
2024-06-18 17:38:37,116 [INFO] (TextEmbeddingCache) (Rank: 0) (id=text-embeds) Listing all text embed cache entries
2024-06-18 17:38:37,366 [INFO] (TextEmbeddingCache) (Rank: 1) (id=text-embeds) Listing all text embed cache entries
2024-06-18 17:38:38,103 [INFO] (DataBackendFactory) Pre-computing null embedding
2024-06-18 17:38:43,456 [INFO] (DataBackendFactory) Completed loading text embed services.
2024-06-18 17:38:43,456 [INFO] (DataBackendFactory) Configuring data backend: pseudo-camera-10k-sd3
2024-06-18 17:38:43,457 [INFO] (DataBackendFactory) Configured backend: {'id': 'pseudo-camera-10k-sd3', 'config': {'crop': True, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 0.5, 'resolution_type': 'area', 'caption_strategy': 'filename', 'instance_data_root': 'outputs/datasets/pseudo-camera-10k', 'maximum_image_size': 1.0, 'target_downsample_size': 1.0}, 'dataset_type': 'image'}
2024-06-18 17:38:43,457 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Loading bucket manager.
2024-06-18 17:38:43,459 [INFO] (JsonMetadataBackend) Checking for cache file: outputs/datasets/pseudo-camera-10k/aspect_ratio_bucket_indices.json
2024-06-18 17:38:43,459 [INFO] (JsonMetadataBackend) Checking for cache file: outputs/datasets/pseudo-camera-10k/aspect_ratio_bucket_indices.json
2024-06-18 17:38:43,460 [INFO] (JsonMetadataBackend) Pulling cache file from storage
2024-06-18 17:38:43,460 [INFO] (JsonMetadataBackend) Pulling cache file from storage
2024-06-18 17:38:43,477 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Refreshing aspect buckets on main process.
2024-06-18 17:38:43,477 [INFO] (BaseMetadataBackend) Discovering new files...
2024-06-18 17:38:52,186 [INFO] (BaseMetadataBackend) Compressed 14102 existing files from 1.
2024-06-18 17:38:52,186 [INFO] (BaseMetadataBackend) No new files discovered. Doing nothing.
2024-06-18 17:38:52,186 [INFO] (BaseMetadataBackend) Statistics: {'total_processed': 0, 'skipped': {'already_exists': 14102, 'metadata_missing': 0, 'not_found': 0, 'too_small': 0, 'other': 0}}
2024-06-18 17:38:52,238 [INFO] (JsonMetadataBackend) Checking for cache file: outputs/datasets/pseudo-camera-10k/aspect_ratio_bucket_indices.json
2024-06-18 17:38:52,238 [INFO] (JsonMetadataBackend) Pulling cache file from storage
2024-06-18 17:38:52,244 [INFO] (DataBackendFactory) Configured backend: {'id': 'pseudo-camera-10k-sd3', 'config': {'crop': True, 'crop_aspect': 'square', 'crop_aspect_buckets': None, 'crop_style': 'center', 'disable_validation': False, 'resolution': 0.5, 'resolution_type': 'area', 'caption_strategy': 'filename', 'instance_data_root': 'outputs/datasets/pseudo-camera-10k', 'maximum_image_size': 1.0, 'target_downsample_size': 1.0}, 'dataset_type': 'image', 'data_backend': <helpers.data_backend.local.LocalDataBackend object at 0x7f0d82e3a4d0>, 'instance_data_root': 'outputs/datasets/pseudo-camera-10k', 'metadata_backend': <helpers.metadata.backends.json.JsonMetadataBackend object at 0x7f0d82e3a610>}
2024-06-18 17:38:52,246 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Collecting captions.
2024-06-18 17:38:52,313 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Initialise text embed pre-computation using the filename caption strategy. We have 14102 captions to process.
2024-06-18 17:38:53,299 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Completed processing 14102 captions.
2024-06-18 17:38:53,300 [INFO] (DataBackendFactory) (id=pseudo-camera-10k-sd3) Creating VAE latent cache.
2024-06-18 17:38:54,369 [INFO] (validation) Precomputing the negative prompt embed for validations.
2024-06-18 17:38:55,208 [INFO] (validation) Precomputing the negative prompt embed for validations.
2024-06-18 17:38:55,356 [INFO] (__main__) Unloading text encoders, as they are not being trained.
2024-06-18 17:38:55,497 [INFO] (__main__) After nuking text encoders from orbit, we freed 0.0 GB of VRAM. The real memories were the friends we trained a model on along the way.
2024-06-18 17:38:55,498 [INFO] (__main__) Collected the following data backends: ['text-embeds', 'pseudo-camera-10k-sd3']
2024-06-18 17:38:55,498 [INFO] (__main__) Loading sine learning rate scheduler with 1000 warmup steps
2024-06-18 17:38:55,499 [WARNING] (__main__) Training Diffusion transformer models with BitFit is not yet tested, and unexpected results may occur.
2024-06-18 17:38:55,501 [INFO] (__main__) Learning rate: 8e-07
2024-06-18 17:38:55,501 [INFO] (__main__) Using bf16 AdamW optimizer with stochastic rounding.
2024-06-18 17:38:55,507 [INFO] (__main__) Optimizer arguments, weight_decay=0.01 eps=1e-08, extra_arguments={'weight_decay': 0.01, 'eps': 1e-08, 'betas': (0.9, 0.999), 'lr': 8e-07}
2024-06-18 17:38:55,509 [INFO] (__main__) Loading sine learning rate scheduler with 1000 warmup steps
2024-06-18 17:38:55,509 [INFO] (__main__) Using Sine learning rate scheduler.
2024-06-18 17:38:55,512 [INFO] (__main__) Loading our accelerator...
2024-06-18 17:38:55,564 [INFO] (__main__) After removing any undesired samples and updating cache entries, we have settled on 137 epochs and 220 steps per epoch.
2024-06-18 17:38:55,633 [INFO] (__main__) Resuming from checkpoint checkpoint-3000
2024-06-18 17:38:55,633 [INFO] (SDXLSaveHook) Unloading text encoders for full SD3 training without --train_text_encoder
2024-06-18 17:38:55,633 [INFO] (SDXLSaveHook) Unloading text encoders for full SD3 training without --train_text_encoder
2024-06-18 17:39:01,692 [INFO] (MultiAspectSampler-pseudo-camera-10k-sd3) Previous checkpoint had 0 exhausted buckets.
2024-06-18 17:39:01,693 [INFO] (MultiAspectSampler-pseudo-camera-10k-sd3) Previous checkpoint was on epoch 14.
2024-06-18 17:39:01,693 [INFO] (MultiAspectSampler-pseudo-camera-10k-sd3) Previous checkpoint had 4480 seen images.
2024-06-18 17:39:01,694 [INFO] (__main__) Resuming from global_step 3000.
2024-06-18 17:39:01,696 [INFO] (MultiAspectSampler-pseudo-camera-10k-sd3)
(Rank: 0) -> Number of seen images: 4480
(Rank: 0) -> Number of unseen images: 2560
(Rank: 0) -> Current Bucket: None
(Rank: 0) -> 1 Buckets: ['1.0']
(Rank: 0) -> 0 Exhausted Buckets: []
2024-06-18 17:39:01,912 [INFO] (__main__)
***** Running training *****
- Num batches = 880
- Num Epochs = 137
- Current Epoch = 14
- Total train batch size (w. parallel, distributed & accumulation) = 64
- Instantaneous batch size per device = 8
- Gradient Accumulation steps = 4
- Total optimization steps = 30000
- Steps completed: 3000
- Total optimization steps remaining = 27000
2024-06-18 17:44:26,140 [ERROR] (helpers.training.validation) Error generating validation image: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor, Traceback (most recent call last):
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/training/validation.py", line 719, in validate_prompt
validation_image_results = self.pipeline(**pipeline_kwargs).images
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/sd3/pipeline.py", line 928, in __call__
noise_pred = self.transformer(
^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/models/transformers/transformer_sd3.py", line 292, in forward
hidden_states = self.pos_embed(hidden_states) # takes care of adding positional embeddings too.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/models/embeddings.py", line 208, in forward
latent = self.proj(latent)
^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
2024-06-18 17:44:26,175 [ERROR] (helpers.training.validation) Error generating validation image: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
2024-06-18 17:51:24,680 [ERROR] (helpers.training.validation) Error generating validation image: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
2024-06-18 17:51:24,718 [ERROR] (helpers.training.validation) Error generating validation image: Input type (CPUBFloat16Type) and weight type (CUDABFloat16Type) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
(each of these three entries is followed by the same traceback shown above)
@bghira
if you set SIMPLETUNER_LOG_LEVEL=DEBUG in the env file, we will see more
@bghira Added it, and no further information was printed. This is the full sdxl-env.sh:
(sdxl-env.sh is identical to the copy posted earlier in this thread, except for the following lines appended at the end:)
# For more debugging info
export SIMPLETUNER_LOG_LEVEL=DEBUG
a kind of hurricane in my area is disrupting comms. so i might be in-and-out.
but i've pushed an option to use GPU seeds by default with CPU seeds as opt-in. is it easy to run this test or does it take a while?
@bghira No, it's easy. I will test your latest commit
@bghira The new error:
2024-06-18 19:59:59,763 [ERROR] (helpers.training.validation) Error generating validation image: Cannot generate a cpu tensor from a generator of type cuda., Traceback (most recent call last):
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/training/validation.py", line 727, in validate_prompt
validation_image_results = self.pipeline(**pipeline_kwargs).images
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/sd3/pipeline.py", line 902, in __call__
latents = self.prepare_latents(
^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/helpers/sd3/pipeline.py", line 677, in prepare_latents
latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/shared_volume/development/text_to_image/SimpleTuner/.venv/lib/python3.11/site-packages/diffusers/utils/torch_utils.py", line 67, in randn_tensor
raise ValueError(f"Cannot generate a {device} tensor from a generator of type {gen_device_type}.")
ValueError: Cannot generate a cpu tensor from a generator of type cuda.
2024-06-18 19:59:59,774 [ERROR] (helpers.training.validation) Error generating validation image: Cannot generate a cpu tensor from a generator of type cuda. (followed by the same traceback as above)
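(For context, diffusers' randn_tensor only raises this when a CUDA generator is paired with a CPU target device; a CPU generator with a CUDA target merely falls back to sampling on the CPU and moving the result. A minimal reproduction, assuming a CUDA machine:)

import torch
from diffusers.utils.torch_utils import randn_tensor

gen = torch.Generator(device="cuda").manual_seed(42)
# A CUDA generator with a CPU target device raises the ValueError seen above.
randn_tensor((1, 16, 128, 128), generator=gen, device=torch.device("cpu"), dtype=torch.bfloat16)
# ValueError: Cannot generate a cpu tensor from a generator of type cuda.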
huh.. everything should already be on the GPU. it's frustrating that i cannot reproduce this issue locally, but i will take a look
so i think this is the same error manifested in MPS form:
latents = torch.randn(shape, generator=generator, device=rand_device, dtype=dtype, layout=layout).to(device)
RuntimeError: Placeholder storage has not been allocated on MPS device!
well, i was able to reproduce the issue. and then, on the 2nd launch, it works! i made no changes in between. i freshly recreated the text cache, and it doesn't bring the issue back.
I will delete the caches and try again
@bghira It worked.
Thank you