Weird Behavior in LORA Training

I’m encountering some strange behavior while training a LoRA model using FluxGym, and I’m curious if anyone else has seen something similar. In my training process, I generated samples at intervals (steps 250, 500, and 750) to check the model’s progression, and I’ve attached an image that illustrates this. Here’s the setup I used for the training prompts:

ADI4 as fireman --d 999

ADI4 as software engineer --d 999

ADI4 as president --d 999

ADI4 as teacher --d 999

At step 500, the outputs generally align with the specific professions in the prompts, showing elements unique to each role, like a suit for "president" or firefighter gear. However, by step 750, things get weird. All the generations start looking more like each other, particularly showing characteristics of the teacher role, especially the traditional clothing. This feels almost like the training is somehow being "overwritten" by previous samples.

Here’s a rough timeline of what I noticed:

Step 250: The generated samples are somewhat unique to each prompt but still rough.

Step 500: Outputs become clearer and some even begin aligning more closely with the trained man character

Step 750: Almost all samples look strikingly similar, with many reflecting the traditional attire seen in the "teacher" role from step 500 sample—even for prompts like "fireman" and "president," which shouldn’t be the case.

Questions: Has anyone experienced this type of "style bleed" before? Could it be that prior generated samples are somehow influencing the current ones?

Is there a known issue in FluxGym where training appears to converge too strongly toward one class or style over iterations?

Any suggestions on preventing this kind of merging of styles as training progresses?

Train Script:

accelerate launch \
--mixed_precision bf16 \
--num_cpu_threads_per_process 1 \
sd-scripts/flux_train_network.py \
--pretrained_model_name_or_path "/home/me/fluxgym/models/unet/flux1-dev.sft" \
--clip_l "/home/me/fluxgym/models/clip/clip_l.safetensors" \
--t5xxl "/home/me/fluxgym/models/clip/t5xxl_fp16.safetensors" \
--ae "/home/me/fluxgym/models/vae/ae.sft" \
--cache_latents_to_disk \
--save_model_as safetensors \
--sdpa --persistent_data_loader_workers \
--max_data_loader_n_workers 2 \
--seed 42 \
--gradient_checkpointing \
--mixed_precision bf16 \
--save_precision bf16 \
--network_module networks.lora_flux \
--network_dim 4 \
--optimizer_type adamw8bit \--sample_prompts="/home/me/fluxgym/outputs/adi4/sample_prompts.txt" --sample_every_n_steps="250" \
--learning_rate 8e-4 \
--cache_text_encoder_outputs \
--cache_text_encoder_outputs_to_disk \
--fp8_base \
--highvram \
--max_train_epochs 16 \
--save_every_n_epochs 4 \
--dataset_config "/home/me/fluxgym/outputs/adi4/dataset.toml" \
--output_dir "/home/me/fluxgym/outputs/adi4" \
--output_name adi4 \
--timestep_sampling shift \
--discrete_flow_shift 3.1582 \
--model_prediction_type raw \
--guidance_scale 1 \
--loss_type l2 \

Train Config:

[general]
shuffle_caption = false
caption_extension = '.txt'
keep_tokens = 1

[[datasets]]
resolution = 512
batch_size = 1
keep_tokens = 1
[[datasets.subsets]]
image_dir = '/home/me/fluxgym/datasets/adi4'
class_tokens = 'ADI4'
num_repeats = 10

cocktailpeanut / fluxgym

Weird Behavior in LORA Training #226