AmericanPresidentJimmyCarter / simple-flux-lora-training


guidelines for 2 minute LoRA training run to mirror Fal's fast flux trainer #5

Open bghira opened 1 month ago

bghira commented 1 month ago

hey Jim, thanks for the work on the guide.

I wanted to contribute some info here for anyone wanting to train a potato LoRA like Fal offers.

Your config.json needs the following values set:

The following values get added into config.env:

```bash
export TRAINING_NUM_PROCESSES=8
# Uncomment this to use torch.compile for additional speedup if you intend to train much longer than 2 minutes.
# Compiling takes a good 5-10 minutes depending on the system and the chosen flags, so for shorter training runs
# it does not make sense to enable torch.compile.
#export TRAINING_DYNAMO_BACKEND=inductor
```

For multidatabackend.json, we just use a single square-cropped 512x512 dataset.
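For reference, here is a minimal sketch of what that multidatabackend.json could look like for a single square-cropped 512x512 dataset. The ids, data directory, and cache paths are placeholders, and the field names follow SimpleTuner's dataloader configuration as I understand it, so adjust to your setup:

```json
[
  {
    "id": "my-512px-dataset",
    "type": "local",
    "instance_data_dir": "/path/to/images",
    "caption_strategy": "textfile",
    "resolution": 512,
    "resolution_type": "pixel",
    "crop": true,
    "crop_style": "center",
    "crop_aspect": "square",
    "cache_dir_vae": "cache/vae"
  },
  {
    "id": "text-embeds",
    "dataset_type": "text_embeds",
    "type": "local",
    "default": true,
    "cache_dir": "cache/text"
  }
]
```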

This configuration runs at about 1 iteration per second, so 100 steps take roughly 2 minutes (including ~20-30 seconds of startup time).

bghira commented 1 month ago

Resulting LoRA example: https://huggingface.co/bghira/flux-beavisandbutthead-mgpu