FLUX finetuning speedup

I'm looking for other users (primarily Docker users) to confirm these changes.

Environment: Docker on Linux, CUDA 12.4 on the base OS. Finetuning on 1152 px images went from 10.6 s/it to 6.4 s/it after I bumped package versions to more closely match the versions sd-scripts currently uses. VRAM usage also dropped by almost 11 GB (I believe the savings come during the backward pass). I changed no other settings between the s/it and VRAM tests.
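The two s/it figures can be turned into a speedup factor and a percentage cut in time per iteration. This is a minimal Python sketch; the helper names are hypothetical, and the only inputs are the numbers reported above.

```python
def speedup(before_s_per_it: float, after_s_per_it: float) -> float:
    """How many times faster the new setup is (ratio of seconds per iteration)."""
    return before_s_per_it / after_s_per_it


def pct_time_saved(before_s_per_it: float, after_s_per_it: float) -> float:
    """Percentage reduction in wall-clock time per iteration."""
    return (1 - after_s_per_it / before_s_per_it) * 100


# Figures reported above for 1152 px finetuning (seconds per iteration).
print(f"{speedup(10.6, 6.4):.2f}x faster")                           # 1.66x faster
print(f"{pct_time_saved(10.6, 6.4):.1f}% less time per iteration")   # 39.6% less time per iteration
```

Assuming s/it stays roughly constant over a run, the same number of steps should take about 40% less wall-clock time.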

Changes: torch to 2.4.0, torchvision to 0.19.0, xformers to the most recent compatible version (pip auto-selected 0.0.27.post2 in my case), and transformers to 4.36.2.

Command to run inside the Docker container:

pip install -U --extra-index-url https://download.pytorch.org/whl/cu121 --extra-index-url https://pypi.nvidia.com torch==2.4.0 torchvision==0.19.0 xformers transformers==4.36.2
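Before comparing s/it and VRAM, it may help to confirm that the container actually picked up the new versions. The check below is a sketch that uses only the standard library's `importlib.metadata`. The `PINS` table mirrors the install command above, and the helper names are hypothetical. Wheels from the PyTorch CUDA index usually carry a local version label (for example `2.4.0+cu121`), so the comparison ignores everything after `+`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins mirror the pip command above. xformers is unpinned (None) because pip
# resolves a compatible build (0.0.27.post2 in the reported setup).
PINS: Dict[str, Optional[str]] = {
    "torch": "2.4.0",
    "torchvision": "0.19.0",
    "xformers": None,
    "transformers": "4.36.2",
}


def matches_pin(installed: str, pinned: Optional[str]) -> bool:
    """Compare versions, ignoring a local label such as '+cu121' on CUDA wheels."""
    if pinned is None:
        return True  # unpinned: any installed version is acceptable
    return installed.split("+", 1)[0] == pinned


def check_environment(
    pins: Dict[str, Optional[str]],
) -> Dict[str, Tuple[Optional[str], bool]]:
    """Return {package: (installed_version_or_None, ok)} for each pinned package."""
    report: Dict[str, Tuple[Optional[str], bool]] = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)  # missing package counts as a mismatch
            continue
        report[name] = (installed, matches_pin(installed, pinned))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment(PINS).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:14} {installed or 'not installed':20} {status}")
```

Running it inside the container after the pip command should print `ok` for all four packages. A `MISMATCH` line suggests an older layer of the image is still shadowing the upgrade.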

I wanted to check that this works similarly for others without causing problems before submitting a PR to bump the versions in linux-docker and the Dockerfile.

bmaltais / kohya_ss #2942