recoilme opened this issue 4 hours ago
It seems my training won't fit under 48 GB:
```bash
bash train_scripts/train.sh configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
  --data.data_dir="[asset/example_data]" \
  --data.type=SanaImgDataset \
  --model.multi_scale=false \
  --data.load_vae_feat=false \
  --train.train_batch_size=1
```
refer to:
Actually, if you switch the optimizer type in the config file to AdamW, GPU memory usage will be lower. We will release an updated CAME in the future, which will use even less memory than AdamW:
```yaml
train:
  optimizer:
    lr: 1.0e-4
    type: AdamW
    weight_decay: 0.01
    eps: 1.0e-8
    betas: [0.9, 0.999]
```
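For readers who want to see what this config amounts to, here is a minimal sketch of the equivalent PyTorch optimizer construction (an illustration only; the actual instantiation happens inside Sana's training scripts):

```python
import torch

# Hypothetical stand-in for the model being trained.
model = torch.nn.Linear(8, 8)

# Same hyperparameters as the YAML block above.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1.0e-4,
    weight_decay=0.01,
    eps=1.0e-8,
    betas=(0.9, 0.999),
)
```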
refer to:
But how does that add up? You have a 32x VAE compression versus 8x in SDXL and a smaller model, yet training at 1024 needs more than 2.5x the memory of SDXL?
The model you are using is 1.6B, and the VAE and text encoder are both extracting features online (nothing is pre-cached).
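That online extraction is exactly what latent/text-embedding caching would avoid. Below is a minimal sketch of an offline caching pass, assuming a diffusers-style VAE with an `encode(...).latent_dist` interface and a generic text encoder; the function name `cache_features` and the file layout are hypothetical:

```python
import torch

@torch.no_grad()
def cache_features(dataloader, vae, text_encoder, out_dir):
    """Extract VAE latents and text embeddings once, so neither model
    has to sit on the GPU (or run at all) during training."""
    for i, (pixels, input_ids) in enumerate(dataloader):
        latents = vae.encode(pixels.cuda()).latent_dist.sample()
        text_emb = text_encoder(input_ids.cuda())[0]
        torch.save(
            {"latents": latents.cpu(), "text_emb": text_emb.cpu()},
            f"{out_dir}/sample_{i:07d}.pt",
        )
```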
It is all about the training scripts.
Currently, using Kohya, we are able to fully fine-tune the 12-billion-parameter FLUX dev in 16-bit precision even on 6 GB GPUs by using block swapping :)
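For context, here is a rough, forward-pass-only sketch of the block-swapping idea, assuming a plain sequential stack of transformer blocks; Kohya's real implementation also handles the backward pass and prefetches the next block asynchronously to hide transfer latency:

```python
import torch

def forward_with_block_swap(blocks, x):
    # Keep all blocks in CPU RAM and stream them through the GPU one at
    # a time, so peak VRAM holds only a single block plus activations.
    for block in blocks:
        block.to("cuda", non_blocking=True)
        x = block(x)
        block.to("cpu")  # evict before loading the next block
    return x
```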
Cool. Hah, I can't imagine the speed. lol :)
I really hope you will add some minimal optimizations in the future. It's dead for full fine-tuning for now; an A100 is very expensive. Thanks for the reply, and good model!
What do you mean it's dead?
> Cool. Hah, I can't imagine the speed. lol :)
With the latest improvements, speeds are really decent:
- RTX 3090: ~7 seconds per sample image (batch size 1)
- RTX 4090: ~5 seconds per sample image
- RTX A6000: ~6 seconds per sample image
> What do you mean it's dead?
It's dead for full fine-tuning for GPU-poor guys. We rent GPUs to train, and it's very expensive; needing 48 GB+ to train at batch size 1 is a stopping factor for most of us. We need latent/TE caching, multi-aspect-ratio bucketing, and probably a slower optimizer like Adafactor (sketched below) for training fine details like eyes at a low LR.
I have waited for Sana so long to train on a potato, but it's not working on an A40 with 48 GB :(
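For reference, here is a minimal sketch of the requested low-LR Adafactor setup, using the implementation from Hugging Face `transformers` (an assumption; Sana's configs don't currently expose this). Adafactor stores factored second-moment statistics instead of full per-parameter state, so its optimizer memory is far below AdamW's:

```python
import torch
from transformers import Adafactor

# Hypothetical stand-in for the diffusion transformer being fine-tuned.
model = torch.nn.Linear(8, 8)

optimizer = Adafactor(
    model.parameters(),
    lr=1e-5,                # fixed low LR, per the request above
    scale_parameter=False,  # disable relative-scale heuristics so that
    relative_step=False,    # the explicit lr is actually used
    warmup_init=False,
)
```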