Closed: stone002 closed this issue 1 month ago
Hi, I have a question: when training SD3 Flash, how should I create the data? Is "jpg" in the json the name of the image? It was also strange that no model was saved during my training.
I only tested SDXL. Flash Diffusion uses 'webdataset' for training. I created my test dataset in that format: one image with a same-name json file. The json looks like:
{ "jpg": img_name + ".jpg", "json": { "caption": caption, "aesthetic_score": 8.0 } }
then package all images and jsons in a .tar. It works fine.
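As a rough sketch of that packaging step (the `pack_shard` helper and names are hypothetical; only the same-basename jpg/json pairing convention comes from this thread), the stdlib `tarfile` module is enough:

```python
import io
import json
import tarfile

def pack_shard(samples, tar_path):
    """Package (image_bytes, caption) pairs into a webdataset-style .tar.

    Each sample becomes two tar members sharing one basename, e.g.
    sample_000000.jpg and sample_000000.json; webdataset pairs an image
    with its metadata through that shared basename.
    """
    with tarfile.open(tar_path, "w") as tar:
        for i, (img_bytes, caption) in enumerate(samples):
            key = f"sample_{i:06d}"
            meta = json.dumps(
                {"caption": caption, "aesthetic_score": 8.0}
            ).encode("utf-8")
            for name, data in ((key + ".jpg", img_bytes), (key + ".json", meta)):
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
```

The `webdataset` package also ships a `ShardWriter` that does this with automatic shard splitting; the stdlib version above is just the minimal illustration.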
About the model saving, check 'MAX_EPOCHS' and 'CKPT_EVERY_N_STEPS' in your .yaml file, or add 'save_last=True' to the Trainer callbacks. Hope it helps!
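For reference, a minimal sketch of the relevant .yaml piece (the key names come from this thread; the values are made-up placeholders):

```yaml
# hypothetical values; key names as discussed above
MAX_EPOCHS: 10
CKPT_EVERY_N_STEPS: 500   # write a checkpoint every 500 training steps
```

If the run ends before CKPT_EVERY_N_STEPS steps have elapsed, no .ckpt is ever written, which matches the symptom above. If the repo's Trainer uses PyTorch Lightning's ModelCheckpoint callback, passing save_last=True to it additionally forces a last.ckpt at the end of training regardless of the step interval.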
Finally I found it was because my test dataset was not big enough. I had created only one .tar for testing. After getting more data and making more .tar files, it works fine on multiple GPUs.
Thanks for your reply. I tried again, but there is still no change. My dataset is one image with a same-name json file.
Maybe more data is needed, I guess. My test .tar file contains 100 images and 100 json files, and it works fine on a single GPU.
Hi, I think it's very strange, but I can't figure out why. My json file is as follows:

{ "jpg": "people_flux_00209.png", "json": { "caption": "A person within an Africa-shaped border with a tattooed arm and sleeveless top.", "aesthetic_score": 8 } }

and I packaged each image together with its same-name json file into a tar file as a training set of 3000 images.

Dataset
SHARDS_PATH_OR_URLS:
- pipe:cat /data/sd3/flash_sd3-{000000..000000}.tar

But no loss is shown during training, and no model is saved. My code is as follows
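One thing worth checking when no loss appears: the json above points at a .png file while the key is named "jpg", and webdataset groups and decodes tar members by file extension, so a loader that expects a "jpg" field may silently skip .png samples. As a quick sanity check (`check_shard` is a hypothetical helper, not part of the repo), the stdlib can list what extensions each sample actually carries:

```python
import tarfile

def check_shard(tar_path):
    """Group tar members by basename and flag samples with no .json.

    webdataset pairs files through a shared basename, so every image
    entry should have a same-basename .json sitting next to it.
    """
    groups = {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if "." not in member.name:
                continue
            key, ext = member.name.rsplit(".", 1)
            groups.setdefault(key, set()).add(ext.lower())
    missing_json = sorted(k for k, exts in groups.items() if "json" not in exts)
    return len(groups), missing_json
```

Running it on one of your shards shows both the sample count and any image whose metadata file is missing; the per-sample extension sets also reveal whether your images are stored as .png or .jpg.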
I used SDXL for testing. I modified these in my code:
My test dataset is 100 [image, json] pairs. These values work for me, though the inference results are not very good; I guess that is because my data and training epochs are both insufficient. You can try setting CKPT_EVERY_N_STEPS smaller to check whether it saves a .ckpt during training. Hope it helps.
Hi, I meet this problem when I try to train with 2 A100s.
I used train_flash_sdxl.py; my Trainer params are:
and the flash_sdxl.yaml is
My code gets stuck here every time.
If I use only one A100, it works fine, but very slow.