I think it depends on what your goal is.
Assume you want to train a model on some image-only dataset you collected yourself.

If you have a fixed test text distribution for evaluating the model with metrics such as FID and IS, and you want to use a GAN or do not have enough GPU memory for a diffusion model, I would suggest you first train a Lafite model on pseudo image-text pairs, prepared by ./dataset_tool.py, and then fine-tune it on image-text pairs whose captions are generated by models like BLIP 2. In my experience, this two-stage training leads to better results than training only on pseudo image-text pairs or only on BLIP 2-generated image-text pairs. As for tuning the hyper-parameters, you can start with a large --gamma and small --itd and --itc, then gradually try a smaller --gamma and larger --itd and --itc until you find good results.
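For concreteness, the two stages could look something like this (just a sketch: the flag values are illustrative starting points, not tuned settings, the dataset paths are placeholders, and I'm assuming the --resume flag inherited from StyleGAN2-ADA for loading the stage-1 snapshot):

```
# Stage 1: train on pseudo image-text pairs prepared by dataset_tool.py.
# Large --gamma, small --itd/--itc as a starting point (values illustrative).
python train.py --outdir=./outputs/stage1 --data=./datasets/pseudo_pairs.zip \
    --test_data=./datasets/test_set.zip --gpus=1 --mirror=1 \
    --gamma=50 --itd=5 --itc=5

# Stage 2: fine-tune on image-text pairs whose captions were generated by BLIP 2,
# resuming from a stage-1 snapshot. Try smaller --gamma and larger --itd/--itc.
python train.py --outdir=./outputs/stage2 --data=./datasets/blip2_pairs.zip \
    --test_data=./datasets/test_set.zip --gpus=1 --mirror=1 \
    --gamma=10 --itd=10 --itc=10 \
    --resume=./outputs/stage1/network-snapshot.pkl  # path to your stage-1 checkpoint
```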
If you don't have a fixed test text distribution and just want to generate images similar to the training data from arbitrary text input, I would suggest you fine-tune a diffusion model, using methods like DreamBooth or Shifted Diffusion, or directly fine-tune on image-text pairs obtained using BLIP 2.
I am running the following command:

```
python train.py --outdir=./outputs/training-runs --data=./datasets/some_dataset.zip --gpus=1 --mirror=1 --aug=noaug --test_data=./datasets/some_dataset.zip --cond=True
```

It fails with:

```
Traceback (most recent call last):
  File "train.py", line 636, in
```
In our text-to-image generation experiments, we didn't use --cond=True. Since you are using --cond=True, please revise your code accordingly.
Thank you for going above and beyond! @drboog
But they are mentioned in train.py:

```
cond = None, # Train conditional model based on dataset labels: <bool>, default = False
```

Also, dataset_tool.py will automatically create labels if a dataset.json file exists in the training and test folders.
The code is based on StyleGAN2-ADA; there are some things from the original codebase that I didn't delete.

This method is for text-to-image generation, so what you really need for this problem is text conditioning, not labels. If you want a GAN model conditioned on labels, then I suggest you use the original StyleGAN2-ADA code. If you want a GAN model conditioned on text, I suggest you follow the example I gave. If you want generation conditioned on both text and labels, then revise the code accordingly to satisfy your requirements; our current code does not support that.

What is your goal? Training on a customized dataset with image-caption-label triplets?
Noted, @drboog. My goal is to train on a custom dataset (text-image pairs).
My dataset.json:

```
{
  "labels": [
    ["0001/00001.png", 1],
    ...
    ["0001/00002.png", 10]
  ]
}
```

The directory structure is like this:

```
train/
  class1/
    image1.jpg
    image2.jpg
    ...
  class2/
    image1.jpg
    image2.jpg
    ...
  ...
```
I am confused about how to train a text-to-image model on a custom dataset.
I appreciate your support.
As I showed in the example in readme.md:

First of all, you need to prepare a dataset with image and text files, i.e. a folder containing paired files such as 1.jpg, 1.txt, 2.jpg, 2.txt, ...
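If you still need to create the .txt files, one option is to caption each image with BLIP 2. Here is a minimal sketch using the Hugging Face transformers API (the model name, folder layout, and .jpg extension are assumptions for illustration, not part of this repo):

```python
# Caption every image in a folder with BLIP 2 and write a paired .txt file,
# so the folder ends up with 1.jpg, 1.txt, 2.jpg, 2.txt, ... as expected above.
import os
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=dtype
).to(device)

folder = "./my_dataset"  # placeholder: folder containing 1.jpg, 2.jpg, ...
for name in sorted(os.listdir(folder)):
    if not name.endswith(".jpg"):
        continue
    image = Image.open(os.path.join(folder, name)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device, dtype)
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
    # Write the caption next to the image, e.g. 1.txt beside 1.jpg.
    with open(os.path.join(folder, name.replace(".jpg", ".txt")), "w") as f:
        f.write(caption)
```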
Then use dataset_tool.py to process the files; image and text semantic information will be saved in 'clip_img_features' and 'clip_txt_features'.

Then use the command we provided to train the model on the dataset.
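Putting it together, the commands would look roughly like this (a sketch: the dataset_tool.py flags are assumed to follow the StyleGAN2-ADA-style interface this code is based on, so check python dataset_tool.py --help for the exact options; paths and resolution are placeholders):

```
# Pack the paired 1.jpg/1.txt files into a training zip;
# CLIP image and text features are extracted and stored during this step.
python dataset_tool.py --source=./my_dataset --dest=./datasets/my_dataset.zip \
    --width=256 --height=256

# Train on the processed dataset. Note: no --cond=True; the text conditioning
# comes from the stored clip_txt_features.
python train.py --outdir=./outputs/training-runs --data=./datasets/my_dataset.zip \
    --test_data=./datasets/my_dataset.zip --gpus=1 --mirror=1 --aug=noaug
```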
Is there a specific reason you are using a different structure instead of the one we gave in readme.md? If that reason is something inevitable, then you should write your own dataset_tool.py and train.py.
Oh, I gotcha. You're great at figuring things out!

So there's no need for labels; the "clip_txt_features" are used in place of labels.
Hi, would you leave a detailed comment on custom dataset training?

Truly thankful!