drboog / Lafite

Code for paper LAFITE: Towards Language-Free Training for Text-to-Image Generation (CVPR 2022)
MIT License

Custom Dataset training #34

Closed touQerabaS closed 1 year ago

touQerabaS commented 1 year ago

Hi, would you leave a detailed comment on custom dataset training?

Truly thankful.

drboog commented 1 year ago

I think it depends on what your goal is.

Assume you want to train a model on some image-only dataset collected by yourself.

If you have a fixed testing text distribution to evaluate the model with metrics such as FID and IS, and you want to use a GAN or do not have enough GPU memory for a diffusion model, I would suggest you first train a Lafite model with pseudo image-text pairs, prepared by ./dataset_tool.py, then fine-tune it on image-text pairs where the text is generated by models like BLIP-2. In my experience, this two-stage training leads to better results than training only on pseudo image-text pairs or only on BLIP-2-generated image-text pairs. As for tuning the hyper-parameters, you can start with a large --gamma and small --itd --itc, then gradually try a smaller --gamma and larger --itd --itc until you find good results.
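A rough sketch of what those two stages could look like on the command line. The flags below appear elsewhere in this thread; the numeric values are illustrative placeholders for the "large --gamma, small --itd/--itc first" schedule, and --resume is an assumption carried over from the StyleGAN2-ADA codebase Lafite is built on:

```
# Stage 1: train on pseudo image-text pairs (language-free), prepared by dataset_tool.py.
# Start with a large --gamma and small --itd/--itc, per the advice above.
python train.py --outdir=./outputs/stage1 --data=./datasets/pseudo_pairs.zip \
  --gpus=1 --mirror=1 --gamma=10 --itd=5 --itc=5

# Stage 2: fine-tune on image-text pairs whose captions came from BLIP-2.
# --resume is assumed to exist (inherited from StyleGAN2-ADA); the snapshot path is a placeholder.
python train.py --outdir=./outputs/stage2 --data=./datasets/blip2_pairs.zip \
  --gpus=1 --mirror=1 --resume=./outputs/stage1/network-snapshot.pkl \
  --gamma=5 --itd=10 --itc=10
```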

If you don't have a fixed testing text distribution and just want to generate images similar to the training data from arbitrary text input, I would suggest you fine-tune a diffusion model, using methods like DreamBooth or Shifted Diffusion, or directly fine-tune with image-text pairs obtained using BLIP-2.

touQerabaS commented 1 year ago

I am running the following command:

```
python train.py --outdir=./outputs/training-runs --data=./datasets/some_dataset.zip --gpus=1 --mirror=1 --aug=noaug --test_data=./datasets/some_dataset.zip --cond=True
```

```
Traceback (most recent call last):
  File "train.py", line 636, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\click\decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "train.py", line 629, in main
    subprocess_fn(rank=0, args=args, temp_dir=temp_dir)
  File "train.py", line 460, in subprocess_fn
    training_loop.training_loop(rank=rank, **args)
  File "C:\Users\touQeer aBBaS\Documents\Lafite\training\training_loop.py", line 182, in training_loop
    misc.print_module_summary(D, [img, c, fts])
  File "C:\Users\touQeer aBBaS\Documents\Lafite\torch_utils\misc.py", line 205, in print_module_summary
    outputs = module(*inputs)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "C:\Users\touQeer aBBaS\Documents\Lafite\training\networks.py", line 1071, in forward
    x, d_fts = self.b4(x, img, cmap, fts=fts, use_norm=self.use_norm)
  File "C:\Users\touQeer aBBaS\anaconda3\envs\lafite\lib\site-packages\torch\nn\modules\module.py", line 1212, in _call_impl
    result = forward_call(*input, **kwargs)
  File "C:\Users\touQeer aBBaS\Documents\Lafite\training\networks.py", line 1004, in forward
    misc.assert_shape(cmap, [None, self.cmap_dim])
  File "C:\Users\touQeer aBBaS\Documents\Lafite\torch_utils\misc.py", line 74, in assert_shape
    if tensor.ndim != len(ref_shape):
AttributeError: 'NoneType' object has no attribute 'ndim'
```

drboog commented 1 year ago

In our text-to-image generation experiments, we didn't use --cond=True. Since you are using --cond=True, please revise your command accordingly.
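That is, dropping the flag from the command above:

```
python train.py --outdir=./outputs/training-runs --data=./datasets/some_dataset.zip --gpus=1 --mirror=1 --aug=noaug --test_data=./datasets/some_dataset.zip
```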

touQerabaS commented 1 year ago

Thank you for going above and beyond! @drboog

But it is mentioned in train.py:

```
cond = None, # Train conditional model based on dataset labels: <bool>, default = False
```

Also, dataset_tool.py will automatically create labels if a dataset.json file exists in the training and test folders.

drboog commented 1 year ago

The code is based on StyleGAN2-ADA; there are some things from the original codebase that I didn't delete.

This method is for text-to-image generation; what you really need for this problem is text conditioning, not labels. If you want a GAN model conditioned on labels, then I suggest you use the original StyleGAN2-ADA code. If you want a GAN model conditioned on text, I suggest you follow the example I gave. If you want generation conditioned on both text and labels, then revise the code accordingly to satisfy your requirements; our current code does not support that.

What is your goal, training on a customized dataset with image-caption-label triplets?

touQerabaS commented 1 year ago

Noted, @drboog. My goal is to train on a custom dataset (text-image pairs).

dataset.json

{ "labels": [ [ "0001/00001.png", 1 ], .............. ''''''''''''' [ "0001/00002.png", 10 ] ] }

directory structure like this:

```
train/
  class1/
    image1.jpg
    image2.jpg
    ...
  class2/
    image1.jpg
    image2.jpg
    ...
  ...
```

I am confused about how to train a text-to-image model on a custom dataset.

I appreciate your support.

drboog commented 1 year ago

As I showed in the example in readme.md:

First of all, you need to prepare a dataset with image and text files, i.e. a folder containing paired files like 1.jpg, 1.txt, 2.jpg, 2.txt, ...
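If captions are needed for those .txt files, here is a minimal sketch of captioning a folder of images with BLIP-2 via Hugging Face transformers; the model name, folder path, and loop are illustrative, not part of the Lafite codebase:

```python
# Hypothetical helper: caption every .jpg in a folder with BLIP-2,
# writing N.txt next to N.jpg so the files form the pairs described above.
import os
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

folder = "./my_dataset"  # assumed: a flat folder of .jpg files
for name in sorted(os.listdir(folder)):
    if not name.endswith(".jpg"):
        continue
    image = Image.open(os.path.join(folder, name)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to("cuda", torch.float16)
    out = model.generate(**inputs, max_new_tokens=30)
    caption = processor.batch_decode(out, skip_special_tokens=True)[0].strip()
    # Write the caption next to the image, sharing its base name.
    with open(os.path.join(folder, name[:-4] + ".txt"), "w") as f:
        f.write(caption)
```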

Then use dataset_tool.py to process the files. Image and text semantic information will be saved in 'clip_img_features' and 'clip_txt_features'.
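The exact flags are documented in the readme; since dataset_tool.py derives from StyleGAN2-ADA's, the invocation is presumably along these lines (flag names assumed from that codebase, paths are placeholders):

```
python dataset_tool.py --source=./my_dataset --dest=./datasets/my_dataset.zip --width=256 --height=256
```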

Then use the command we provided to train the model with the dataset.

Is there a specific reason that you use a different structure instead of the one we gave in readme.md? If that reason is unavoidable, then you should write your own dataset_tool.py and train.py.

touQerabaS commented 1 year ago

Oh, I got it. You're great at figuring things out!

It means there is no need for labels; the "clip_txt_features" are used as the conditioning instead.