Training with custom dataset

abhigoku10 commented 11 months ago

@ArcherFMY thanks a lot for sharing the code base. I just had a couple of questions:

Is the training pipeline shared? If not, could you please share it?
Can we train on a custom dataset for a different application? If so, how much data is needed to get decent output?
Can the generated images be made more photorealistic? If so, what would need to be done?

Thanks in advance.

ArcherFMY commented 11 months ago

Hi,

We just use the DreamBooth training script in diffusers (https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py) to train the model. You can follow the instructions in that repo to train on your custom dataset.

The 'instance_prompt' we used is '<360panorama>', as can be found here (https://github.com/ArcherFMY/SD-T2I-360PanoImage/blob/main/txt2panoimg/text_to_360panorama_image_pipeline.py#L150). The training image resolution can be set to w=1024 and h=512 (just resize the images). We used one A100 with 'train_batch_size=8' and 'learning_rate=1e-6'. 20,000 to 30,000 steps should be enough.
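For anyone wiring this up, here is a rough sketch of how the settings quoted above could be turned into a launch command for train_dreambooth.py. The flag names are the ones I believe the stock diffusers script accepts, so please check them against the version you have. The base model and the directory paths are hypothetical placeholders, since this thread does not name them. I left out the resolution flag on purpose: as far as I know the stock script takes a single integer there, and the thread does not say how the 1024x512 setting was passed.

```python
import shlex

# Hyperparameters quoted in the reply above.
hparams = {
    "instance_prompt": "<360panorama>",
    "train_batch_size": 8,
    "learning_rate": 1e-6,
    "max_train_steps": 20000,  # the reply suggests 20,000 to 30,000
}

# Hypothetical placeholders: the thread does not name the base model or any paths.
placeholders = {
    "pretrained_model_name_or_path": "path/to/base-model",
    "instance_data_dir": "path/to/panorama-images",
    "output_dir": "path/to/output",
}


def build_command(hparams, placeholders):
    """Return an argv list that launches train_dreambooth.py through accelerate."""
    argv = ["accelerate", "launch", "train_dreambooth.py"]
    for key, value in {**placeholders, **hparams}.items():
        argv += [f"--{key}", str(value)]
    return argv


command = build_command(hparams, placeholders)

# shlex.join quotes arguments such as '<360panorama>' so the line is shell-safe.
print(shlex.join(command))
```

The printed line can be pasted into a shell once the placeholders are replaced with real values.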

In our experience, for high-quality image generation, the quality of the training images matters more than the quantity. For text-to-360panoimage, our training dataset contains about 2,000 images. We apply data augmentation, such as progressively stitching the right part of the image onto the left side, and end up with 20,000 images. All images are 4K resolution and were carefully selected by removing complex scenes with complex textures.
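One way to read the augmentation described above is as a circular horizontal shift: a strip from the right edge of the panorama is moved to the left edge, which keeps the 360-degree image seamless. The sketch below is my interpretation, not the authors' code. The function name and the choice of evenly spaced shifts are assumptions; ten versions per image is simply what going from about 2,000 to 20,000 images implies if every source image contributes equally.

```python
import numpy as np


def circular_shifts(image, num_shifts):
    """Return num_shifts horizontally rolled copies of a panorama.

    image: array of shape (height, width, channels).
    Copy k moves the rightmost k * (width // num_shifts) columns to the left
    edge; copy 0 is the original image.
    """
    width = image.shape[1]
    step = width // num_shifts
    return [np.roll(image, k * step, axis=1) for k in range(num_shifts)]


# Tiny synthetic "panorama": each pixel stores its own column index,
# so the effect of the shift is easy to inspect.
pano = np.tile(np.arange(10, dtype=np.uint8)[None, :, None], (4, 1, 3))

views = circular_shifts(pano, num_shifts=10)
print(len(views))           # number of augmented copies per source image
print(views[1][0, :, 0])    # first row of the copy shifted by one step
```

With real data you would load each image into an array, call the function, and save the copies before training.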

The results generated by our base model are close to photorealistic. The final high-resolution images look a little artificial, mainly because of the GAN-based super-resolution model (RealESRGAN). You can try other super-resolution models, or run image-to-image with ControlNet again using a base model with a different style.

abhigoku10 commented 9 months ago

@ArcherFMY thanks for the response. I was able to train the repo for the text-to-panorama application. What is the training process for image-to-panorama? Could you please share the steps?

ArcherFMY / SD-T2I-360PanoImage

Training with custom dataset #4