Stability-AI / generative-models

Generative Models by Stability AI
MIT License

Design Choices #35

Open bonlime opened 1 year ago

bonlime commented 1 year ago

Hi! First of all, thanks for releasing such a great model and the accompanying paper. Could you clarify a few design choices in SDXL?

  1. Why do you use both the previous CLIP-L and the new OpenCLIP ViT-bigG? Have you tried using only the latter? Wouldn't it be enough?
  2. Crop conditioning avoids generating too many cropped images, but it seems to produce more duplication, where the object of interest appears everywhere instead of as a single instance. See this comparison. Why not use multi-aspect (i.e., rectangular) training throughout the whole training process, rather than only during fine-tuning?
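To make the multi-aspect question concrete, here is a minimal sketch of aspect-ratio bucketing, the general technique usually meant by "multi-aspect training". It is not Stability AI's training code. The function names `make_buckets` and `nearest_bucket`, the 64-pixel step, the side limits, and the 1024×1024 target area are all assumptions chosen for illustration.

```python
import math


def make_buckets(target_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (height, width) buckets with sides that are multiples of
    `step` and an area close to `target_area`. All defaults are assumed
    values for illustration, not confirmed training settings."""
    buckets = []
    for height in range(min_side, max_side + 1, step):
        # Width (a multiple of `step`) that brings the area closest to the target.
        width = round(target_area / height / step) * step
        if min_side <= width <= max_side:
            buckets.append((height, width))
    return buckets


def nearest_bucket(height, width, buckets):
    """Return the bucket whose aspect ratio is closest to the image's,
    compared in log space so tall and wide ratios are treated symmetrically."""
    target = math.log(height / width)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


if __name__ == "__main__":
    buckets = make_buckets()
    # A 1080x1920 (height x width) landscape image maps to a wide bucket.
    print(nearest_bucket(1080, 1920, buckets))
```

With this kind of bucketing, each batch is drawn from a single bucket, so images are resized to a rectangle near their native aspect ratio instead of being cropped to a square.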
andreemic commented 1 year ago

Hey, the link is gated. Can you post the example here directly?

bonlime commented 1 year ago

@andreemic sorry, gave the wrong link. Here is the correct one.

And here are some representative examples of such behaviour:

Enchanting waterfall in a lush jungle, surrounded by exotic plants and wildlife, tranquil, serene, high detail, tropical landscape

Glimpses of a herd of wild elephants crossing a savanna, surrounded by tall grass and a brilliant orange sunset, majestic, peaceful, high detail, safari landscape

Breathtaking view of a desert landscape, with towering sand dunes and a brilliant blue sky, serene, vast, high detail, desert landscape

Majestic Machu Picchu, set against a backdrop of towering mountains, breathtaking, high detail, landscape

The stunning Taj Mahal, set against a backdrop of lush greenery, historic, high detail, landmark