Open phalexo opened 1 year ago
Bro. These are raw models. Finetuning is needed. For raw output, I considered that pretty amazing. I do wonder why they didn't use Flan-T5 instead; these wouldn't be issues, even with the raw model. It would follow instructions to a tee. I think finetuning on Flan-T5 should be added to the research list.
This: https://github.com/deep-floyd/IF/discussions/89
Also the following
aspect: 2:3
prompt: 3/4 portrait, resin bust, Mad Joker model, clear irises, 3d, pixar style, ultra perfect composition, liquid, detailing fluid, acrylic,85mm, photoreal
negative prompt: logo, text, watermark, word, signature, label, sign, meme
style: in style of Bill Sienkiewicz
seed: 3345106520
if_I guidance: 14.0
(watermarking was turned off for this one)

aspect: 3:2
prompt: a black and white medium format 85mm portrait of a kitten wearing a tuxedo on his way to a funeral, the image is high quality and highly detailed with the kitten's features clearly visible, photographer Edward Weston used Agfa Isopan ISO 25 film to create this image, which resembles Edward Weston's photograph Pepper No. 35
seed: 404353238
if_I guidance: 7.0
Admittedly the kitten above is a teensy bit funky, physically, but I spent no time trying to optimize the prompt; it's one I crafted for SD 1.0.
There's nothing wrong with the model.
Not quite the crisp image above.
This kind of looks ok.
@phalexo; I'm running all the full models in full resolution on a 48G VRAM RTX A6000 instance on RunPod[1]. What are you using?
[1] This is not meant as some kind of flex, but rather to point out that I'm essentially making no quality compromises with a setup like this.
I have spread the model over 3 GPUs, Titan X with 12.3GiB each. I did have to set the type to float16 for T5, otherwise it causes a runtime cuBLAS error.
T5 is about 11.6GiB
if_I is about 9.2GiB
if_II + if_III is about 5.8GiB
I am generating a single image, not sure why two busts come out.
torch==2.0.0+cu118
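import torch
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder
# T5 alone on cuda:0, stage I on cuda:1, stages II and III share cuda:2
if_I = IFStageI('IF-I-XL-v1.0', device='cuda:1')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:2')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device='cuda:0', torch_dtype=torch.float16)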
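For anyone trying a similar split, here is a rough sanity check of that placement. It is only a sketch: the sizes and the 12.3GiB capacity are the figures reported above, the `placement` table and `headroom` helper are hypothetical names made up for illustration, and the thread does not say how much extra memory activations need during sampling.

```python
# Per-GPU capacity and model sizes in GiB, as reported in the post above.
GPU_CAPACITY_GIB = 12.3

# Hypothetical table mirroring the device arguments used in the post.
placement = {
    "cuda:0": {"t5": 11.6},
    "cuda:1": {"if_I": 9.2},
    "cuda:2": {"if_II + if_III": 5.8},
}


def headroom(placement, capacity):
    """Return the remaining GiB per device, rounded to one decimal place."""
    return {
        device: round(capacity - sum(models.values()), 1)
        for device, models in placement.items()
    }


for device, free in headroom(placement, GPU_CAPACITY_GIB).items():
    status = "ok" if free >= 0 else "over budget"
    print(f"{device}: {free:.1f} GiB free ({status})")
```

By these numbers the T5 card is the tight one, with well under 1GiB to spare.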
@phalexo;
"not sure why two busts come out"
aspect='2:3' is super important.
If you give it a wide aspect (3:2) the model will fill the space with something, usually a duplicate.
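A tiny helper can catch this before a run. This is a sketch only: `parse_aspect` and `orientation` are hypothetical names, not part of the IF code, and it assumes the aspect string is written as width:height, which matches 3:2 being called wide above.

```python
def parse_aspect(aspect):
    """Split a 'W:H' string into positive integers (width, height)."""
    width_text, height_text = aspect.split(":")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"aspect parts must be positive: {aspect!r}")
    return width, height


def orientation(aspect):
    """Classify an aspect string as 'portrait', 'landscape' or 'square'."""
    width, height = parse_aspect(aspect)
    if width > height:
        return "landscape"
    if width < height:
        return "portrait"
    return "square"


# A single bust is a tall subject, so warn when the canvas is wide.
for aspect in ("2:3", "3:2"):
    kind = orientation(aspect)
    note = " (may get filled with a duplicate)" if kind == "landscape" else ""
    print(f"{aspect}: {kind}{note}")
```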
Much better. The aspect ratio was affecting quality.
What's its prompt? Thanks.
The prompt was "HD realistic photo of a baby chimp with a boy."
In general I am not able to get any good text in images either.