Close, but no banana.

phalexo commented 1 year ago

The prompt was "HD realistic photo of a baby chimp with a boy."

In general I am not able to get any good text in images either.

Chimps1

darkman111a commented 1 year ago

Bro. These are raw models. Finetuning is needed. For raw output, I consider that pretty amazing. I do wonder why they didn't use Flan-T5 instead; these wouldn't be issues, even with the raw model. It would follow instructions to a tee. I think finetuning on Flan-T5 should be added to the research list.

tildebyte commented 1 year ago

This: https://github.com/deep-floyd/IF/discussions/89

Also the following

joker_bill_sienkiewicz_3

| setting | value |
| --- | --- |
| aspect | 3:2 |
| prompt | a black and white medium format 85mm portrait of a kitten wearing a tuxedo on his way to a funeral, the image is high quality and highly detailed with the kitten's features clearly visible, photographer Edward Weston used Agfa Isopan ISO 25 film to create this image, which resembles Edward Weston's photograph Pepper No. 35 |
| seed | 404353238 |
| if_I guidance | 7.0 |
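For anyone trying to reproduce this, the settings listed above could be gathered into keyword arguments roughly like this. It is only a sketch: `build_dream_kwargs` is a hypothetical helper, not part of `deepfloyd_if`, and the argument names (`prompt`, `seed`, `aspect_ratio`, `if_I_kwargs` with `guidance_scale`) are assumed to match the `dream` pipeline as shown in the project's README.

```python
def build_dream_kwargs(prompt, seed, aspect_ratio="1:1", if_I_guidance=7.0, count=1):
    """Collect generation settings into one kwargs dict (hypothetical helper)."""
    # Sanity-check the "W:H" string before handing it to the pipeline.
    width, height = (int(part) for part in aspect_ratio.split(":"))
    if width <= 0 or height <= 0:
        raise ValueError(f"bad aspect ratio: {aspect_ratio!r}")
    return {
        "prompt": [prompt] * count,          # one prompt per requested image
        "seed": seed,
        "aspect_ratio": aspect_ratio,
        "if_I_kwargs": {"guidance_scale": if_I_guidance},
    }


KITTEN_PROMPT = (
    "a black and white medium format 85mm portrait of a kitten wearing a tuxedo "
    "on his way to a funeral, the image is high quality and highly detailed with "
    "the kitten's features clearly visible, photographer Edward Weston used Agfa "
    "Isopan ISO 25 film to create this image, which resembles Edward Weston's "
    "photograph Pepper No. 35"
)

settings = build_dream_kwargs(
    KITTEN_PROMPT, seed=404353238, aspect_ratio="3:2", if_I_guidance=7.0
)

# Assumed usage, with the stage objects already loaded:
# result = dream(t5=t5, if_I=if_I, if_II=if_II, if_III=if_III, **settings)
```

Keeping the seed and guidance value in one place makes it easier to compare outputs across different hardware setups with everything else held fixed.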

kitten_4

Admittedly, the kitten above is a teensy bit funky, physically, but I spent no time trying to optimize the prompt; it's one I crafted for SD 1.0.

There's nothing wrong with the model

phalexo commented 1 year ago

Not quite the crisp image above.

Joker1

phalexo commented 1 year ago

Joker2

phalexo commented 1 year ago

This kind of looks ok.

Kitty1

tildebyte commented 1 year ago

@phalexo; I'm running all the full models in full resolution on a 48G VRAM RTX A6000 instance on RunPod[1]. What are you using?

[1] This is not meant as some kind of flex, but rather to point out that I'm essentially making no quality compromises with a setup like this.

phalexo commented 1 year ago

> @phalexo; I'm running all the full models in full resolution on a 48G VRAM RTX A6000 instance on RunPod[1]. What are you using?
>
> [1] This is not meant as some kind of flex, but rather to point out that I'm essentially making no quality compromises with a setup like this.

I have spread the model over 3 GPUs, Titan X with 12.3GiB each. I did have to set the type to float16 for T5, otherwise it causes a runtime cuBLAS error.

T5 is about 11.6GiB; if_I is about 9.2GiB; if_II + if_III is about 5.8GiB.
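As a back-of-envelope check on those numbers, here is a small sketch that subtracts each quoted footprint from the quoted 12.3GiB per card. The device assignment mirrors the one posted in this thread (T5 on `cuda:0`, if_I on `cuda:1`, if_II and if_III on `cuda:2`); the `headroom` function is a hypothetical helper, and the figures cover only what was reported, not any extra memory used during sampling.

```python
GIB_PER_CARD = 12.3  # per-card figure quoted in the thread

# Approximate footprints quoted in the thread, in GiB.
footprints = {"t5": 11.6, "if_I": 9.2, "if_II + if_III": 5.8}

# Device assignment as posted in the thread.
placement = {
    "cuda:0": ["t5"],
    "cuda:1": ["if_I"],
    "cuda:2": ["if_II + if_III"],
}


def headroom(placement, footprints, capacity):
    """Return the GiB left on each device after its assigned components."""
    return {
        device: round(capacity - sum(footprints[name] for name in names), 1)
        for device, names in placement.items()
    }


free = headroom(placement, footprints, GIB_PER_CARD)
for device, gib in free.items():
    status = "ok" if gib > 0 else "over budget"
    print(f"{device}: {gib:.1f} GiB free ({status})")
```

By this arithmetic the T5 card has well under 1GiB to spare, while the other two cards have several GiB free.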

I am generating a single image, not sure why two busts come out.

torch==2.0.0+cu118

phalexo commented 1 year ago

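```python
import torch
from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
from deepfloyd_if.modules.t5 import T5Embedder

# One stage per GPU; if_II and if_III share cuda:2, T5 runs in float16 on cuda:0.
if_I = IFStageI('IF-I-XL-v1.0', device='cuda:1')
if_II = IFStageII('IF-II-L-v1.0', device='cuda:2')
if_III = StableStageIII('stable-diffusion-x4-upscaler', device='cuda:2')
t5 = T5Embedder(device='cuda:0', torch_dtype=torch.float16)
```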

tildebyte commented 1 year ago

@phalexo;