facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

[bug] Stable diffusion aitemplate img2img precision error #141

Open githubofhuo opened 1 year ago

githubofhuo commented 1 year ago

I use AITemplate for Stable Diffusion inference and it compiles successfully, but the img2img result differs from the PyTorch diffusers img2img result. The two results are basically the same, but there are differences in the details and background of the generated characters, even though I use the same input arguments.

I checked the intermediate results and found that the output of the CLIP text encoder differs at the third decimal place (on the order of 1e-3).

The following are the CLIP text encoder outputs, both with the sequence length truncated to 64 (pt is the PyTorch diffusers StableDiffusionImg2ImgPipeline, ait is the AITemplate StableDiffusionImg2ImgAITPipeline):

[ait] text_embeddings: tensor([[[-0.3882,  0.0236, -0.0544,  ..., -0.4902, -0.3057,  0.0645],
         [ 0.7251, -0.5430,  1.2314,  ..., -1.0742,  0.4631, -0.2561],
         [ 1.2705, -0.1815,  2.2695,  ..., -0.9531, -0.0946, -0.4238],
         ...,
         [-0.0774, -0.3027,  0.3289,  ..., -1.2979, -0.4155, -1.0732],
         [-0.2668, -0.6143,  0.0475,  ..., -1.8457, -1.4912, -1.2803],
         [-1.3701,  0.7026, -0.1947,  ..., -1.6514, -0.4041, -0.9980]],

        [[-0.3882,  0.0236, -0.0544,  ..., -0.4902, -0.3057,  0.0645],
         [ 0.1016, -1.7275, -1.0322,  ...,  0.5332,  0.1890,  1.5557],
         [ 1.1211, -0.3545,  2.3242,  ..., -0.1503, -0.3704, -0.3162],
         ...,
         [ 0.3638, -1.5352, -1.1416,  ..., -1.6572, -0.4778, -0.3823],
         [ 0.3875, -1.5186, -1.1230,  ..., -1.6631, -0.4534, -0.3904],
         [ 0.3984, -1.5137, -1.0947,  ..., -1.6689, -0.4341, -0.3977]]],
       device='cuda:0')
[pt] text_embeddings: tensor([[[-0.3880,  0.0236, -0.0544,  ..., -0.4902, -0.3056,  0.0646],
         [ 0.7300, -0.5390,  1.2256,  ..., -1.0804,  0.4697, -0.2576],
         [ 1.2857, -0.1844,  2.2731,  ..., -0.9583, -0.0933, -0.4245],
         ...,
         [-0.0750, -0.3083,  0.3307,  ..., -1.2873, -0.3960, -1.0716],
         [-0.2637, -0.6062,  0.0445,  ..., -1.8592, -1.4758, -1.2841],
         [-1.3688,  0.7061, -0.1931,  ..., -1.6505, -0.4107, -0.9954]],

        [[-0.3880,  0.0236, -0.0544,  ..., -0.4902, -0.3056,  0.0646],
         [ 0.0999, -1.7271, -1.0309,  ...,  0.5268,  0.1912,  1.5578],
         [ 1.1199, -0.3548,  2.3305,  ..., -0.1505, -0.3687, -0.3150],
         ...,
         [ 0.3621, -1.5352, -1.1307,  ..., -1.6581, -0.4602, -0.3935],
         [ 0.3851, -1.5184, -1.1140,  ..., -1.6681, -0.4379, -0.4030],
         [ 0.3951, -1.5115, -1.0848,  ..., -1.6736, -0.4176, -0.4087]]],
       device='cuda:0')
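To put a number on the gap, here is a minimal sketch that compares a few of the printed values in plain Python. The two lists are copied from the visible elements of the second row of the first batch item above; the helper names `max_abs_diff` and `max_rel_diff` are made up for illustration and are not part of AITemplate or diffusers. On the full tensors, `(a - b).abs().max()` or `torch.allclose(a, b, atol=..., rtol=...)` would be the usual equivalents.

```python
# Visible elements of the second row of the first batch item, copied from
# the printout above. The elided middle of each row is not included.
ait_row = [0.7251, -0.5430, 1.2314, -1.0742, 0.4631, -0.2561]
pt_row = [0.7300, -0.5390, 1.2256, -1.0804, 0.4697, -0.2576]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def max_rel_diff(a, b, eps=1e-6):
    """Largest element-wise difference, relative to the reference values in b."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return max(abs(x - y) / (abs(y) + eps) for x, y in zip(a, b))


print(f"max abs diff: {max_abs_diff(ait_row, pt_row):.4f}")
print(f"max rel diff: {max_rel_diff(ait_row, pt_row):.4f}")
```

Reporting both the absolute and the relative figure helps here because the embedding values span roughly two orders of magnitude, so a fixed absolute tolerance alone can hide or exaggerate the mismatch on small elements.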

The following are the specific parameters:

model_id="Linaqruf/anything-v3.0"  # https://huggingface.co/Linaqruf/anything-v3.0
width=512
height=512
prompt="high quality,1girl, ((rabbit)), ((stickers of rabbit)), rabbit on clothes, rabbit on table, cute rabbit on the wall,((pixel art))"
negative_prompt="nsfw,deformed,bad anatomy,disfigured,mutated,ugly,messy drawing,poorly drawn,error,blurred,lowres,bad proportions,bad shadow,jpeg artifacts,worst quality, low quality,mosaic,messy drawing,extra digit, fewer digits, cropped,poorly drawn face, dirty face,poorly drawn eyes,extra eyes,fused mouth, poorly drawn mouth,poorly drawn ears, extra ears,missing ears,dirty teeth, liquid teeth,bad tongue,liquid tongue,black tongue,bad hands,poorly drawn hands,extra hands, bad hands, poorly drawn hands, fused fingers, mutated hands and fingers, malformed hands, missing fingers, one hand with more than 5 fingers, one hand with less than 5 fingers, more than 1 left hand, more than 1 right hand"

And the results of img2img: [image: pt_512_512] (pt), [image: ait_512_512] (ait)

Does anyone have an idea how to fix the precision error and make the AIT result equal to the PyTorch diffusers result? Thanks.

Sanster commented 1 year ago

@githubofhuo Hi, I am also testing img2img with AITemplate (on an RTX 3090). In my tests, with the same configuration (steps, sampler, strength, ...), a model compiled for 640x640 works fine, but at some resolutions (e.g. 768x768, 1024x1024) the model randomly generates a black image. Have you ever encountered this problem? Thanks.

update:

I am using the following code to check the output of each component (CLIP, UNet, VAE). When the generated image is black, the check on the VAE output returns False.

import torch

def isfinite(t: torch.Tensor) -> bool:
    return bool(torch.isfinite(t).all())  # True only if no element is NaN or Inf

githubofhuo commented 1 year ago

I haven't run into black images. I have only run on an A100 with AITemplate v0.1.1, at sizes 512*512, 768*768 and 1024*1024. The results are now acceptable, although there are still a few differences between the diffusers and AIT outputs. Don't use Euler a; DDIM or PNDM works well.

For repeatable results, please check the seed of the generator. I use this config:

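import os
import random

import numpy as np
import torch

def set_seed(seed=0):
    # Seed every RNG the pipeline may draw from (PyTorch CPU/GPU, NumPy, Python).
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    np.random.seed(seed)
    random.seed(seed)
    # Disable cuDNN autotuning and request deterministic kernels.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    # cuBLAS needs this workspace setting for deterministic results.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":16:8"
    torch.use_deterministic_algorithms(True)
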
Sanster commented 1 year ago

update: fixed by applying the following operations

khabinov commented 1 year ago

I see that @Sanster found a workaround for their issue, however, @githubofhuo, is your issue still there?

CanyonWind commented 1 year ago

@githubofhuo Does the discrepancy come from precision casting? fp16 AIT output is expected to differ slightly from output generated by PyTorch in fp32.
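As a rough illustration of how much a single fp16 cast can move a value, the sketch below rounds a few numbers from the [pt] printout to half precision using only the Python standard library (`struct` format code `e`). The helper name `to_fp16` is made up for this example. It shows only one rounding step; the error that accumulates across the many layers of a text encoder can be larger, so this is a lower bound on the expected mismatch rather than an explanation of the exact figures in this thread.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Sample values taken from the [pt] printout in the original post.
samples = [0.7300, -0.5390, 1.2256, 2.2731, -1.8592]

# For values in fp16's normal range, a single round-to-nearest step has a
# relative error of at most 2**-11 (about 4.9e-4).
FP16_UNIT_ROUNDOFF = 2.0 ** -11

for value in samples:
    rounded = to_fp16(value)
    rel_err = abs(rounded - value) / abs(value)
    print(f"{value:+.4f} -> {rounded:+.6f} (relative error {rel_err:.2e})")
```

A per-cast relative error near 5e-4 already sits close to the third decimal place for values around 1, so differences of this size between an fp16 and an fp32 run are not by themselves a sign of a bug.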

githubofhuo commented 1 year ago

Thanks, the output image is now almost identical to the diffusers result, although the intermediate results still show small discrepancies.

CanyonWind commented 1 year ago

Hi, I ran into a similar issue and couldn't figure out why. Could you please explain how you managed to reduce the discrepancy? Thanks in advance!

alvinsay commented 1 year ago

@githubofhuo The same here. How do you reduce the discrepancy?

Boom-Hacker commented 1 year ago

[image: example_ait]

Boom-Hacker commented 1 year ago

python3 ./scripts/demo.py --local-dir /root/AT3P-diffusers/ --prompt "girl,skirt,full body"

Boom-Hacker commented 1 year ago

Why? I converted the ckpt to the diffusers format.