facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

How to convert diffusion vae.encode into an AIT model? #140

Open xikakera opened 1 year ago

xikakera commented 1 year ago

Hi,

I use the diffusion depth2img model, which needs vae.encode.

So I want to convert diffusion vae.encode into an AIT model.

However, Downsample2D requires the torch.nn.functional.pad function, and I didn't find a similar function in AIT ops.

In PyTorch, I would add this code before examples/05_stable_diffusion/modeling/resnet.py#L116:

pad = (0, 0, 0, 1, 0, 1)  # AIT uses [n, h, w, c] layout, not torch's [n, c, h, w]
hidden_states = torch.nn.functional.pad(hidden_states, pad, mode="constant", value=0)
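
For reference, torch.nn.functional.pad reads the pad tuple from the last dimension backwards, so on an NHWC tensor (0, 0, 0, 1, 0, 1) pads nothing on C and appends one trailing row and column on H and W. A quick standalone check:

    import torch
    import torch.nn.functional as F

    x = torch.zeros(1, 1024, 768, 128)  # (n, h, w, c)
    # Pad tuple is read last-dim-first: C gets (0, 0), W gets (0, 1), H gets (0, 1)
    y = F.pad(x, (0, 0, 0, 1, 0, 1), mode="constant", value=0)
    print(y.shape)  # torch.Size([1, 1025, 769, 128])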

How should I use the functions of ait to complete this task?

Thank you.

terrychenism commented 1 year ago

Are you sure PT uses padding during inference? I think padding is not needed for sizes 512/768.

xikakera commented 1 year ago

The output shapes at modeling/resnet.py#L116:

start downsample2d [1, 1024, 768, 128]  # needs to change to [1, 1025, 769, 128]
end downsample2d [1, 511, 383, 128]     # need [1, 512, 384, 128]

start downsample2d [1, 511, 383, 256]
end downsample2d [1, 255, 191, 256]     # need [1, 256, 192, 256]

start downsample2d [1, 255, 191, 512]
end downsample2d [1, 127, 95, 512]      # need [1, 128, 96, 512]

The first execution of Downsample2D:

Input [1, 1024, 768, 128], output [1, 511, 383, 128]

The height and width are each 1 less than expected.

At the last execution, the output [1, 127, 95, 512] does not match the input the next step expects, [1, 128, 96, 512].

nn.Conv2dBias is used with kernel_size=3, stride=2, so padding is needed to change the input [1, 1024, 768, 128] to [1, 1025, 769, 128].
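
The standard convolution output-size formula makes the off-by-one clear (plain arithmetic, no framework needed):

    # out = floor((in + 2*pad - kernel) / stride) + 1
    def conv_out(in_size, kernel=3, stride=2, pad=0):
        return (in_size + 2 * pad - kernel) // stride + 1

    print(conv_out(1024))  # 511 -- one short of the desired 512
    print(conv_out(1025))  # 512 -- matches after padding H to 1025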

The code in the file modeling/resnet.py#L98:

            # stride = 2
            conv = nn.Conv2dBias(
                self.channels, self.out_channels, 3, stride=stride, padding=padding
            )

So I think it is necessary to add padding.

Thank you.

yh8899 commented 1 year ago

@xikakera Have you solved this? I have the same issue.

zy30106 commented 9 months ago

Is there any way to pad a tensor the way torch does, to solve this problem?

arthur-71 commented 8 months ago

^bump. Also having this issue.

arthur-71 commented 8 months ago

For anybody interested in a workaround for this issue, I imitated the padding operation using two concatenations:

  1. Store two tensors filled with zeros that will be concatenated to the feature map during inference:

        from aitemplate.compiler import ops
        ...
        # if the feature map to pad is of size (n, h, w, c)
        zeros_for_h = ops.full()(shape=[n, 1, w, c], fill_value=0., dtype=dtype)
        zeros_for_w = ops.full()(shape=[n, h + 1, 1, c], fill_value=0., dtype=dtype)

  2. Concatenate during inference:

        x = ops.concatenate()([x, zeros_for_h], dim=1)  # pads h: (n, h + 1, w, c)
        x = ops.concatenate()([x, zeros_for_w], dim=2)  # pads w: (n, h + 1, w + 1, c)

This may not be the most correct way of doing this, but it worked for me.
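
Wrapping the two concatenations in a small helper keeps the model code readable. A minimal sketch, assuming static shapes known at graph-construction time (pad_bottom_right_nhwc is a hypothetical name, not an AIT API; it reuses only the ops.full and ops.concatenate calls shown above):

    from aitemplate.compiler import ops

    def pad_bottom_right_nhwc(x, n, h, w, c, dtype="float16"):
        # Hypothetical helper: emulates torch.nn.functional.pad(x, (0, 0, 0, 1, 0, 1))
        # on an NHWC tensor by concatenating zero-filled slices.
        zeros_h = ops.full()(shape=[n, 1, w, c], fill_value=0., dtype=dtype)
        x = ops.concatenate()([x, zeros_h], dim=1)  # (n, h + 1, w, c)
        zeros_w = ops.full()(shape=[n, h + 1, 1, c], fill_value=0., dtype=dtype)
        x = ops.concatenate()([x, zeros_w], dim=2)  # (n, h + 1, w + 1, c)
        return x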