xikakera opened this issue 1 year ago.
> Are you sure PyTorch uses padding during inference? I think padding is not needed for sizes 512/768.

Here are the output shapes printed at modeling/resnet.py#L116:
start downsample2d [1, 1024, 768, 128]  # needs to be [1, 1025, 769, 128]
end downsample2d [1, 511, 383, 128]  # expected [1, 512, 384, 128]
start downsample2d [1, 511, 383, 256]
end downsample2d [1, 255, 191, 256]  # expected [1, 256, 192, 256]
start downsample2d [1, 255, 191, 512]
end downsample2d [1, 127, 95, 512]  # expected [1, 128, 96, 512]
On the first call to Downsample2D:
input [1, 1024, 768, 128], output [1, 511, 383, 128].
Both height and width come out 1 smaller than expected (511 instead of 512, 383 instead of 384).
On the last call, the output [1, 127, 95, 512] does not match the shape the next step expects, [1, 128, 96, 512].
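nn.Conv2dBias here uses kernel_size=3 and stride=2 with no padding, so a height of 1024 gives (1024 - 3) // 2 + 1 = 511 and a width of 768 gives (768 - 3) // 2 + 1 = 383.
The input needs to be padded from [1, 1024, 768, 128] to [1, 1025, 769, 128] (one extra row and one extra column of zeros) before the convolution, as torch.nn.functional.pad does in the PyTorch Downsample2D.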
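The shape arithmetic can be checked with the standard convolution output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1. The sketch below is plain Python; `conv_out` and `downsample_chain` are hypothetical helper names, and it assumes three Downsample2D calls, each a 3x3 stride-2 conv with padding=0, optionally preceded by appending one zero row and one zero column:

```python
def conv_out(size: int, kernel: int = 3, stride: int = 2, padding: int = 0) -> int:
    """Convolution output size: floor((size + 2 * padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_chain(h: int, w: int, extra_pad: int, steps: int = 3) -> list:
    """(h, w) after each Downsample2D, appending `extra_pad` zero rows/columns first."""
    shapes = []
    for _ in range(steps):
        h = conv_out(h + extra_pad)
        w = conv_out(w + extra_pad)
        shapes.append((h, w))
    return shapes


print(downsample_chain(1024, 768, extra_pad=0))  # → [(511, 383), (255, 191), (127, 95)]
print(downsample_chain(1024, 768, extra_pad=1))  # → [(512, 384), (256, 192), (128, 96)]
```

The first line reproduces the shapes in the log above; the second gives the expected ones. Symmetric `padding=1` would also yield 512 (`conv_out(1024, padding=1) == 512`), but it shifts every 3x3 window up and left by one pixel compared with padding only the bottom and right, so pretrained PyTorch weights would likely not reproduce the same activations. Changing the conv's padding is therefore not a drop-in substitute for the asymmetric pad.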
The relevant code in modeling/resnet.py#L98:
# stride = 2
conv = nn.Conv2dBias(
self.channels, self.out_channels, 3, stride=stride, padding=padding
)
So I think it is necessary to add padding.
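To make the target behaviour concrete, here is a NumPy reference of what the padded Downsample2D should compute: append one zero row at the bottom and one zero column on the right, then run a 3x3 stride-2 convolution with no further padding. This is a sketch, not AIT code; it assumes NHWC layout and a weight array laid out as (kh, kw, c_in, c_out), and `downsample2d_ref` is a hypothetical name. It could serve as a reference when comparing an AIT port against PyTorch:

```python
import numpy as np


def downsample2d_ref(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """x: (n, h, w, c_in); weight: (3, 3, c_in, c_out) (assumed layout); bias: (c_out,)."""
    # Zero-pad only the bottom (H) and right (W) edges, like F.pad(x, (0, 1, 0, 1)) in NCHW.
    x = np.pad(x, ((0, 0), (0, 1), (0, 1), (0, 0)))
    n, h, w, _ = x.shape
    k, s = 3, 2
    out_h = (h - k) // s + 1
    out_w = (w - k) // s + 1
    out = np.zeros((n, out_h, out_w, weight.shape[3]), dtype=np.result_type(x, weight))
    # Accumulate one kernel tap at a time: take the strided input slice under tap (i, j)
    # and contract its channel axis with that tap's (c_in, c_out) weight matrix.
    for i in range(k):
        for j in range(k):
            patch = x[:, i : i + s * (out_h - 1) + 1 : s, j : j + s * (out_w - 1) + 1 : s, :]
            out += patch @ weight[i, j]
    return out + bias


x = np.ones((1, 8, 6, 2))
y = downsample2d_ref(x, np.ones((3, 3, 2, 4)), np.zeros(4))
print(y.shape)  # → (1, 4, 3, 4)
```

With an all-ones input and weights, windows touching the padded row or column sum fewer ones, which makes it easy to see that the zeros were added only on the bottom and right.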
Thank you.
@xikakera Have you solved this? I have the same issue.
Is there any way to pad a tensor the way torch does, to solve this problem?
Bump. I'm also having this issue.
For anybody interested in a workaround for this issue, I imitated the padding operation with two concatenations.
First, create two zero-filled tensors that will be concatenated to the feature map during inference:
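from aitemplate.compiler import ops
...
# If the feature map to pad has shape (n, h, w, c) in NHWC layout (dtype e.g. "float16"):
# one extra row of zeros for the bottom edge (dim 1 = H)
zeros_for_h = ops.full()(shape=[n, 1, w, c], fill_value=0.0, dtype=dtype)
# one extra column of zeros for the right edge (dim 2 = W); height is already h + 1 at that point
zeros_for_w = ops.full()(shape=[n, h + 1, 1, c], fill_value=0.0, dtype=dtype)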
Then concatenate them during inference:
# (n, h, w, c) -> (n, h + 1, w, c)
x = ops.concatenate()([x, zeros_for_h], dim=1)
# (n, h + 1, w, c) -> (n, h + 1, w + 1, c)
x = ops.concatenate()([x, zeros_for_w], dim=2)
This may not be the most correct way of doing this, but it worked for me.
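The two-concatenation trick can be sanity-checked outside AIT. The sketch below uses NumPy arrays as stand-ins for `ops.full` and `ops.concatenate` (an assumption for illustration only; it does not exercise AIT itself) and compares the result with a bottom/right zero pad, which is what F.pad(x, (0, 1, 0, 1)) on an NCHW tensor corresponds to in NHWC:

```python
import numpy as np

n, h, w, c = 1, 4, 3, 2
x = np.random.default_rng(0).standard_normal((n, h, w, c)).astype(np.float16)

# Stand-ins for the two ops.full() tensors from the workaround.
zeros_for_h = np.zeros((n, 1, w, c), dtype=x.dtype)
zeros_for_w = np.zeros((n, h + 1, 1, c), dtype=x.dtype)

# Same order as the workaround: extra row first (H axis), then extra column (W axis).
y = np.concatenate([x, zeros_for_h], axis=1)
y = np.concatenate([y, zeros_for_w], axis=2)

# Reference: constant zero padding on the bottom and right edges only.
expected = np.pad(x, ((0, 0), (0, 1), (0, 1), (0, 0)))

print(y.shape)                       # → (1, 5, 4, 2)
print(np.array_equal(y, expected))   # → True
```

Note that `zeros_for_w` must use height h + 1 because the row has already been appended when the column is concatenated; swapping the order would require shapes [n, h, 1, c] and [n, 1, w + 1, c] instead.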
Hi,
I am using the Stable Diffusion depth2img model, which needs vae.encode.
So I want to convert the diffusion VAE encoder (vae.encode) into an AIT model.
However, Downsample2D requires the torch.nn.functional.pad function,
and I couldn't find an equivalent op in AIT.
The padding would need to be added before examples/05_stable_diffusion/modeling/resnet.py#L116.
How can I do this with AIT ops?
Thank you.