Open bes-dev opened 1 year ago
If I try to use the same script but with FC model from AITemplate docs, it works well:
```python
import logging

from aitemplate.compiler import ops
from aitemplate.frontend import nn

# AIT utils
from aitemplate.compiler import compile_model
from aitemplate.frontend import Tensor
from aitemplate.testing import detect_target


class AITSimpleModel(nn.Module):
    def __init__(self, hidden, eps: float = 1e-5):
        super().__init__()
        self.dense1 = nn.Linear(hidden, 4 * hidden, specialization="fast_gelu")
        self.dense2 = nn.Linear(4 * hidden, hidden)
        self.layernorm = nn.LayerNorm(hidden, eps=eps)

    def forward(self, input):
        hidden_states = self.dense1(input)
        hidden_states = self.dense2(hidden_states)
        hidden_states = hidden_states + input
        hidden_states = self.layernorm(hidden_states)
        return hidden_states


def mark_output(y):
    if type(y) is not tuple:
        y = (y,)
    for i in range(len(y)):
        y[i]._attrs["is_output"] = True
        y[i]._attrs["name"] = "output_%d" % (i)
        y_shape = [d._attrs["values"][0] for d in y[i]._attrs["shape"]]
        print("AIT output_{} shape: {}".format(i, y_shape))


def compile_moc(
    batch_size,
    input_size,
    c_in=3,
    c_out=8,
    use_fp16_acc=False,
    convert_conv_to_gemm=False,
    output_dir="./tmp/",
):
    ait_model = AITSimpleModel(32)
    ait_input = Tensor(
        shape=[1, 32],
        name="input0",
        is_input=True,
    )
    ait_model.name_parameter_tensor()
    ait_out = ait_model(ait_input)
    mark_output(ait_out)
    target = detect_target(
        use_fp16_acc=use_fp16_acc,
        convert_conv_to_gemm=convert_conv_to_gemm,
    )
    compile_model(
        ait_out,
        target,
        output_dir,
        "moc",
    )


def main():
    logging.getLogger().setLevel(logging.INFO)
    logger = logging.getLogger()
    logger.info("Compile model...")
    compile_moc(
        batch_size=1,
        input_size=256,
        use_fp16_acc=True,
        convert_conv_to_gemm=True,
    )


if __name__ == "__main__":
    main()
```
Omg, I investigated this issue. The problem is related to the input shape of the tensor. Here we try to find the CUDA kernel for the Conv2d operation that corresponds to the input/output channel counts. If there is no suitable kernel for our number of channels (for example, a small convolution with c_in=3, c_out=8), this function returns an empty op_instance array. So we don't have any kernel that we can apply to our layer, and the compilation process fails. It looks like a bug in AITemplate, because we need a default implementation for the convolution kernel!
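The failure mode described above can be sketched in plain Python. Note that `KERNEL_TABLE` and `pick_conv_kernels` below are illustrative names, not AITemplate's actual internals: the point is that filtering a table of kernel configurations by channel count can yield an empty candidate list, which later surfaces as the `StopIteration`/`KeyError` during codegen.

```python
# Hypothetical sketch of the kernel-selection failure mode described above.
# The names below are illustrative, NOT AITemplate's real internals.

# A toy "kernel table": each entry lists the input-channel counts it supports.
KERNEL_TABLE = [
    {"name": "conv2d_align8", "supported_c_in": {8, 16, 32, 64}},
    {"name": "conv2d_align4", "supported_c_in": {4, 12, 20}},
]

def pick_conv_kernels(c_in):
    """Return all kernels whose supported channel counts include c_in."""
    return [k for k in KERNEL_TABLE if c_in in k["supported_c_in"]]

# A suitable kernel exists for c_in=8 ...
assert pick_conv_kernels(8)

# ... but for c_in=3 no kernel matches, so op_instance is empty and any
# later next(...) or dict lookup over it fails, aborting compilation.
assert pick_conv_kernels(3) == []
```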
Thanks for reporting! @aakhundov can you please help take a look when you get a chance? (you added conv_common.py, and apologize if tagging the wrong person)
@bes-dev Thank you for reporting and investigating the issue! I've reproduced it and can confirm that your initial example with `nn.Conv2dBias` indeed results in that particular error. We'll look further into this and provide more details.
```python
import logging

from aitemplate.compiler import ops
from aitemplate.frontend import nn

# AIT utils
from aitemplate.compiler import compile_model
from aitemplate.frontend import Tensor
from aitemplate.testing import detect_target


class MOCModel(nn.Module):
    def __init__(self, c_in=3, c_out=8):
        super().__init__()
        self.conv = nn.Conv2dBias(c_in, c_out, 3, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        return x
```
I found that it can be rewritten like this:
```python
from aitemplate.frontend import nn


class MOCModel(nn.Module):
    def __init__(self, c_in=3, c_out=8):
        super().__init__()
        self.conv = nn.Conv2dBiasFewChannels(c_in, c_out, 3, 1, 1)

    def forward(self, x):
        x = self.conv(x)
        return x


model = MOCModel(3, 64)
```
It works for me, but the weight tensor of `model.conv` has an unexpected shape `[64, 3, 3, 4]` instead of `[64, 3, 3, 3]`, so we need to pad the weights of the source model during weight mapping.
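The weight padding during mapping can be done with a zero pad on the last (input-channel) dimension. A minimal NumPy sketch, assuming the `[c_out, kh, kw, c_in]` layout implied by the shapes reported above:

```python
import numpy as np

# Source model's conv weights in the [c_out, kh, kw, c_in] layout
# implied by the [64, 3, 3, 3] shape above.
src_w = np.random.randn(64, 3, 3, 3).astype(np.float32)

# Zero-pad the input-channel dimension from 3 to 4 to match the
# [64, 3, 3, 4] weight shape that Conv2dBiasFewChannels expects.
padded_w = np.pad(src_w, ((0, 0), (0, 0), (0, 0), (0, 1)))

assert padded_w.shape == (64, 3, 3, 4)
# The extra channel is all zeros, so it contributes nothing to the
# convolution as long as the input's fourth channel is also zero.
assert np.all(padded_w[..., 3] == 0)
```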
@bes-dev So the story goes like this. By default, AIT assumes a `c_in` of 4 or 8 to rely on the higher-performing configuration of the kernels backing the conv ops' implementation.

As you've noticed, it is possible to use `c_in=3` with `nn.Conv2dBiasFewChannels`, as it uses the `common_conv2d_few_channels` op under the hood, which sets a specific kernel configuration for `ch_in=3` here.

However, as you've also noticed, by default even `nn.Conv2dBiasFewChannels` pads the weights to 4 channels (here): again, to improve performance. If you want to avoid that, you can set `auto_padding=False` in the `nn.Conv2dBiasFewChannels` constructor. The padding then won't happen and you'll have `[64, 3, 3, 3]`-shaped weights, but it comes at the cost of (potentially) worse performance.
Are there any solutions being developed for this at the moment? I am trying to optimize stable diffusion inpainting with AIT, but the input channel count is 9. I am getting the same StopIteration issue.
^ bump, have also come across this issue. I ended up padding the input and weights so that a suitable kernel could be found.
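The input-and-weights padding workaround mentioned here can be sketched with NumPy: zero-pad both the NHWC input and the weights along the channel axis (e.g. 9 to 12 channels for the inpainting case above, assuming a kernel exists for the padded count). Because the padded input channels are all zeros, every extra product in the convolution's dot products is zero, so the output is mathematically unchanged:

```python
import numpy as np

def pad_channels(arr, target_c):
    """Zero-pad the last (channel) axis of arr up to target_c channels."""
    extra = target_c - arr.shape[-1]
    assert extra >= 0, "target must not be smaller than current channels"
    return np.pad(arr, [(0, 0)] * (arr.ndim - 1) + [(0, extra)])

# NHWC input with 9 channels (the inpainting case) and matching weights
# in a [c_out, kh, kw, c_in] layout; sizes here are only illustrative.
x = np.random.randn(1, 64, 64, 9).astype(np.float32)
w = np.random.randn(320, 3, 3, 9).astype(np.float32)

x_pad = pad_channels(x, 12)  # channels 9..11 are zero
w_pad = pad_channels(w, 12)  # these weights only ever multiply zeros

assert x_pad.shape == (1, 64, 64, 12)
assert w_pad.shape == (320, 3, 3, 12)
# Each padded channel contributes 0 * w = 0 to every dot product,
# so the convolution output equals the unpadded one.
```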
examples/08_esrgan: same situation

```
$ python compile.py --model-path RealESRGAN_x4plus.pth
  File "/media//8t/Workspace/study/AITemplate/python/aitemplate/backend/cuda/conv2d/common.py", line 805, in gen_function
    emitted_instance = f_emit_instance(op_instance[value])
KeyError: ''
```

examples/01_resnet-50:

```
$ python infer_with_torch.py
  File "/media//8t/Workspace/study/AITemplate/python/aitemplate/backend/cuda/conv2d/common.py", line 805, in gen_function
    emitted_instance = f_emit_instance(op_instance[value])
KeyError: ''
```

examples/01_resnet-50:

```
$ python benchmark_ait.py
  File "/media/8t/Workspace/study/AITemplate/python/aitemplate/backend/cuda/conv2d/common.py", line 805, in gen_function
    emitted_instance = f_emit_instance(op_instance[value])
KeyError: ''
```
When I try to compile a simple convolution network, the compilation process crashes because `conv2d._attrs["op_instance"]` is empty for the convolution layer. How can I fix it?
The behaviour can be reproduced by this script:
Output:
conv2d.attrs_ state: