GPUOpen-LibrariesAndSDKs / RadeonML


BiasAdd:FusedConv:Failed to allocate memory #17

Open baicaiPCX opened 2 years ago

baicaiPCX commented 2 years ago

Hello. Running the GetOutputInfo function of rml::Model throws an error when the ONNX model has more than three ConvTranspose2d layers. My test model's structure is very simple: it consists of multiple blocks, where each block consists of a ConvTranspose2d, a Conv2d, and a ReLU. This is my test code:

rml::Context context = rml::CreateDefaultContext();
std::wstring model_path(L"decoder.onnx");
rml::Graph graph = rml::LoadGraphFromFile(model_path);
rml::Model model = context.CreateModel(graph);
rml_tensor_info output_info = model.GetOutputInfo();

The log of the run is:

INFO: rmlCreateDefaultContext(params=(device_idx:1), context=00000004DD78FF3F8)
INFO: Using D3D12 device: AMD Radeon RX 6500M
INFO: Model info: domain: ir_version: 6 producer_name: pytorch producer_version: 1.9 version: 0 description: opset domain: opset version: 11
ERROR: output/BiasAdd:FusedConv: Failed to allocate memory, size: 17179934720, pool size: 327680

My test model was created with PyTorch 1.9 and then converted to ONNX. The PyTorch code is shown below:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, input_channels, output_channels):
        super(DecoderBlock, self).__init__()
        # Upsample by 2x, then apply Conv2d + ReLU blocks.
        self.upsample = nn.ConvTranspose2d(input_channels[0], output_channels[0], 2, 2)
        blocks = [
            nn.Sequential(nn.Conv2d(inc, outc, 3, 1, 1), nn.ReLU())
            for inc, outc in zip(input_channels[1:], output_channels[1:])
        ]
        self.net = nn.Sequential(*blocks)

    def forward(self, input):
        up = self.upsample(input)
        return self.net(up)

class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        block_channels_in = [[1024, 1024], [512, 512], [256, 256]]
        block_channels_out = [[1024, 512], [512, 256], [256, 128]]
        blocks = [
            DecoderBlock(incs, outcs)
            for incs, outcs in zip(block_channels_in, block_channels_out)
        ]
        self.net = nn.Sequential(*blocks)

    def forward(self, input):
        return self.net(input)

def create_and_convert_onnx():
    # torch.autograd.Variable is deprecated; a plain tensor works here.
    input = torch.randn(1, 1024, 7, 7).cuda()
    model = Decoder().cuda()
    torch.onnx.export(model, input, "decoder.onnx",
                      input_names=["input"], output_names=["output"],
                      verbose=True, opset_version=11)
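For reference, a quick way to sanity-check the exported file is to run it through onnxruntime on the CPU and compare against the PyTorch output. This is a minimal sketch, assuming onnxruntime is installed; the check_export helper and the tolerance are illustrative, not part of the original report:

import numpy as np
import onnxruntime as ort
import torch

def check_export(model, onnx_path="decoder.onnx"):
    # Run the same random input through PyTorch and onnxruntime.
    x = torch.randn(1, 1024, 7, 7)
    with torch.no_grad():
        expected = model.cpu().eval()(x).numpy()
    session = ort.InferenceSession(onnx_path)
    (actual,) = session.run(["output"], {"input": x.numpy()})
    # Arbitrary tolerance for float32 round-trips.
    assert np.allclose(expected, actual, atol=1e-4)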
BenjaminCoquelle commented 2 years ago

Given your error, it shows that you need about 16 GB of memory to run the model, while your GPU has only 4 GB. Hence the error you get.
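For reference, the numbers in the ERROR line above work out as follows (a quick check, assuming both sizes are reported in bytes):

# Requested allocation vs. available pool, from the ERROR line above.
requested = 17179934720            # bytes
pool = 327680                      # bytes
print(requested / 1024**3)         # ~16.0 -> roughly 16 GiB requested
print(pool / 1024)                 # 320.0 -> a 320 KiB pool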

baicaiPCX commented 2 years ago

But running the model works fine with TensorRT on a 4 GB NVIDIA device, and the model file is only 43.5 MB; its structure and operations are very simple. The model is shown below: [model structure image]
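For what it's worth, a rough size check (a sketch, assuming float32 weights and reusing the Decoder class from the first comment) is consistent with the 43.5 MB figure, and the output activation is tiny compared to the 16 GB allocation in the log:

import torch

model = Decoder()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)                    # ~11.7M parameters
print(n_params * 4 / 1024**2)      # ~45 MiB as float32

# The output for a 1x1024x7x7 input is also small:
y = model(torch.randn(1, 1024, 7, 7))
print(tuple(y.shape))              # (1, 128, 56, 56)
print(y.numel() * 4 / 1024**2)     # ~1.5 MiB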