NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

[onnx-graphsurgeon] Pad Operator folding pads #3353

Closed inisis closed 1 year ago

inisis commented 1 year ago

Hi, when doing constant folding, only nodes whose inputs are all constant get folded, but for a graph like the one below, the Pad won't be folded. [screenshot of the subgraph]

Should we add logic here to deal with such a case?

I have achieved this by adding the following logic:

    def is_constant_pad(node):
        return node.op == 'Pad'

    # Walks along the outputs of graph_constants to see if they can also be computed statically.
    # Since the graph is topologically sorted, this should find all constant nodes in the graph.
    for node in graph_clone.nodes:
        if is_foldable(node):
            graph_constants.update({out.name: out for out in node.outputs})
        if is_constant_pad(node):
            graph_constants.update({node.inputs[1].name: node.inputs[1]})
    return graph_constants
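To make the discussion concrete, the folding walk can be sketched on a toy graph. This is a pure-Python illustration; `Tensor`, `Node`, and `find_graph_constants` are hypothetical stand-ins, not the actual onnx-graphsurgeon internals:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str

@dataclass
class Node:
    op: str
    inputs: list
    outputs: list

def find_graph_constants(nodes, initial_constants):
    # The graph is assumed topologically sorted: a node's outputs are
    # constant once every one of its inputs is already known constant.
    graph_constants = dict(initial_constants)
    for node in nodes:
        if node.inputs and all(t.name in graph_constants for t in node.inputs):
            graph_constants.update({t.name: t for t in node.outputs})
    return graph_constants

# A Sub fed by a (constant) shape computes the "pads" input of a Pad;
# the Pad's data input "x" is a runtime tensor.
shape_out = Tensor("shape_out")
pads = Tensor("pads")
x = Tensor("x")
padded = Tensor("padded")
nodes = [
    Node("Sub", [shape_out], [pads]),
    Node("Pad", [x, pads], [padded]),
]
consts = find_graph_constants(nodes, {"shape_out": shape_out})
# "pads" is discovered as constant, but "padded" is not,
# since it depends on the runtime input "x".
```

This is the distinction the thread turns on: the pads *input* of the Pad can be constant even though the Pad's *output* is not.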

@pranavm-nvidia Can you take a look? Thanks!

pranavm-nvidia commented 1 year ago

But isn't this operating on a non-constant input? How would we fold it?

inisis commented 1 year ago

Since the input shape of the model is fixed, the output shape of the Add layer is also fixed, so the pads can be calculated ahead of time and treated as constant.
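The reasoning here is that once the shape is a known constant, the pads values reduce to arithmetic on known numbers. A hedged numpy sketch of the idea (the shapes and kernel/stride values are illustrative, not taken from the actual model):

```python
import numpy as np

# With a fixed model input, the Add layer's output shape is known at
# export time, so anything computed purely from that shape is constant.
add_output_shape = (1, 24, 150, 150)  # illustrative, not the real model's shape

# A typical "same"-padding computation of the kind exporters emit as a
# shape-fed subgraph:
kernel, stride = 3, 2
h = add_output_shape[2]
out_h = -(-h // stride)                           # ceil(h / stride)
total_pad = max((out_h - 1) * stride + kernel - h, 0)
pads = (total_pad // 2, total_pad - total_pad // 2)

# Since `pads` is now a plain constant, the Pad itself is just:
x = np.zeros(add_output_shape, dtype=np.float32)
padded = np.pad(x, ((0, 0), (0, 0), pads, pads))
```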

pranavm-nvidia commented 1 year ago

Only the shape of the output could be calculated though, not the contents.

inisis commented 1 year ago

Not really. As you can see in my attached picture, the pads input of the Pad node can be pre-calculated: the subgraph's inputs are fixed (it only takes the shape of the Add layer as input, which does not change when the model input shape is fixed). The raw model is provided here: tf_efficientnetv2_s.onnx

pranavm-nvidia commented 1 year ago

Sure, the pad input can be calculated ahead of time in this case (and it already is when I run it locally - see attached screenshot), but the logic you posted above is not generally applicable since you could have a Pad node with truly dynamic padding.

[screenshot: folded_pad]
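The caveat can be sketched as a guard (illustrative names only, not the library's API): a Pad's pads input should only be claimed as constant when it is already known to be statically computable. Special-casing every Pad, as the snippet above does, would mis-handle truly dynamic padding.

```python
def pad_is_foldable(node_op, input_names, graph_constants):
    # A Pad's "pads" input (second input) can only be treated as a
    # constant if it is already in the set of known-constant tensors;
    # unconditionally adding it would be wrong for runtime-computed pads.
    return node_op == "Pad" and input_names[1] in graph_constants

constants = {"precomputed_pads"}

# Pads produced by a shape-only subgraph: safe to fold.
assert pad_is_foldable("Pad", ["x", "precomputed_pads"], constants)

# Pads computed from runtime data: must not be folded.
assert not pad_is_foldable("Pad", ["x", "runtime_pads"], constants)
```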

inisis commented 1 year ago

Wow, can you paste your code here? I cannot reproduce this in my code base. By the way, the Pad node can be merged into the next Conv.

pranavm-nvidia commented 1 year ago

I used Polygraphy:

    polygraphy surgeon sanitize --fold-constants tf_efficientnetv2_s.onnx -o folded.onnx

inisis commented 1 year ago

Polygraphy depends on onnxruntime, but should we reimplement that in onnx-graphsurgeon?

pranavm-nvidia commented 1 year ago

We could, but I think it's a bit cleaner to do this in Polygraphy. ONNX-GraphSurgeon was intended to provide lower-level APIs for manipulating the graph directly. Perhaps the entire fold_constants routine should have been in Polygraphy in the first place, since Polygraphy provides cleaner mechanisms for working with optional dependencies.
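The optional-dependency point can be sketched with the general stdlib idiom (this is the common pattern being alluded to, not Polygraphy's actual implementation):

```python
import importlib.util

def has_module(name):
    # Probe for an optional dependency without importing it.
    return importlib.util.find_spec(name) is not None

def fold_constants(graph, use_onnxruntime=True):
    # Illustrative dispatcher: fall back to a purely structural fold
    # when onnxruntime is unavailable, instead of hard-failing on import.
    if use_onnxruntime and has_module("onnxruntime"):
        return "folded with onnxruntime"
    return "folded structurally (no onnxruntime)"
```

A tool structured this way can keep onnxruntime optional while still offering the stronger folding when it happens to be installed.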

inisis commented 1 year ago

That seems reasonable, thanks for your elaboration.