HaohaoNJU / CenterPoint

TensorRT deployment for CenterPoint Lidar Detection Model.

Why two subgraphs instead of one whole graph #18

Closed: hygxy closed this issue 1 year ago

hygxy commented 2 years ago

In the README, it says "Here we extract two pure nn models from the whole computation graph---pfe and rpn, this is to make it easier for trt to optimize its inference engines, and we use cuda to connect these nn engines."

Is there any repo/doc/link/tutorial that supports this claim? I.e., why is it easier for TRT to optimize its inference engines with two ONNX models rather than one?

HaohaoNJU commented 2 years ago
  1. TensorRT is designed to optimize NN inference. The connecting part between pfe and rpn (voxel assignment) involves no NN computation, so I don't think TRT would optimize that part of the graph.
  2. Since TensorRT is something of a black box, I believe you should take over as much of the non-NN computation as possible yourself, because then you can control things like memory allocation, thread allocation, and so on.

That is why I split the graph into two NN subgraphs and connected them using CUDA, as sketched below.
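
For context, in a PointPillars-style CenterPoint pipeline the CUDA "connection" between the two engines is essentially a scatter of per-pillar PFE features into the dense BEV canvas that the RPN engine reads. A minimal sketch of that glue kernel (the name, shapes, and memory layout here are illustrative assumptions, not this repo's exact code):

```cuda
// Scatter per-pillar PFE output features into a dense BEV canvas.
// Layouts assumed: pfe_out is [num_pillars, num_channels] row-major,
// coords holds (y, x) grid indices per pillar, and canvas is
// [num_channels, ny, nx], zero-initialized before launch.
__global__ void scatterPillarFeatures(const float* pfe_out,
                                      const int*   coords,
                                      float*       canvas,
                                      int num_pillars, int num_channels,
                                      int ny, int nx) {
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_pillars) return;
    int y = coords[2 * p];
    int x = coords[2 * p + 1];
    for (int c = 0; c < num_channels; ++c) {
        canvas[c * ny * nx + y * nx + x] = pfe_out[p * num_channels + c];
    }
}
```

Because this step is pure indexing with no NN ops, TensorRT has nothing to fuse or tune here, while a hand-written kernel keeps full control over the launch configuration and buffer placement.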

ryanyej commented 1 year ago

But two ONNX models will generate more memory cost: the pfe output, the intermediate buffers, and the rpn input all consume extra resources.
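
One mitigation worth noting: the inter-engine buffers can be allocated once at initialization and reused every frame, so the overhead is a fixed one-time allocation rather than a per-frame cost. A sketch, with purely hypothetical sizes:

```cuda
#include <cuda_runtime.h>

// Hypothetical sizes for illustration only.
constexpr int MAX_PILLARS  = 30000;
constexpr int NUM_CHANNELS = 64;
constexpr int NY = 512, NX = 512;

// Inter-engine buffers, allocated once and reused every frame.
float *d_pfe_out = nullptr, *d_canvas = nullptr;

void initBuffers() {
    cudaMalloc(&d_pfe_out, MAX_PILLARS * NUM_CHANNELS * sizeof(float));
    cudaMalloc(&d_canvas,  NUM_CHANNELS * NY * NX * sizeof(float));
}
// Per frame: enqueue the pfe engine, launch the scatter kernel, then
// enqueue the rpn engine, binding d_pfe_out / d_canvas as engine I/O
// without reallocating.
```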