benguo2 opened this issue 4 years ago
cc @TristonC whose team can best help.
Hi all! I'm currently developing on the Jetson Nano and I'm looking for advice on improving inference performance. I use SageMaker as my training environment for SSD object detection with ResNet-50 as the base network, which exports .params and .json files for MXNet. I built MXNet on the Nano using the autoinstaller from the NVIDIA forum (https://forums.developer.nvidia.com/t/i-was-unable-to-compile-and-install-mxnet1-5-with-tensorrt-on-the-jetson-nano-is-there-someone-have-compile-it-please-help-me-thank-you/111303/25), and I've been able to run inference from a USB webcam by more or less following this guide: https://aws.amazon.com/blogs/machine-learning/build-a-real-time-object-classification-system-with-apache-mxnet-on-raspberry-pi/

That said, my inference speed is really slow: around 4-5 seconds per frame with a 512x512 input. I've already tried converting my weights to a different framework via ONNX and MMdnn, but my custom model has operators that neither format supports, so it looks like I'm stuck with MXNet. The MXNet website says it has TensorRT integration, but I can't find any good examples of it anywhere online; the one on the MXNet website is confusing at best and doesn't help with my particular use case.
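For concreteness, here is my reading of the MXNet 1.5 contrib TensorRT flow from the official tutorial. This is a sketch of what I understand, not something I have working; the 'model' prefix, epoch 0, and the 512x512 input shape are placeholders for whatever the SageMaker export produced, and it assumes an MXNet build compiled with TensorRT support:

```python
import os
import mxnet as mx

# TensorRT support must be compiled into the MXNet build (USE_TENSORRT=1).
os.environ['MXNET_USE_TENSORRT'] = '1'

batch_shape = (1, 3, 512, 512)  # placeholder; match the network's input
# 'model' and epoch 0 are placeholders for the exported .json/.params pair.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)

# Partition the graph so that supported subgraphs run through TensorRT.
trt_sym = sym.get_backend_symbol('TensorRT')
arg_params, aux_params = mx.contrib.tensorrt.init_tensorrt_params(
    trt_sym, arg_params, aux_params)

# Bind for inference only; the first bind builds the TensorRT engine,
# which can take a while on the Nano.
executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=batch_shape,
                               grad_req='null', force_rebind=True)
executor.copy_params_from(arg_params, aux_params)

# Forward pass on a dummy float32 input.
img = mx.nd.zeros(batch_shape, dtype='float32', ctx=mx.gpu(0))
out = executor.forward(is_train=False, data=img)[0].asnumpy()
```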
One thing that seems to be holding me back is that I'm only able to run inference on the CPU. According to the MXNet website, to use the GPU all you have to do is change ctx=cpu() to ctx=gpu() and make sure your data is converted to float32 before feeding it in (https://github.com/apache/incubator-mxnet/issues/13332). However, when I do that, it still crashes my Jetson because it appears to run out of memory. Does this have anything to do with the custom MXNet build for the Nano? Otherwise, why would it run out of memory?
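For comparison, this is a minimal plain-GPU version of what I'm attempting with the Module API (again, the prefix, epoch, and shape are placeholders). From what I've read about memory-constrained Jetson boards, disabling cuDNN autotuning, keeping batch size at 1, and enabling swap may help with the OOM crash, though I can't say for certain:

```python
import os
import mxnet as mx

# cuDNN autotuning can spike memory on the first forward pass; disabling
# it is a commonly suggested mitigation on low-memory boards.
os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0'

ctx = mx.gpu(0)
batch_shape = (1, 3, 512, 512)  # placeholder; match the network's input

# 'model' and epoch 0 are placeholders for the exported .json/.params pair.
sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)

mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
mod.bind(for_training=False, data_shapes=[('data', batch_shape)])
mod.set_params(arg_params, aux_params, allow_missing=True)

# Input must be float32, as noted in the linked issue.
img = mx.nd.zeros(batch_shape, dtype='float32')
mod.forward(mx.io.DataBatch([img]), is_train=False)
out = mod.get_outputs()[0].asnumpy()
```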
Any suggestions are welcome and appreciated!
Here’s my code: