marcjasner opened this issue 1 year ago
bump
@marcjasner have you figured out what the problem was? In my case I am able to get past this point, but I see the same message in the console, and the FPS is lower (8) than the claimed (22). My program then crashes at the line `torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)`, where the optimized model weights should be saved.
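For context, the save step from the notebook and the usual reload path through torch2trt's TRTModule look roughly like the sketch below (just an outline of the intended flow, not a fix; `model_trt` is the object returned by the torch2trt conversion step):

```python
import torch
from torch2trt import TRTModule

OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'

# serialize the converted model's weights (the line that crashes for me)
torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

# on later runs the engine can be reloaded without converting again
model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))
```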
@janezlapajne, no I never resolved that issue. I ended up investigating other pose detection methods and found the jetson_inference posenet code worked reliably and gave pretty good performance (about 17FPS)
Thanks for telling me; I wish I had found out sooner, I wouldn't have spent so much time on it. Can you point me to the repo/model? I hope it can be set up and tested quickly. Is it this one: https://github.com/dusty-nv/jetson-inference/blob/master/docs/posenet.md ?
If you have any other suggestions etc. please let me know. I would just like to make it work ASAP.
@janezlapajne yes, that is the one
Yep, as @dusty-nv pointed out, that is the correct one. Also, Dusty is super helpful on the Nvidia forums. He's helped me a number of times!
I think you'll find that compiling the code and getting the posenet sample running is pretty straightforward, and adapting it to whatever projects you're working on should be just as easy.
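For anyone landing here later, the Python side of the posenet sample boils down to something like the sketch below (based on the posenet example in jetson-inference; the network name and camera/display URIs are just illustrative defaults, not the only options):

```python
# Rough sketch of the jetson-inference posenet flow.
import jetson_inference
import jetson_utils

net = jetson_inference.poseNet("resnet18-body")      # pretrained body-pose model
camera = jetson_utils.videoSource("csi://0")         # or e.g. "/dev/video0" for USB
display = jetson_utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    poses = net.Process(img)                         # detected poses for this frame
    display.Render(img)
    display.SetStatus("posenet | {:.1f} FPS".format(net.GetNetworkFPS()))
```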
Hello, thank you both! Yes, I agree, @dusty-nv never disappoints 💪 Also, this jetson-inference package is amazing: it got the pose model working in literally a couple of minutes, inside a docker container. Sincerely, dusty, great work! I really appreciate it! A few months ago I also used the SSD detector from jetson-inference and it worked like a charm. The retraining process worked out of the box, with an additional script that automates downloading the Open Images dataset. Really helpful if you want to quickly test and prototype something.
Anyway, to conclude, I will ask something else (correct me if this is not the right place to ask). We plan to run two models concurrently on a Jetson Nano: a pose model and a detection model, preferably YOLOv7 via the DeepStream package. Can the Jetson Nano manage its resources appropriately in such a case? Thank you!
For anyone who may care, I got tired of messing with all the various pose implementations that did NOT work in the Jetson environment out of the box due to compile issues or poor documentation.
I moved over to edge-based processing ON the camera instead, via Luxonis. Here is the Luxonis pose example; all processing runs on a Movidius VPU inside the camera: https://github.com/luxonis/depthai-experiments/tree/master/gen2-human-pose
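For anyone curious what that looks like on the host side, the gen2 DepthAI pipeline is roughly the sketch below (simplified from the shape of the linked example; the blob path and preview size are placeholders, and decoding the raw heatmap/PAF output into keypoints is omitted):

```python
# Simplified sketch of a gen2 DepthAI pose pipeline; blob path and preview
# size are placeholders, not values taken from this thread.
import depthai as dai

pipeline = dai.Pipeline()

cam = pipeline.create(dai.node.ColorCamera)
cam.setPreviewSize(456, 256)
cam.setInterleaved(False)

nn = pipeline.create(dai.node.NeuralNetwork)
nn.setBlobPath("human-pose-estimation-0001.blob")    # placeholder model blob
cam.preview.link(nn.input)

xout = pipeline.create(dai.node.XLinkOut)
xout.setStreamName("pose")
nn.out.link(xout.input)

with dai.Device(pipeline) as device:
    q = device.getOutputQueue("pose", maxSize=4, blocking=False)
    while True:
        packet = q.get()   # raw network output; the linked example decodes
                           # it into keypoints on the host
```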
> a pose model and a detection model, preferably YOLOv7 via the DeepStream package. Can the Jetson Nano manage its resources appropriately in such a case?
Due to its limited compute resources, it's hard to say with the Nano, so you may need to play around with it. Also, DeepStream supports pose estimation (https://docs.nvidia.com/tao/tao-toolkit/text/bodypose_estimation/bodyposenet.html), so ultimately you may find that doing both detection and pose in DeepStream gives you better performance.
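One way to get a feel for whether both networks fit is to time them back to back once they're converted; a rough sketch, assuming you already have two torch2trt-converted modules (the names and input sizes below are placeholders, not anything from this repo):

```python
# Placeholder benchmark sketch: model_trt_pose / model_trt_det and the input
# shapes are assumptions for illustration.
import time
import torch

def measure_fps(model, shape, iters=50):
    x = torch.zeros(shape).cuda()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return iters / (time.time() - start)

# print(measure_fps(model_trt_pose, (1, 3, 224, 224)))
# print(measure_fps(model_trt_det,  (1, 3, 640, 640)))
```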
@dusty-nv ok, will see how it goes. Will report to the forum if I have any further questions.
@marcjasner When I run the jupyter notebook 'live_demo.ipynb' I am able to run all of the steps up until the following step:
```python
import torch2trt
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
```
I have a similar issue, except that it doesn't crash; the code just stays stuck... Does anyone have any advice for me?
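For whoever hits this next: one thing worth trying (an assumption on my part, not a confirmed fix) is freeing cached CUDA memory and giving TensorRT a smaller workspace before the conversion, since the 4GB Nano is right at the edge during engine building:

```python
import torch
import torch2trt

# Assumes `model` and `data` are already set up as in live_demo.ipynb.
torch.cuda.empty_cache()                 # release any cached CUDA allocations first
model_trt = torch2trt.torch2trt(
    model, [data],
    fp16_mode=True,
    max_workspace_size=1 << 24,          # half of the notebook's 1<<25 workspace
)
```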
Sorry for the less than descriptive title, but I wasn't sure how else to title it.
I've got a 4GB Jetson Nano (SeeedStudio Jetson Recomputer J2010 carrier board) with a 128GB SSD as the root storage device. It's running JetPack 4.6 (output of `apt-cache show nvidia-jetpack` below).
```
$ sudo apt-cache show nvidia-jetpack
Package: nvidia-jetpack
Version: 4.6-b199
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 4.6-b199), nvidia-opencv (= 4.6-b199), nvidia-cudnn8 (= 4.6-b199), nvidia-tensorrt (= 4.6-b199), nvidia-visionworks (= 4.6-b199), nvidia-container (= 4.6-b199), nvidia-vpi (= 4.6-b199), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_4.6-b199_arm64.deb
Size: 29368
SHA256: 69df11e22e2c8406fe281fe6fc27c7d40a13ed668e508a592a6785d40ea71669
SHA1: 5c678b8762acc54f85b4334f92d9bb084858907a
MD5sum: 1b96cd72f2a434e887f98912061d8cfb
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Package: nvidia-jetpack
Version: 4.6-b197
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-cuda (= 4.6-b197), nvidia-opencv (= 4.6-b197), nvidia-cudnn8 (= 4.6-b197), nvidia-tensorrt (= 4.6-b197), nvidia-visionworks (= 4.6-b197), nvidia-container (= 4.6-b197), nvidia-vpi (= 4.6-b197), nvidia-l4t-jetson-multimedia-api (>> 32.6-0), nvidia-l4t-jetson-multimedia-api (<< 32.7-0)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_4.6-b197_arm64.deb
Size: 29356
SHA256: 104cd0c1efefe5865753ec9b0b148a534ffdcc9bae525637c7532b309ed44aa0
SHA1: 8cca8b9ebb21feafbbd20c2984bd9b329a202624
MD5sum: 463d4303429f163b97207827965e8fe0
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8
```

I've set up a Python 3.6 virtualenv and followed the installation instructions for all required packages. There were no errors during any of the installations and I've verified all of the packages import properly from the Python command line. When I run the Jupyter notebook 'live_demo.ipynb' I am able to run all of the steps up until the following step:
```python
import torch2trt
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
```
When I attempt to run that step, the system thinks about it for a bit and then a dialog pops up saying the Python kernel has crashed and will be automatically restarted. I cannot get past this step.
To help debug/diagnose, I took all of the code from the notebook and incrementally added it to a Python file to see if I could reproduce the issue. The code I have so far is:
```python
import cv2
import json
import trt_pose.coco
import trt_pose.models
import torch
import torch2trt
from torch2trt import TRTModule
import time
import torchvision.transforms as transforms
import PIL.Image
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects
from jetcam.usb_camera import USBCamera
from jetcam.csi_camera import CSICamera
from jetcam.utils import bgr8_to_jpeg
import ipywidgets
from IPython.display import display

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
model.load_state_dict(torch.load(MODEL_WEIGHTS))

WIDTH = 224
HEIGHT = 224

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()

print("Calling torch2trt.torch2trt\n")
model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
print("Done\n")

OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'

print("Calling torch.save\n")
torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)
print("Done\n")
```
When I run this code I see "Calling torch2trt.torch2trt" in the console and then, after a pause, I see the following error repeated many times:
[TensorRT] ERROR: 3: [builderConfig.cpp::canRunOnDLA::341] Error Code 3: Internal Error (Parameter check failed at: optimizer/api/builderConfig.cpp::canRunOnDLA::341, condition: dlaEngineCount > 0 )
After that the system seems to hang. The desktop displays a low-memory warning (if I run `watch -n1 free -h` in another console window, I can see free memory drop from 3+ GB to as little as 96 MB). After some time the process just reports "Killed" and exits back to the command line.
I am at a loss. Can you please provide any information that might help me correct this issue and continue?
Thanks Marc