Open pieterblok opened 4 years ago
Hi @pieterbl86, I'm not one of the developers (unfortunately), but look into the function prep_display in eval.py, especially these lines:
with timer.env('Copy'):
    if cfg.eval_mask_branch:
        # Masks are drawn on the GPU, so don't copy
        masks = t[3][:args.top_k]
    classes, scores, boxes = [x[:args.top_k].cpu().numpy() for x in t[:3]]
Which Jetson would you like to use?
@sdimantsd Thank you very much. I just created another function using the main code from prep_display. The code works, thanks again.
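For anyone else landing here, a sketch of what such a wrapper might look like, distilled from prep_display in eval.py. The module paths (yolact, utils.augmentations, layers.output_utils) are those of the YOLACT repo at the time of this thread and may differ between versions; the imports are kept inside the functions just so the sketch stays self-contained:

```python
def load_net(weights_path):
    # Module paths below come from the YOLACT repo; check them against
    # your checkout. Imports are inside the function to keep the sketch
    # readable without the repo on the path.
    import torch
    from yolact import Yolact

    net = Yolact()
    net.load_weights(weights_path)
    net.eval()
    return net.cuda()

def detect(net, frame_bgr, top_k=15, score_threshold=0.3):
    """frame_bgr: HxWx3 uint8 array, e.g. straight from cv2.VideoCapture."""
    import torch
    from utils.augmentations import FastBaseTransform
    from layers.output_utils import postprocess

    with torch.no_grad():
        frame = torch.from_numpy(frame_bgr).cuda().float()
        preds = net(FastBaseTransform()(frame.unsqueeze(0)))
        h, w, _ = frame_bgr.shape
        t = postprocess(preds, w, h, score_threshold=score_threshold)
        # As in prep_display: masks stay on the GPU, the rest is copied out.
        masks = t[3][:top_k]
        classes, scores, boxes = [x[:top_k].cpu().numpy() for x in t[:3]]
    return classes, scores, boxes, masks
```

The camera loop is then just one call per captured frame, e.g. `classes, scores, boxes, masks = detect(net, frame)`.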
I'm using an NVIDIA Jetson Xavier development board (512 CUDA cores). With YOLACT++ I obtain an image inference speed of 5 FPS.
@pieterbl86 This might be less of a drop-in replacement than @sdimantsd's suggestion, but if you want to try reconciling with it, you could also check out the video evaluation pipeline (run eval with --video=<camera_index>; the code for it is in the evalvideo function). It tries to keep the FPS stable and uses threading to squeeze as much performance out of the GPU as possible (by staggering data loading, data preparation, model evaluation, and frame drawing [i.e., an output lag of 4 frames, but 4 stages are always running at once]). Not sure how much that would benefit your use case.
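The staggering described above can be sketched with stdlib threads and queues; the toy stage functions here are placeholders for the real load / preprocess / eval / draw steps, and the structure (one thread per stage, small queues in between, several frames in flight at once) is the point:

```python
import queue
import threading

def run_pipeline(frames, stages, maxsize=1):
    """Push frames through a chain of 1-arg stage callables, one thread each.

    With N stages, up to N frames are processed concurrently, at the cost of
    a few frames of output latency (like evalvideo's 4-frame lag).
    """
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]

    def worker(stage, q_in, q_out):
        while True:
            item = q_in.get()
            if item is None:        # shutdown signal: forward it and stop
                q_out.put(None)
                return
            q_out.put(stage(item))

    threads = [threading.Thread(target=worker, args=(s, queues[i], queues[i + 1]))
               for i, s in enumerate(stages)]
    for t in threads:
        t.start()

    def feed():
        for f in frames:
            queues[0].put(f)
        queues[0].put(None)

    feeder = threading.Thread(target=feed)
    feeder.start()

    out = []
    while True:
        item = queues[-1].get()
        if item is None:
            break
        out.append(item)
    feeder.join()
    for t in threads:
        t.join()
    return out

# Two toy stages standing in for preprocess / model eval / draw:
print(run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 2]))  # [2, 4, 6, 8, 10]
```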
Out of curiosity, what FPS did you get with Mask R-CNN?
How did you install PyTorch on the Jetson? When I try to install torchvision it fails; torch works well.
@thimabru1010 Follow the instructions at this link: https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano-version-1-4-0-now-available/
@pieterbl86 I am also looking into using this on the Xavier, and I wonder whether any of the NVIDIA optimizations like TensorRT or JetPack are utilized? Just curious whether the NVIDIA tooling could make it faster. For reference, here are some links: PyTorch with JetPack container, TensorRT
Thanks!
Hi @lzyang2000, try exporting your model to ONNX and creating an engine using TensorRT (on the Xavier) with DLA enabled.
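For anyone following that route, the export step might look roughly like this. This is a sketch under the assumption that the model's forward pass has been made ONNX-exportable; the stock YOLACT graph usually needs small modifications first (e.g. moving postprocessing outside the exported graph):

```python
def export_to_onnx(model, onnx_path="yolact.onnx", size=550):
    # torch is imported here so the sketch can be read without it installed.
    import torch

    model.eval()
    dummy = torch.randn(1, 3, size, size)  # YOLACT's default input size is 550x550
    torch.onnx.export(model, dummy, onnx_path,
                      opset_version=11, input_names=["image"])

# Then, on the Xavier, build a DLA-enabled TensorRT engine with trtexec, e.g.:
#   trtexec --onnx=yolact.onnx --saveEngine=yolact.engine \
#           --useDLACore=0 --allowGPUFallback --fp16
```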
@dbolya Daniel, thank you very much for this nice new CNN! Thanks to you and your co-developers, we can deploy instance segmentation much better on small robots (with lighter hardware, like Jetsons). I really appreciate your work, because it can have a great impact on our robotics application!
I'm used to working with Matterport's version of Mask R-CNN. We use their code on a robot, but we want to move to YOLACT for the above-mentioned reasons. Unfortunately I'm not that familiar with PyTorch, and I have the following struggle:
How can I best use YOLACT in combination with a streaming camera?
From Matterport's demo code (https://github.com/matterport/Mask_RCNN/blob/master/samples/demo.ipynb) it's a pretty easy procedure: first load the network (steps 1-4 in their Jupyter notebook), then use the code of step 5 in a loop to acquire images and immediately process them with this single line of code:
results = model.detect([image], verbose=1)
I understand I have to use the eval.py procedure of YOLACT, but how can I best implement YOLACT in a similar manner to what we currently do (first load the network, parameters, and classes, and then use a "detect" line of code to process the images from the camera in a loop)? Please excuse me if this question is super simple; I don't know where to start.