dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License
5k stars · 1.33k forks

error in yolact++ (too many resources requested for launch) #251

Open sdimantsd opened 4 years ago

sdimantsd commented 4 years ago

The first YOLACT works fine. In YOLACT++ I get this error:

python3 eval.py --trained_model=weights/yolact_plus_resnet50_54_800000.pth --score_threshold=0.3 --top_k=25 --images=/home/ws/imgs/300/:/home/ws/imgs_out

error in modulated_deformable_im2col_cuda: too many resources requested for launch
error in modulated_deformable_im2col_cuda: too many resources requested for launch
(... repeated many times ...)
/home/ws/imgs/300/6.jpg -> /home/ws/imgs_out/6.png
error in modulated_deformable_im2col_cuda: too many resources requested for launch
... and so on

Do you know what the problem is?

dbolya commented 4 years ago

Hmm never seen that before. You've compiled and set up DCN right?

@chongzhou96 have you seen this before?

sdimantsd commented 4 years ago

Yes, but when I compiled using python3 setup.py build develop I got a permission error, so I compiled using sudo python setup.py build develop instead.

But I don't think that's the problem, is it?

dbolya commented 4 years ago

Hmm, what happens if you run python and then, in the shell, try to import dcn_v2?
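That check can be run as a one-off script; here is a minimal sketch (it assumes you run it from the yolact root after building the extension in external/DCNv2):

```python
# Try importing the compiled DCN extension and report what happened
# instead of crashing; a failure here means the build/install step failed.
try:
    import dcn_v2  # the extension built by DCNv2's setup.py
    print("dcn_v2 imported OK")
except ImportError as err:
    print("dcn_v2 not importable:", err)
```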

sdimantsd commented 4 years ago

The import works

dbolya commented 4 years ago

Hmm what's your pytorch version?

sdimantsd commented 4 years ago

1.3.0 Running on Jetson nano

sdimantsd commented 4 years ago

cuda 10.0.326

dbolya commented 4 years ago

Ok it's not version specific then. I'll get back to you on this.

sdimantsd commented 4 years ago

OK, Thank you very much!

sdimantsd commented 4 years ago

@dbolya Do you have any news?

pythops commented 4 years ago

+1

Shame-fight commented 4 years ago

> 1.3.0 Running on Jetson nano

I want to know how many FPS YOLACT can achieve on a Jetson Nano and how well it recognizes small targets. In the end, can I get the coordinates of the target rectangle or mask? Thanks in advance.

sdimantsd commented 4 years ago

> 1.3.0 Running on Jetson nano
>
> I want to know how many FPS YOLACT can achieve on a Jetson Nano and how well it recognizes small targets. In the end, can I get the coordinates of the target rectangle or mask? Thanks in advance.

Both questions depend on the input size. What input size do you use?

sdimantsd commented 4 years ago

With a ResNet-101 backbone, 700x700 takes about 1.6 s per frame (~0.625 FPS).

Shame-fight commented 4 years ago

> 1.3.0 Running on Jetson nano
>
> I want to know how many FPS YOLACT can achieve on a Jetson Nano and how well it recognizes small targets. In the end, can I get the coordinates of the target rectangle or mask? Thanks in advance.

> Both questions depend on the input size. What input size do you use?

I mistakenly assumed that YOLACT could not be used on a Jetson Nano or TX2 because of the limited computing power, so I haven't tried it yet. What image size do you recommend, and what speed and accuracy can you achieve?

sdimantsd commented 4 years ago

The image size depends on your needs. If your objects are small, you will need a bigger input size; if not, you can use a smaller one. According to the YOLACT paper, the mAP difference between 550 and 700 is only 1.4% (29.8% vs. 31.2%), but the difference in FPS is bigger (33.5 vs. 23.6).

Shame-fight commented 4 years ago

> With a ResNet-101 backbone, 700x700 takes about 1.6 s per frame (~0.625 FPS).

> The image size depends on your needs. If your objects are small, you will need a bigger input size; if not, you can use a smaller one. According to the YOLACT paper, the mAP difference between 550 and 700 is only 1.4% (29.8% vs. 31.2%), but the difference in FPS is bigger (33.5 vs. 23.6).

I want to get a background mask, such as a highway or lawn, to determine the coordinates of the boundary in autonomous driving. How many coordinates will the target mask output? I can't determine the boundary from four coordinates like a rectangular box's, because the background's boundary is irregular. I also have a question: how well does YOLACT recognize small targets (about tens to 200 pixels)? Thanks for your response.

sdimantsd commented 4 years ago

@chongzhou96 @dbolya Hi, anything new?

dbolya commented 4 years ago

@sdimantsd Oops, I asked @chongzhou96 to check in on this, but I don't think he got anywhere. I'm going to tentatively say that the Jetson you have doesn't have enough per-block resources to launch the custom deformable conv kernels. Fixing this might require changing the deformable conv code to batch its calls better, which is not something I can easily do.

Perhaps it would be useful to train a version of YOLACT++ without deformable convs? The other improvements would still give a benefit over the base version, and that should work on your GPU.

abhigoku10 commented 4 years ago

@dbolya is there an option to train the Yolact++ without deformable convs?

dbolya commented 4 years ago

@abhigoku10 Simply replace the backbone (resnet101_dcn_inter3_backbone) here: https://github.com/dbolya/yolact/blob/13bb0c6322aa35777b73d3ca6522a080588fef03/data/config.py#L775 with yolact_base_config.backbone

and resnet50_dcnv2_backbone here: https://github.com/dbolya/yolact/blob/13bb0c6322aa35777b73d3ca6522a080588fef03/data/config.py#L797 with yolact_resnet50_config.backbone
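A minimal sketch of that swap, using plain Python dicts to stand in for yolact's Config objects (the real definitions live in data/config.py; the names below mirror the ones in the links):

```python
# Illustrative only: plain dicts stand in for yolact's Config objects.
# The YOLACT++ configs keep everything else and only the backbone entry
# is replaced with the non-DCN backbone from the base configs.
yolact_base_config = {"name": "yolact_base",
                      "backbone": "resnet101_backbone"}
yolact_resnet50_config = {"name": "yolact_resnet50",
                          "backbone": "resnet50_backbone"}

# Before the swap these would be resnet101_dcn_inter3_backbone and
# resnet50_dcnv2_backbone, respectively.
yolact_plus_base_config = {"name": "yolact_plus_base",
                           "backbone": yolact_base_config["backbone"]}
yolact_plus_resnet50_config = {"name": "yolact_plus_resnet50",
                               "backbone": yolact_resnet50_config["backbone"]}

print(yolact_plus_base_config["backbone"])      # resnet101_backbone
print(yolact_plus_resnet50_config["backbone"])  # resnet50_backbone
```

After a swap like this, the models need retraining, since the DCN weights no longer match the architecture.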

breznak commented 4 years ago

is there an option to train the Yolact++ without deformable convs?

yes: https://github.com/dbolya/yolact/issues/251#issuecomment-577988085

Interesting. What are the advantages of YOLACT++ (without DCNv2) compared to YOLACT? Maybe the default code should run YOLACT++ without DCN and only make use of it if it is compiled in?

dbolya commented 4 years ago

@breznak The problem is it would require retraining, but the fact that the recent versions of pytorch have deformable convs is very good news. We could potentially avoid this issue entirely. I guess we just need to wait until they add DCNv2 support.

Also, the model performs 1.6-2.8 mAP worse without the deformable convs, so it's significant enough that I'd say we want to keep them in.

breznak commented 4 years ago

> Perhaps it would be useful to train a version of YOLACT++ without deformable convs? The other improvements would still give a benefit over the base version, and that should work on your GPU.

I was curious about the "other improvements over the previous version [of YOLACT]"?

> The problem is it would require retraining, [...] it's significant enough that I'd say we want to keep them in

Yes, but there are already duplicate weights/configs for yolact/yolact++. My idea was that there should be only YOLACT++, in DCNv1/DCNv2 variants, until DCNv2 is generally available in pytorch.

dbolya commented 4 years ago

> I was curious about the "other improvements over the previous version [of YOLACT]"?

Other improvements and their impact on performance/speed are in the YOLACT++ paper: [comparison table image from the paper]

> Yes, but there are already duplicate weights/configs for yolact/yolact++. My idea was that there should be only YOLACT++, in DCNv1/DCNv2 variants, until DCNv2 is generally available in pytorch.

The original YOLACT models are important to verify the claims made in the original paper and to compare the models against future papers.

Honestly, it's probably not that important to have YOLACT++ models without DCN, since in that case the performance is close enough to the original YOLACT models anyway (also, most people here are retraining from scratch instead of just using the COCO-trained model). I'd also rather fix errors in the current DCNv2 compilation pipeline than force anyone who has errors to use a worse version of the model.

I think we can just wait until Pytorch finally implements DCNv2 and then go from there.

abhigoku10 commented 4 years ago

@breznak @dbolya From the accuracy point of view, for person detection YOLACT++ has a good upper hand compared to YOLACT; I had a hard time improving person detection in YOLACT. On the improvements side, I have a few questions @dbolya:

  1. For objects like lawns, roads, and lanes, which are not exactly rectangular or square, is there any way to make the detected boxes fit the object size more tightly?
  2. Is it possible to train YOLACT/++ without a mask for some classes? For example, 4 classes with masks and 1 class without (the annotated object may just be a line).
  3. Can I obtain segments of objects that were not trained on? For example, if I trained my network on car and person and it is detecting them, what if I want the objects in the background? Thanks in advance.

dbolya commented 4 years ago

@abhigoku10

  1. You mean like some kind of rotated bounding box? As long as we use boxes, they're going to be boxes so not much I can do in that department.
  2. Yeah, that's possible; just give the annotation an empty mask. Learning an empty mask for that shouldn't affect the rest of the model too much.
  3. If I understand this correctly, you want to detect new objects you haven't trained on? Well, assuming those objects are already in the images you're using for training and you just don't have ground truth for them, I'd say not likely. Since in that case, you're asking the network to treat them as background.
abhigoku10 commented 4 years ago

@dbolya Thanks for the response. Q1: yup, something like a rotated bounding box. Q2: okay, I shall try this. Q3: yes, I want to detect objects which I have not trained on and which are in the background; since the network treats them as background, can it give an outline of the background structure?

sdimantsd commented 4 years ago

@abhigoku10 Q3: Anything not labeled as an object is background. I don't think that helps you much, because everything that is not labeled is background, including the sky/trees/houses etc.

abhigoku10 commented 4 years ago

@sdimantsd Yup, I need to see if I can at least get the contours of the trees, buildings, and lamppost structures.

dbolya commented 4 years ago

@abhigoku10 For Q3, I recommend you use some other method for the contours and such, maybe just mask the contours with the actually detected masks to get some sort of "background contour". Idk if that will be useful for you, but like @sdimantsd said, the network detects all background equally, so you can't really mine objects from that. You could see if there are any detections in the background that have slightly higher (but still low) scores for the rest of the classes, but I don't think that would be worth the effort.

For Q1, yeah, that's a whole different research project in its own right. I believe there was an issue a while ago referencing some rotated bbox paper, but it would take a non-insignificant amount of research to merge the two models.

abhigoku10 commented 4 years ago

@dbolya Oh, okay. For Mask R-CNN there is a rotated Mask R-CNN repo: https://github.com/mrlooi/rotated_maskrcnn

VictimCrasher commented 4 years ago

Taken from https://github.com/xingyizhou/CenterNet/issues/461 that also uses DCNv2

If this is still open: you need to modify code in the CUDA file (in CenterNet it should be src/lib/models/networks/DCNv2/src/cuda/dcn_v2_im2col_cuda.cu) and set:

const int CUDA_NUM_THREADS = 512;

Then compile again. For some reason Aarch64 cannot handle 1024.

In yolact, it's at src/cuda/dcn_v2_im2col_cuda.cu (inside the DCNv2 directory). I've tried this on a Jetson Nano and it worked.
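The 1024 → 512 change lines up with a simple resource model: a launch fails when threads-per-block times registers-per-thread exceeds the registers available to one block. A sketch of that arithmetic (all numbers are assumptions for illustration, not measurements of the actual DCN kernel; Jetson Nano's deviceQuery reports 32768 registers per block):

```python
# Back-of-the-envelope model of "too many resources requested for launch".
REGS_PER_BLOCK = 32768   # assumed Jetson Nano (sm_53) per-block limit
REGS_PER_THREAD = 40     # hypothetical register usage of the im2col kernel

def launch_fits(threads_per_block):
    # A launch only succeeds if the block's total register demand fits.
    return threads_per_block * REGS_PER_THREAD <= REGS_PER_BLOCK

print(launch_fits(1024))  # False: 1024 * 40 = 40960 > 32768
print(launch_fits(512))   # True:   512 * 40 = 20480 <= 32768
```

Halving the block size just doubles the number of blocks the launcher requests, so the kernel still covers every element; only the per-block resource demand shrinks.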

sdimantsd commented 4 years ago

@VictimCrasher Thanks! works good :)

sdimantsd commented 4 years ago

If anyone is still having this problem, there is a solution that NVIDIA has posted here: https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-5-0-now-available/72048 I think the pytorch 1.5 build there is fixed (CUDA 10.2 required; JetPack 4.4 has it). Pytorch 1.4 you have to compile manually; there are instructions at the end of the link. Note that some changes to the source code need to be made (a gist describing the required changes is named there). I haven't checked yet whether it works; I'm currently compiling.

sdimantsd commented 4 years ago

It turns out there is another problem: the DCNv2 code needs to be changed as well. In the files

src/cuda/dcn_v2_im2col_cuda.cu
src/cuda/dcn_v2_psroi_pooling_cuda.cu

change

const int CUDA_NUM_THREADS = 1024;

to

const int CUDA_NUM_THREADS = 512;

and compile again.
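A hypothetical helper that applies this edit to both files (paths relative to the DCNv2 directory, as given above; `patch_num_threads` is a name invented here). After running it, rebuild with python3 setup.py build develop:

```python
# Rewrite CUDA_NUM_THREADS from 1024 to 512 in the two DCNv2 source files.
from pathlib import Path

def patch_num_threads(path):
    src = Path(path)
    text = src.read_text()
    patched = text.replace("const int CUDA_NUM_THREADS = 1024;",
                           "const int CUDA_NUM_THREADS = 512;")
    src.write_text(patched)
    return patched != text  # True if the file was actually changed

for f in ("src/cuda/dcn_v2_im2col_cuda.cu",
          "src/cuda/dcn_v2_psroi_pooling_cuda.cu"):
    if Path(f).exists():  # skip gracefully when run outside a DCNv2 checkout
        print(f, "patched:", patch_num_threads(f))
```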