facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.
MIT License

Evaluation on coco2017 (5000 images) is extremely slow #522

Open Jacobew opened 5 years ago

Jacobew commented 5 years ago

❓ Questions and Help

Hi, I found that evaluation on coco2017 with 5000 images is extremely slow.

I haven't finished the evaluation process yet, but it seems that this would take about 3 hours to complete.

This is the command I used on 1 GPU:

python tools/test_net.py --config-file 'configs/e2e_mask_rcnn_R_50_FPN_1x.yaml'  TEST.IMS_PER_BATCH 4

And I found that the GPU usage is zero, which is quite weird.

I did not change the parameter MODEL.ROI_HEADS.DETECTIONS_PER_IMG. Could you help me figure it out?

Jacobew commented 5 years ago

In fact, the whole evaluation takes about 4 hours to finish. Any ideas? @fmassa

fmassa commented 5 years ago

Hi @Jacobew

You are using a config file whose model hasn't been trained for detection (and this config in particular doesn't exist).

Can you try running the same thing using a model already trained for the detection task? For example, the one in configs/caffe2 folder, and report back?
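For example, something along these lines (illustrative only; verify the exact filename under configs/caffe2):

python tools/test_net.py --config-file configs/caffe2/e2e_mask_rcnn_R_50_FPN_1x_caffe2.yaml TEST.IMS_PER_BATCH 4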

Jacobew commented 5 years ago

Thanks for the reply! @fmassa

But sorry, I don't get what you mean. The config file I used is right here in this project, and the yaml in the configs/caffe2 folder seems to have no difference from the one I used except for MODEL.WEIGHT.

chengyangfu commented 5 years ago

If you are using 'configs/e2e_mask_rcnn_R_50_FPN_1x.yaml' for testing without specifying a new MODEL.WEIGHT, you are running inference with an untrained model, so you will get very bad detection results. It will also be very slow, because there is post-processing after the CNN inference, such as thresholding to filter out low-confidence predictions. In your case, because the model is untrained, the thresholding does not work properly.
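As a rough illustration of why this matters, here is a minimal sketch (not the repo's actual code; the 0.05 value assumes the default MODEL.ROI_HEADS.SCORE_THRESH):

import torch

# An untrained model produces near-uniform confidence scores, so almost
# nothing is filtered out and the downstream NMS / top-k stages must
# process far more candidate boxes per image than with a trained model.
scores = torch.rand(100000)  # hypothetical per-box scores from an untrained model
keep = scores > 0.05         # assumed default of MODEL.ROI_HEADS.SCORE_THRESH
print(int(keep.sum()), "of", scores.numel(), "boxes survive thresholding")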

If you just want to test detection, you can use a config from the configs/caffe2 folder; in that case, the program will automatically download the trained detection model from the Facebook server and run the test.
Otherwise, you need to train the model first, then use your trained model for inference by adding MODEL.WEIGHT YOURMODEL to your testing command.
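Something like this, where /path/to/model_final.pth is a placeholder for your own checkpoint:

python tools/test_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml MODEL.WEIGHT /path/to/model_final.pth TEST.IMS_PER_BATCH 4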

Jacobew commented 5 years ago

Hi, @chengyangfu , thanks for the comment.

In fact, I fine-tuned the model from MODEL.ZOO before I used this command, and the pretrained model is loaded from the last checkpoint when testing.

chengyangfu commented 5 years ago

I see. Can you run the inference with the model from MODEL.ZOO, or with one of the configs/caffe2 configs first? Just to make sure the slowness is not caused by the fine-tuning.

Jacobew commented 5 years ago

I agree with you. I'll try it and report back here later.

madurner commented 5 years ago

@Jacobew

And I found that the GPU usage is zero, which is quite weird.

Referring to this: check whether MODEL.DEVICE is set to "cuda" and not "cpu".
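As a quick sanity check, independent of this repo, you can also confirm that PyTorch actually sees the GPU:

import torch

print(torch.cuda.is_available())  # should print True if evaluation can run on the GPU
print(torch.cuda.device_count())  # number of visible CUDA devices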

Jacobew commented 5 years ago

@maedmaex Hi, thanks for the comment.

I've checked it and MODEL.DEVICE is "cuda".

madurner commented 5 years ago

@Jacobew any updates?

Jacobew commented 5 years ago

@maedmaex Sorry for the late reply. I think it's because I added another branch that slows down the evaluation; when testing with 8 GPUs, the evaluation time drops to under 10 minutes.

goodmellow commented 5 years ago

Hi, could you please tell me how to solve the problem of the evaluation being extremely slow? Thanks @Jacobew. I also used the same command.

Jacobew commented 5 years ago

@goodmellow Hi, try testing with multiple GPUs.
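The launch command would look roughly like this (mirroring the repo's distributed training recipe; check the README for the exact form):

export NGPUS=8
python -m torch.distributed.launch --nproc_per_node=$NGPUS tools/test_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml TEST.IMS_PER_BATCH 8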

goodmellow commented 5 years ago

Yeah, but is there a way to increase the speed on a single GPU? @Jacobew

Jacobew commented 5 years ago

@goodmellow I haven't found such a way yet. In fact, the model in the master branch evaluates fine in my experiments as long as I don't add any extra branches to it.

qianyizhang commented 5 years ago

From what I understand, the bottleneck is actually the mask encoding in the cocoApi, which converts each pixel-wise mask into a run-length encoded (RLE) string, a representation that saves a lot of space. If you do not intend to save your prediction results, you could dig into the API and work around the encoding, computing the IoU matrix with your own direct implementation, which should be a lot faster.
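To make the bottleneck concrete, here is a minimal sketch of the two paths using pycocotools (the encode/iou calls are the standard API; the direct-IoU helper is my own illustration, not repo code):

import numpy as np
import pycocotools.mask as mask_util

mask_a = np.random.rand(800, 800) > 0.5
mask_b = np.random.rand(800, 800) > 0.5

# Path 1: the cocoApi route -- run-length encode each mask, then compare RLEs.
rle_a = mask_util.encode(np.asfortranarray(mask_a.astype(np.uint8)))
rle_b = mask_util.encode(np.asfortranarray(mask_b.astype(np.uint8)))
iou_rle = mask_util.iou([rle_a], [rle_b], [0])[0][0]

# Path 2: a direct IoU on the binary masks, skipping the RLE step entirely.
def mask_iou(a, b):
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

print(iou_rle, mask_iou(mask_a, mask_b))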

The better way to do this is to nag someone into writing a CUDA implementation of the mask encoder for the cocoAPI, which is what I am doing here ;p

Jacobew commented 5 years ago

@qianyizhang Thanks for the idea, would you please share your implementation?

qianyizhang commented 5 years ago

I simply skip the mask evaluation completely and assume it would give a ~2-point drop.
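For reference, a minimal sketch of that shortcut, assuming the iou_types plumbing in tools/test_net.py (reconstructed from memory, so treat the exact lines as an assumption):

# In tools/test_net.py: keep only bbox evaluation so that cocoApi's
# RLE-based mask evaluation never runs.
iou_types = ("bbox",)
# if cfg.MODEL.MASK_ON:               # the original (as I recall) also adds:
#     iou_types = iou_types + ("segm",)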

sarahmass commented 4 years ago

I have been using WEIGHT: "https://download.pytorch.org/models/maskrcnn/e2e_mask_rcnn_R_50_FPN_1x.pth" as my weights, and it takes less than 20 minutes to run inference on all 5000 images. I don't have the exact time because I am running it on AzureML and have to rebuild the libraries each time I run my tests. I am also running on a single GPU with TEST.IMS_PER_BATCH: 10. I hope this helps.
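For reference, the corresponding command-line override would look something like this (single GPU; illustrative, using the weight URL above):

python tools/test_net.py --config-file configs/e2e_mask_rcnn_R_50_FPN_1x.yaml MODEL.WEIGHT "https://download.pytorch.org/models/maskrcnn/e2e_mask_rcnn_R_50_FPN_1x.pth" TEST.IMS_PER_BATCH 10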