emiliofidalgo / ibow-lcd

Appearance-based Loop Closure Detection using Incremental Bags of Binary Words
GNU General Public License v3.0

Code is too slow? #6

Closed: mpkuse closed this issue 5 years ago

mpkuse commented 5 years ago

Hi, I came across your IROS paper and found it interesting. I could compile your code and run it on my images.

I modified your main.cc file a bit: I am extracting 500 ORB features instead of your 1500. My image size is 640x512.

I am timing the code, and I find that this part takes more than 1.5 sec:

lcdet.process(i, kps, dscs, &result);

I am wondering: is this normal, or am I making a mistake running your code? I did a catkin_make with the Release flag, as mentioned in the README of this repo.

emiliofidalgo commented 5 years ago

Hi Manohar,

Performance issues can come from different sources and depend on several factors. For parallelization, the algorithm uses all the CPUs that the host machine has, so CPU usage can be high on some occasions.

Perhaps this thread can be useful:

https://github.com/emiliofidalgo/ibow-lcd/issues/3

If that advice doesn't help, we will have to dig deeper into the process function to find the root of your problem.

Please, let me know about your progress!

Thank you!

mpkuse commented 5 years ago

Hi Emilio, I made the following changes:

a) cv::Ptr<cv::ORB> detector = cv::ORB::create(500); // from the original 1500 down to 500
b) params.purge_descriptors = false;
c) params.min_feat_apps = 5;

Still, the per-frame processing time is quite high. I cannot seem to get the 500 ms per frame mentioned in your iBoW-LCD paper, so I believe your suggestions (disabling descriptor purging, and so on) were not addressing the bottleneck. Please see my profiling of your code below.

Also, my CPU usage is 800%, indicating it is using all my 8 cores. Could this slow processing in fact be a threading issue? Is it possible to disable threading?

Additionally, I profiled your code, specifically the process() function:

--- Processing image 252 of 1225: /Bulk_Data/_tmp_cerebro/bb4_multiple_loops_in_lab//827.jpg
cv::imread in (ms) 3
detector->detectAndCompute in (ms) 11
    0 Storing the keypoints and descriptors (ms):
    0 Adding the current image to the queue to be added in the future (ms)
    3501 Adding new hypothesis(ms)
    1757 Searching the query descriptors against the features (ms):
    0 Filtering matches according to the ratio test (ms)
    0 We look for similar images according to the filtered matches found
    0 Filtering the resulting image matchings
    0 buildIslands()
lcdet.process (ms) 5267

It looks like almost all the processing time is taken by adding a new hypothesis and searching the query descriptors.

Within the function [LCDetector::addImage]:

        1766 Searching the query descriptors against the features (ms)
        0 Filtering matches according to the ratio test (ms)
        1702 Finally, we add the image taking into account the correct matchings (ms). 

What do you think is the issue?

emiliofidalgo commented 5 years ago

Hi Manohar,

That is weird; I never obtained such high times. We are even working with iBoW-LCD on an old machine at the moment, and the response times are not that high. As discussed in the paper, an incremental approach has some drawbacks, since a growing set of descriptors must be managed, but the times shouldn't be so extreme.

As you comment, perhaps it is a multicore processing problem. If you want to disable it, just search in ibow-lcd and obindex2 for the following line and comment it out:

#pragma omp parallel for

Are you running under a virtualization system, or do you execute the application natively? That could be another point to take into account.

Best,

mpkuse commented 5 years ago

I am using Docker. My first guess is that it has to do with how the threads access shared data, and that the slowdown is caused by threading issues.

Let me try disabling OpenMP and get back to you.

mpkuse commented 5 years ago

Just to inform you, I commented out the 3 instances of #pragma omp parallel for in obindex2/src/binary_index.cc.

This resulted in a speed-up (details below). The processing time is now close to the 400-500 ms per frame claimed in your paper. Also, CPU usage is now 95-105%, implying only 1 core is in use.

I guess the OMP parallel search needs fixing.

--- Processing image 1224 of 1225: /Bulk_Data/_tmp_cerebro/bb4_multiple_loops_in_lab//3743.jpg
cv::imread in (ms) 6
detector->detectAndCompute in (ms) 22
    0 Storing the keypoints and descriptors (ms):
    0 Adding the current image to the queue to be added in the future (ms)
        106 [LCDetector::addImage]Searching the query descriptors against the features
        0 [LCDetector::addImage]Filtering matches according to the ratio test
        13 [LCDetector::addImage]Finally, we add the image taking into account the correct matchings
    120 Adding new hypothesis(ms)
    78 Searching the query descriptors against the features (ms):
    0 Filtering matches according to the ratio test (ms)
    0 We look for similar images according to the filtered matches found
    0 Filtering the resulting image matchings
    0 buildIslands()
lcdet.process (ms) 201

emiliofidalgo commented 5 years ago

Perfect! It does seem to be a problem in this regard. Anyway, we used the OMP parallelization to obtain our results, and I remember running some tests with and without this feature, observing better results with it. I guess you have more cores in your machine than mine (4 vs 8) and the thread management overhead is worse than in my case. It is just a theory.

Thank you for checking the code, I hope you find it useful.

Best regards,

ferreram commented 4 years ago

Just for your information, commenting out the three OMP calls led to a 5x speed-up on my machine as well. Perhaps it would be worth providing OMP only as an option?

emiliofidalgo commented 4 years ago

Thank you, Maxime, for the information. Certainly, providing OMP as an option could be a good solution. Anyway, given that it seems to work better without OMP for most people, I will comment out the corresponding lines as a temporary solution.