eshibusawa / JBF-Stereo

GPU implementation of a disparity refinement filter
BSD 2-Clause "Simplified" License
About speed #1

Closed: ajunyo closed this issue 1 year ago

ajunyo commented 1 year ago

The project is really outstanding, but when I used this matching method, I found it to be really slow. Is it calling CUDA correctly?

eshibusawa commented 1 year ago

Hi, did you measure the stereo matching computation time correctly? This project uses NVRTC, so the class compiles the raw kernel before the matching process. The kernel compilation time should therefore be excluded when measuring computation time. This overhead occurs only the first time.
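Because compilation happens on first use, a fair benchmark runs the matcher once untimed and then averages over many calls. The sketch below illustrates this in plain Python. `HypotheticalLazyMatcher` and `time_steady_state` are illustrative names, not part of JBF-Stereo, and the `time.sleep` merely simulates a one-time NVRTC compile. On a real GPU, pass a synchronization callable such as `cupy.cuda.Device().synchronize` as `sync`: CUDA kernel launches are asynchronous, so timing without synchronization can under-report the cost.

```python
import time
from typing import Callable, Optional


class HypotheticalLazyMatcher:
    """Hypothetical stand-in for a matcher whose first call compiles a kernel.

    Not the JBF-Stereo API: the sleep only simulates a one-time NVRTC compile.
    """

    def __init__(self, compile_seconds: float = 0.05):
        self.compile_seconds = compile_seconds
        self.compiled = False

    def match(self, left, right):
        if not self.compiled:
            time.sleep(self.compile_seconds)  # one-time "compile" cost
            self.compiled = True
        # Placeholder work standing in for the real disparity computation.
        return [l - r for l, r in zip(left, right)]


def time_steady_state(fn: Callable, *args, warmup: int = 1, repeat: int = 20,
                      sync: Optional[Callable[[], None]] = None) -> float:
    """Return mean seconds per call, excluding `warmup` untimed calls.

    `sync` should block until queued GPU work has finished (for example
    cupy.cuda.Device().synchronize); it is called before and after timing.
    """
    for _ in range(warmup):
        fn(*args)  # pays compilation / initialization cost, not timed
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    matcher = HypotheticalLazyMatcher()
    left, right = list(range(1000)), list(range(1000))

    t0 = time.perf_counter()
    matcher.match(left, right)  # first call includes the simulated compile
    first_call = time.perf_counter() - t0

    per_frame = time_steady_state(matcher.match, left, right)
    print(f"first call: {first_call * 1e3:.1f} ms")
    print(f"steady state: {per_frame * 1e3:.3f} ms/frame "
          f"(~{1.0 / per_frame:.0f} fps)")
```

Dividing 1 by the steady-state seconds per frame gives the achievable frame rate, which is the figure worth comparing against another matcher; the first-call time mostly reflects compilation.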

ajunyo commented 1 year ago

I added it to my program to replace the BM (block matching) algorithm included in the original OpenCV contrib, but the frame rate feels very low.

ajunyo commented 1 year ago

I would also like to ask which example is better suited to video streams at a resolution of 1200 × 800.

eshibusawa commented 1 year ago

What exactly is the problem? I did not implement OpenCV BM. If you have a speed problem with OpenCV BM, you should report it to the OpenCV development team.

> I would also like to ask which example is better suited to video streams at a resolution of 1200 × 800.

Reuse the instance for your video stream processing to avoid the kernel compilation overhead.

ajunyo commented 1 year ago

I wanted to replace the BM algorithm with your project, and I did replace it successfully with this project's method, but after replacing it I found the speed was particularly slow. Is the CUDA kernel initialization supposed to be slow only on the first run? I've run it many times, and it still seems very slow. Also, for high-resolution images, which of JBF, ELAS, and PatchMatch is suitable? Thanks for replying.

eshibusawa commented 1 year ago

Reuse the instance for your video stream processing to avoid the kernel compilation overhead.
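In practice, reusing the instance most likely means constructing the matcher once in the same process and calling it for every frame, rather than creating a new one per frame. A minimal sketch, assuming a hypothetical `HypotheticalStereoMatcher` class whose constructor triggers the (simulated) kernel compilation; the real JBF-Stereo class names and signatures may differ.

```python
import time


class HypotheticalStereoMatcher:
    """Hypothetical matcher: construction 'compiles' the kernel (simulated)."""

    compile_count = 0  # class-level counter, for demonstration only

    def __init__(self, width: int, height: int):
        # Stand-in for NVRTC compilation of a raw kernel for this input size.
        time.sleep(0.01)
        HypotheticalStereoMatcher.compile_count += 1
        self.shape = (height, width)

    def compute(self, left_frame, right_frame):
        # Placeholder for the per-frame disparity computation.
        return [a - b for a, b in zip(left_frame, right_frame)]


def process_stream_slow(frames, width, height):
    # Anti-pattern: a new instance per frame pays the compile cost every time.
    return [HypotheticalStereoMatcher(width, height).compute(l, r)
            for l, r in frames]


def process_stream_fast(frames, width, height):
    # Build the matcher once, outside the frame loop, then reuse it.
    matcher = HypotheticalStereoMatcher(width, height)
    return [matcher.compute(l, r) for l, r in frames]
```

Both functions return the same disparities, but the second pays the one-time setup cost once per stream instead of once per frame.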

ajunyo commented 1 year ago

I'm very sorry, I'm not very familiar with CUDA. Do you mean I should run the program repeatedly, or that I should run demo and main before running my program to reduce this overhead?

eshibusawa commented 1 year ago

This software IS NOT intended for people who are not familiar with stereo vision algorithms, CUDA, and CuPy. The issue you raised is NOT about my project; it is not related to a bug, the implementation, or algorithmic improvements. Please refer to the CUDA [1] and CuPy [2] documentation for implementation details, or to the original papers [3, 4] for algorithm details.

  1. CUDA C++ Programming Guide. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html
  2. CuPy. https://cupy.dev/
  3. Bleyer, M., Rhemann, C., & Rother, C. (2011, August). PatchMatch Stereo - Stereo Matching with Slanted Support Windows. In BMVC (Vol. 11, pp. 1-11).
  4. Geiger, A., Roser, M., & Urtasun, R. (2010, November). Efficient Large-Scale Stereo Matching. In Asian Conference on Computer Vision (pp. 25-38). Berlin, Heidelberg: Springer Berlin Heidelberg.