Open acherunilam opened 7 years ago
This is awesome, thanks for doing this. Let me look at your work here... BTW - why do you want to squash the commits before merging?
Great work, and here are some concerns:
@bryancatanzaro I just thought these changes would be better represented in the log if they were presented as one single "Updated CUDA runtime to 8.0" rather than 16 separate "Updated \<module1>", "Updated \<module2>", etc. Most projects do squash before merging afaik, but it's up to the maintainer of the repo.
EDIT: Changed 7.5 to 8.0
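For anyone unfamiliar with squashing, here is a minimal sketch in a throwaway repository. The branch name `cuda-8.0`, the module names, and the demo identity are hypothetical. Only the final commit message comes from the comment above. It is one way to collapse several per-module commits into one, not necessarily how this PR will be merged.

```shell
# Throwaway demo repository; branch name, files, and identity are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Demo"
git config user.email "demo@example.com"

git commit -q --allow-empty -m "Initial commit"
base="$(git rev-parse HEAD)"

# Simulate a feature branch with one commit per module.
git checkout -q -b cuda-8.0
for module in module1 module2 module3; do
    echo "ported" > "$module.txt"
    git add "$module.txt"
    git commit -q -m "Updated $module"
done

# Squash: move the branch pointer back to the base while keeping every change
# staged, then record the staged changes as a single commit.
git reset -q --soft "$base"
git commit -q -m "Updated CUDA runtime to 8.0"

squashed_count="$(git rev-list --count "$base"..HEAD)"
echo "commits on top of base: $squashed_count"
```

An interactive rebase (`git rebase -i`) gives the same result with finer control over which commits are merged. GitHub's "Squash and merge" button does it on the maintainer's side.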
@hao-lh Correction from my side - this code is compatible with runtime version 8.0, not 7.5. I shall fix the title of the pull request.
As for the scope for improvement, I thought this repo implemented everything that was discussed in "Efficient, High-Quality Image Contour Detection" by Catanzaro et al. Is there any specific optimization that you're referring to?
@adithyabenny Most of Bryan's code was written more than five years ago. Since parallel computing and CUDA have been evolving quickly in recent years, I was wondering whether there are methods that would give better performance. No offense meant to Bryan's original algorithm or to your work, I just want this code to run faster :)
Hi @adithyabenny, I used the code you committed and still hit cudaErrorIllegalAddress. The error message is: CUDA error at lanczos.cu:217 code=77(cudaErrorIllegalAddress) "cudaMemcpy(devVector, d_aVectorQQ, nPixels * sizeof(float), cudaMemcpyDeviceToDevice)". Could you help me? I'm using a Titan X on Ubuntu 14.04, and I downloaded ACML 5.3.0. Thanks a lot.
and here is the complete output: ` ./bin/linux/release/damascene damascene/polynesia.ppm
Using cuda device 2: GeForce GTX TITAN X
Processing: damascene/polynesia.ppm, output in damascene/polynesiaPb.pgm and damascene/polynesia.pb
Eig 9 Tol 0.001000 Texton 1
Image found: 321 x 481 pixels
Available 12672958464 bytes on GPU
+< rgbUtoGrayF | 0.244000 | ms
Convolving
Beginning kmeans
Changes: 150860
Changes: 78580
Changes: 50898
Changes: 38726
Changes: 30185
Changes: 25232
Changes: 21250
Changes: 18425
Changes: -179543699
8 iterations until termination
Kmeans completed
+< texton | 237.464996 | ms
+< rgbUtoLab3F | 2.259000 | ms
+< normalizeLab | 0.016000 | ms
+< mirrorImage | 0.858000 | ms
Beginning Local cues computation
+< Bgsmooth: | 7.079000 | ms
+< Bg: | 35.658001 | ms
+< Cgsmooth: | 18.371000 | ms
+< Cga: | 44.307999 | ms
+< Cgsmooth: | 18.410000 | ms
+< Cgb: | 44.462002 | ms
+< Tgsmooth: | 17.982000 | ms
+< Tg: | 39.193001 | ms
Completed Local cues
localcues time: 0.178665 seconds
+< localcues | 178.677994 | ms
+< combine | 1.499000 | ms
Max time: 0.000406 seconds
Oriented Max time: 0.000509 seconds
Solve time: 0.000933 seconds
+< nonmaxsupression | 6.005000 | ms
Intervening contour completed
+< intervene | 7.725000 | ms
Available 12572688384 bytes on GPU
Can fit 18306 iterations on GPU
lanczos iteration: 0
CUDA error at lanczos.cu:217 code=77(cudaErrorIllegalAddress) "cudaMemcpy(devVector, d_aVectorQQ, nPixels * sizeof(float), cudaMemcpyDeviceToDevice)" `
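Kernel launches in CUDA are asynchronous, so an illegal address reported at a `cudaMemcpy` may actually come from an earlier kernel. One way to narrow it down, as a hedged sketch, is to rerun the same command under `cuda-memcheck` with `CUDA_LAUNCH_BLOCKING=1`. This assumes the CUDA toolkit's `cuda-memcheck` is on the `PATH`. The binary and image paths are taken from the output above, and the variable names are illustrative.

```shell
# Paths copied from the reported run; adjust if your build output differs.
BIN=./bin/linux/release/damascene
IMG=damascene/polynesia.ppm

if command -v cuda-memcheck >/dev/null 2>&1 && [ -x "$BIN" ]; then
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the error is
    # reported at the launch that caused it rather than at a later API call.
    CUDA_LAUNCH_BLOCKING=1 cuda-memcheck "$BIN" "$IMG"
    status=$?
else
    echo "cuda-memcheck or $BIN not available; skipping"
    status=skipped
fi
echo "memcheck status: $status"
```

If the tool reports an out-of-bounds access, the kernel name and thread index it prints usually point much closer to the real bug than the `lanczos.cu:217` line does.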
I've migrated all API calls to the new CUDA SDK, and fixed the illegal memory access issue (#2). The program runs successfully on Ubuntu 14.04 with a Tesla K20 GPU. The run time is about 2 seconds when given the default Polynesia image.
For it to work, I had to create two symlinks: a file at `./lib/libblas.so` that points to `/usr/lib/libblas.so.3`, since BLAS wasn't being detected, and a directory at `./lib/acml` that points to the location of the uncompressed ACML package. Additionally, I had to set the dynamic library path so that ACML is detected, by adding `export LD_LIBRARY_PATH="$HOME/acml5.3.1/ifort64/lib/:$LD_LIBRARY_PATH"` to my bashrc.

Do remember to squash the commits before merging :)
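The symlink and library-path setup described above can be sketched roughly as follows, run from the repository root. `ACML_DIR` is an assumed helper variable that defaults to the `$HOME/acml5.3.1` location from the description. Adjust it to wherever ACML was unpacked on your machine.

```shell
# ACML_DIR is an assumption for illustration: the uncompressed ACML package.
ACML_DIR="${ACML_DIR:-$HOME/acml5.3.1}"

mkdir -p lib

# BLAS was not being detected, so point ./lib/libblas.so at the system library.
ln -sfn /usr/lib/libblas.so.3 lib/libblas.so

# ./lib/acml points at the uncompressed ACML package directory.
ln -sfn "$ACML_DIR" lib/acml

# Let the dynamic loader find the ACML shared libraries in this shell session.
# Add the same line to ~/.bashrc to make it permanent.
export LD_LIBRARY_PATH="$ACML_DIR/ifort64/lib/:${LD_LIBRARY_PATH:-}"
```

`ln -sfn` replaces an existing link instead of failing, so the snippet can be rerun safely after moving ACML.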