VLOGroup / dvs-reconstruction

This repository provides software to our publication "Real-Time Intensity-Image Reconstruction for Event Cameras Using Manifold Regularisation", BMVC 2016
GNU Lesser General Public License v3.0

Does it make sense to try lowering the minimum compute capability from 3.0 to 2.1? #1

Closed kpykc closed 7 years ago

kpykc commented 7 years ago

Hi, I would like to test your algorithm. I've succeeded in building it, with some minor problems. But after I tried to run it on a sample camera log, I found that it requires a card with compute capability 3.0.

I have a pretty old laptop (check below).

You surely know your codebase better than I do (I found your article/repo today), e.g. which exact 3.0 features it uses.

So I would like to ask whether it makes sense to even try to patch/replace the code that depends on -arch=sm_30.

Or would it be too much work, and should I go and find some appropriate hardware for testing instead?

Thank you.

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Quadro 1000M"
  CUDA Driver Version / Runtime Version          7.5 / 7.5
  CUDA Capability Major/Minor version number:    2.1
  Total amount of global memory:                 2047 MBytes (2146631680 bytes)
  ( 2) Multiprocessors, ( 48) CUDA Cores/MP:     96 CUDA Cores
  GPU Max Clock rate:                            1400 MHz (1.40 GHz)
  Memory Clock rate:                             900 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 131072 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = Quadro 1000M
Result = PASS

reini1305 commented 7 years ago

Hi! I'm using CUDA texture objects (CudaTextureObjects) throughout the code. These were only added with compute capability 3.0. Patching the code here might be possible, but the imageutilities make heavy use of newer features, so this might be tricky to compile. I think patching would be doable in a reasonable amount of time, but for convenience, I will stick to the 3.0 features.
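
To make the requirement concrete, here is a minimal host-side sketch of the comparison involved. It is not code from this repository: `ComputeCapability`, `at_least` and `supports_texture_objects` are hypothetical names made up for illustration, and the only fact assumed is the one stated above, that texture objects need compute capability 3.0 or newer.

```cpp
#include <cassert>

// Hypothetical helper type, not part of dvs-reconstruction or the CUDA toolkit.
// Holds a compute capability such as 2.1 (Quadro 1000M) or 3.0.
struct ComputeCapability {
    int major_ver;
    int minor_ver;
};

// Returns true if `device` is at least `required`.
// The major version is compared first; the minor version only breaks ties.
inline bool at_least(ComputeCapability device, ComputeCapability required) {
    assert(device.major_ver >= 0 && device.minor_ver >= 0);
    if (device.major_ver != required.major_ver) {
        return device.major_ver > required.major_ver;
    }
    return device.minor_ver >= required.minor_ver;
}

// Texture objects require compute capability 3.0 or newer.
inline bool supports_texture_objects(ComputeCapability device) {
    const ComputeCapability required = {3, 0};
    return at_least(device, required);
}
```

With the values from the deviceQuery output above, `supports_texture_objects(ComputeCapability{2, 1})` returns false, while any 3.x or newer card returns true. In a real program the two numbers would typically come from the `major` and `minor` fields of `cudaDeviceProp`, filled in by `cudaGetDeviceProperties`.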

kpykc commented 7 years ago

Thanks for the fast reply. (Actually, I was asking whether it makes sense for me to try to do that :) ) From your answer I understand that I probably shouldn't try, as it would be a more complex task for me. I will stick with the "searching for hardware" option.

Thank you again!