archenroot / gentoo-overlay

Gentoo overlay with main focus on GPU, Neural Networks, Big Data and Java technologies
http://gentoo.archenroot.org

Gentoo support for new USE_EXPAND=GPU_ARCHITECTURE #24

Open archenroot opened 7 years ago

archenroot commented 7 years ago

There is a common problem with hardware GPU acceleration support. I am mainly interested in CUDA, but it could apply to OpenCL as well.

Problem: Upstream build systems commonly execute compilation for all available GPU architectures.

Detail: When I say compilation is executed for all available architectures, I mean NVCC gets the following switches:

-gencode arch=compute_20,code=sm_20 
-gencode arch=compute_30,code=sm_30 
-gencode arch=compute_35,code=sm_35 
-gencode arch=compute_37,code=sm_37 
-gencode arch=compute_50,code=sm_50 
-gencode arch=compute_52,code=sm_52 
-gencode arch=compute_60,code=sm_60 
-gencode arch=compute_61,code=sm_61

That is quite a lot, and it prolongs the compilation time by 3-5x. Here is a sample CMakeLists snippet that configures the build based on the architecture name and falls back to autodetection if none is specified:

  # Map a named GPU generation to its compute capabilities.
  if(CUDA_GENERATION STREQUAL "Fermi")
    set(__cuda_arch_bin "2.0")
  elseif(CUDA_GENERATION STREQUAL "Kepler")
    set(__cuda_arch_bin "3.0 3.5 3.7")
  elseif(CUDA_GENERATION STREQUAL "Maxwell")
    set(__cuda_arch_bin "5.0 5.2")
  elseif(CUDA_GENERATION STREQUAL "Pascal")
    set(__cuda_arch_bin "6.0 6.1")
  elseif(CUDA_GENERATION STREQUAL "Auto")
    # Compile and run a small probe that prints the capabilities of the installed GPUs.
    execute_process( COMMAND "${CUDA_NVCC_EXECUTABLE}" "${OpenCV_SOURCE_DIR}/cmake/checks/OpenCVDetectCudaArch.cu" "--run"
                     WORKING_DIRECTORY "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/"
                     RESULT_VARIABLE _nvcc_res OUTPUT_VARIABLE _nvcc_out
                     ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
    if(NOT _nvcc_res EQUAL 0)
      # Detection failed: __cuda_arch_bin stays unset here.
      message(STATUS "Automatic detection of CUDA generation failed. Going to build for all known architectures.")
    else()
      set(__cuda_arch_bin "${_nvcc_out}")
      # Rewrite a detected 2.1 into the "2.1(2.0)" form.
      string(REPLACE "2.1" "2.1(2.0)" __cuda_arch_bin "${__cuda_arch_bin}")
    endif()
  endif()
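
To make the link between the capability lists in the snippet and the NVCC switches listed earlier concrete, here is a minimal shell sketch. The function name `gencode_flags` is hypothetical and not part of any existing tool. It only assumes the `-gencode arch=compute_XX,code=sm_XX` form shown above.

```shell
# Hypothetical helper: turn compute capabilities such as "5.0 5.2"
# into the matching nvcc -gencode switches.
gencode_flags() {
    flags=""
    for cap in "$@"; do
        # "5.2" -> "52": the switch uses the capability without the dot.
        num=$(printf '%s' "$cap" | tr -d '.')
        flags="${flags}${flags:+ }-gencode arch=compute_${num},code=sm_${num}"
    done
    printf '%s\n' "$flags"
}

# Example: only the two Maxwell capabilities instead of all eight.
gencode_flags 5.0 5.2
```

With a single generation selected, NVCC receives two switches instead of eight, which is where the compile-time saving would come from.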

Solution proposal: Gentoo currently has one GPU-related USE_EXPAND variable called VIDEO_CARDS, but it accepts a mixture of vendor (intel, nvidia), architecture (radeon) and driver (nouveau vs nvidia) values, so it is not usable for this purpose. That is not a problem in itself, since this variable naturally serves mainly for X11 driver installation.

I came up with an idea, discussed it on the gentoo-dev IRC channel, and would like to propose the following new USE_EXPAND variable: GPU_ARCHITECTURE | GPU_TARGETS

I didn't use CUDA in the name, because I think this could be used for both CUDA and OpenCL at the same time. Such a variable must be defined in /etc/portage/make.conf.

Possible values:

I am even a fan of creating a more complex configuration than just nvidia_maxwell; we can add other important switches.
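
As a sketch of the user-facing side, assuming the variable ends up being named GPU_TARGETS and that nvidia_maxwell is an accepted value, the make.conf entry might look like the fragment below. Neither the variable nor the flag exists in Gentoo today. The expansion described in the comment follows the way existing USE_EXPAND variables such as VIDEO_CARDS behave.

```shell
# /etc/portage/make.conf (sketch; GPU_TARGETS is the proposed variable,
# it does not exist in Portage yet)
GPU_TARGETS="nvidia_maxwell"
# If GPU_TARGETS were listed in USE_EXPAND, Portage would expose this
# value to ebuilds as the USE flag gpu_targets_nvidia_maxwell.
```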

To support this, it would be sensible to create a new Gentoo eclass to keep it sane :-).
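
A rough sketch of what such an eclass helper could do is shown below. Everything here is an assumption for illustration: the function name `gpu_targets_cuda_arch_bin`, and the target names other than nvidia_maxwell. The capability lists are copied from the CMake snippet in this issue. A real eclass would query the expanded USE flags instead of taking the target names as arguments.

```shell
# Sketch of a helper that a hypothetical GPU eclass could provide.
# It maps target names to CUDA compute capabilities.
gpu_targets_cuda_arch_bin() {
    arch_bin=""
    for target in "$@"; do
        case "$target" in
            nvidia_fermi)   caps="2.0" ;;
            nvidia_kepler)  caps="3.0 3.5 3.7" ;;
            nvidia_maxwell) caps="5.0 5.2" ;;
            nvidia_pascal)  caps="6.0 6.1" ;;
            *) echo "unknown GPU target: $target" >&2; return 1 ;;
        esac
        arch_bin="${arch_bin}${arch_bin:+ }${caps}"
    done
    printf '%s\n' "$arch_bin"
}

# Stand-in for the value a user would set in make.conf.
GPU_TARGETS="nvidia_maxwell nvidia_pascal"
# Word splitting of $GPU_TARGETS is intentional here.
gpu_targets_cuda_arch_bin $GPU_TARGETS
```

An ebuild could then hand the resulting list to the package's build system, so that each package does not have to repeat the mapping.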

I welcome any comments or help with this.

Thanks

NOTE: For reference, here is the original thread on the forum: Gentoo forum thread

Alessandro-Barbieri commented 7 years ago

Why not CUDA_TARGETS?

archenroot commented 7 years ago

It could be CUDA_TARGETS, but then it doesn't cover OpenCL, so it could be GPU_TARGETS. The general question is what OpenCL packages require from this perspective: do they need some kind of architecture specification, or does it not matter in the case of OpenCL?

Alessandro-Barbieri commented 7 years ago

If I remember right, OpenCL uses just-in-time compilation. As for the architecture specification, I suppose it gets determined at runtime.

archenroot commented 7 years ago

So it differs from CUDA in this respect, and it doesn't make sense to specify more details at pre-compile time for OpenCL... interesting.