Gentoo support for new USE_EXPAND=GPU_ARCHITECTURE

archenroot commented 7 years ago

There is a common problem with hardware GPU acceleration support. I am mainly interested in CUDA, but it could apply to OpenCL as well.

Problem: Upstream projects commonly fall into one of these cases:

there is only global CUDA support -> compilation happens for all available architectures
there is a CUDA architecture option, but it cannot be used from within Gentoo because there is no matching USE flag (we have only the general cuda and opencl USE flags) -> compilation happens for all available architectures
there is a CUDA autodetector in some upstreams (OpenCV), but it doesn't work -> compilation happens for all available architectures

Detail: When I say compilation is executed for all available architectures, I mean NVCC gets the following switches:

-gencode arch=compute_20,code=sm_20 
-gencode arch=compute_30,code=sm_30 
-gencode arch=compute_35,code=sm_35 
-gencode arch=compute_37,code=sm_37 
-gencode arch=compute_50,code=sm_50 
-gencode arch=compute_52,code=sm_52 
-gencode arch=compute_60,code=sm_60 
-gencode arch=compute_61,code=sm_61

That is a lot of targets, and it prolongs the compilation time by 3-5x. Here is a sample CMakeLists snippet that configures the build based on the architecture name and falls back to autodetection if none is specified:

  # Map a named GPU generation to the compute capabilities to build for.
  if(CUDA_GENERATION STREQUAL "Fermi")
    set(__cuda_arch_bin "2.0")
  elseif(CUDA_GENERATION STREQUAL "Kepler")
    set(__cuda_arch_bin "3.0 3.5 3.7")
  elseif(CUDA_GENERATION STREQUAL "Maxwell")
    set(__cuda_arch_bin "5.0 5.2")
  elseif(CUDA_GENERATION STREQUAL "Pascal")
    set(__cuda_arch_bin "6.0 6.1")
  elseif(CUDA_GENERATION STREQUAL "Auto")
    # Compile and run a small probe that reports the capabilities of the installed GPUs.
    execute_process( COMMAND "${CUDA_NVCC_EXECUTABLE}" "${OpenCV_SOURCE_DIR}/cmake/checks/OpenCVDetectCudaArch.cu" "--run"
                     WORKING_DIRECTORY "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeTmp/"
                     RESULT_VARIABLE _nvcc_res OUTPUT_VARIABLE _nvcc_out
                     ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
    if(NOT _nvcc_res EQUAL 0)
      message(STATUS "Automatic detection of CUDA generation failed. Going to build for all known architectures.")
    else()
      set(__cuda_arch_bin "${_nvcc_out}")
      # Rewrite a detected 2.1 device to the "2.1(2.0)" form.
      string(REPLACE "2.1" "2.1(2.0)" __cuda_arch_bin "${__cuda_arch_bin}")
    endif()
  endif()
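
To show what restricting the generation buys, here is a minimal sketch of how a capability list such as "5.0 5.2" (the Maxwell entry in the snippet) would translate into NVCC switches. The function name gencode_flags is made up for illustration; it is not part of OpenCV or Portage.

```shell
#!/bin/sh
# Hypothetical helper (not from any upstream): turn compute capabilities
# such as "5.0 5.2" into the matching nvcc -gencode switches.
gencode_flags() {
    flags=""
    for cap in "$@"; do
        # Drop the dot: "5.2" becomes "52".
        num=$(printf '%s' "$cap" | tr -d '.')
        # Append the switch, separated by a space after the first one.
        flags="${flags}${flags:+ }-gencode arch=compute_${num},code=sm_${num}"
    done
    printf '%s\n' "$flags"
}

# Maxwell only: two switches instead of the eight listed earlier.
gencode_flags 5.0 5.2
```

With a single generation selected, NVCC would compile two targets rather than eight, which is where the build-time saving would come from.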

Solution proposal: Gentoo at the moment has one GPU-related USE_EXPAND variable called VIDEO_CARDS, but it mixes VENDOR (intel, nvidia), ARCHITECTURE (radeon) and DRIVER (nouveau vs nvidia) values, so it is not usable for this purpose. That is not a problem in itself; the variable naturally serves mainly for installing X11 drivers.

I came up with an idea, discussed it on the gentoo-dev IRC channel, and would like to propose the following new USE_EXPAND variable: GPU_ARCHITECTURE | GPU_TARGETS

I didn't use CUDA in the name, because I think this could be used for both CUDA and OpenCL at the same time. Such a variable must be defined in /etc/portage/make.conf

Possible values:

CUDA: cuda_fermi, cuda_volta, cuda_maxwell, etc.
OpenCL: ? - is it even valid and useful?
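
If the proposal were adopted, the user-side setting might look like the fragment below. GPU_TARGETS and the cuda_* values are the names proposed in this issue, not an existing Portage variable. Since Portage turns each USE_EXPAND value into a USE flag of the form variable_value in lowercase, ebuilds would see this as gpu_targets_cuda_maxwell.

```shell
# /etc/portage/make.conf
# Proposed variable, does not exist yet: build CUDA code only for Maxwell cards.
GPU_TARGETS="cuda_maxwell"
```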

I am even a fan of creating a more complex configuration than just nvidia_maxwell; we can add other important switches.

To support this, it would be sensible to create a new Gentoo eclass to keep it sane :-).
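
A rough sketch of what such an eclass helper could do, under several assumptions: the flag names gpu_targets_cuda_* follow the proposal in this issue, the function name gpu_targets_cuda_arch_bin is invented, and the capability lists are copied from the CMake snippet earlier in this post. The use function here is only a stand-in for Portage's helper so the sketch can run outside an ebuild.

```shell
#!/bin/sh
# Stand-in for Portage's `use` helper so this runs outside an ebuild:
# succeeds if the flag appears in the space-separated USE variable.
use() {
    case " ${USE} " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical eclass function: collect the compute capabilities for every
# enabled gpu_targets_cuda_* flag (flag names are the ones proposed here).
gpu_targets_cuda_arch_bin() {
    archs=""
    use gpu_targets_cuda_fermi   && archs="${archs} 2.0"
    use gpu_targets_cuda_kepler  && archs="${archs} 3.0 3.5 3.7"
    use gpu_targets_cuda_maxwell && archs="${archs} 5.0 5.2"
    use gpu_targets_cuda_pascal  && archs="${archs} 6.0 6.1"
    # Strip the leading space before printing.
    printf '%s\n' "${archs# }"
}

# Example: a user who enabled only the Maxwell target.
USE="cuda gpu_targets_cuda_maxwell"
gpu_targets_cuda_arch_bin
```

An ebuild could then hand the resulting list to whatever architecture option upstream offers, for instance OpenCV's CUDA_ARCH_BIN CMake variable, instead of letting the build default to every architecture.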

I welcome any comments or help with this.

Thanks

NOTE: For reference, the original discussion is in the Gentoo forum thread.

Alessandro-Barbieri commented 7 years ago

Why not CUDA_TARGETS?

archenroot commented 7 years ago

Could be CUDA_TARGETS, but then it doesn't support OpenCL, so it could be GPU_TARGETS. The general question is what OpenCL packages require from this perspective: do they need some kind of architecture specification, or does it not matter in the case of OpenCL?

2017-06-08 9:33 GMT+02:00 Alessandro Barbieri wrote:

Why not CUDA_TARGETS?


Alessandro-Barbieri commented 7 years ago

If I remember right, OpenCL uses just-in-time compilation. Regarding the architecture specification, I suppose it gets determined at runtime.

archenroot commented 7 years ago

So it is different from CUDA in this respect, and it doesn't make sense to specify more details at build time for OpenCL... interesting

2017-06-08 11:29 GMT+02:00 Alessandro Barbieri wrote:

If I remember right, OpenCL uses just-in-time compilation. Regarding the architecture specification, I suppose it gets determined at runtime.


