dracwyrm / gentoo-ebuilds

Gentoo Linux ebuilds for Blender and dependency work.

media-gfx/blender: add compile cuda binaries to ebuild #13

Closed redchillipadi closed 8 years ago

redchillipadi commented 8 years ago

Add an option called build-cuda-kernel which passes WITH_CUDA_BINARIES to cmake. Ensure that it is possible to specify cuda_kernel_version_sm20, cuda_kernel_version_sm52, etc., e.g. by writing CUDA_KERNEL_VERSION="sm20" in make.conf.

It is important to be able to specify the kernel version so that the system does not become unresponsive while building multiple kernels simultaneously, with their large memory and swap requirements, and instead builds just the minimal set required.

This will likely only be used by distro creators building a tarball of a system with precompiled kernels. Most users will not set this and will use the default behaviour of recompiling the cuda kernel on the first render with a new blender version. A sketch of the proposed usage follows.
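
For illustration, a minimal sketch of the proposal, assuming a USE_EXPAND named CUDA_KERNEL_VERSION (the name and values are the ones suggested above, nothing final):

# in make.conf: list only the architectures actually needed
CUDA_KERNEL_VERSION="sm_20 sm_52"

# portage would then expose these as USE flags of the form
# cuda_kernel_version_sm_20, which the ebuild can test with use/usex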

dracwyrm commented 8 years ago

This is the relevant part of the CMake file and the kernels there are to work with:

option(WITH_CYCLES_CUDA_BINARIES    "Build Cycles CUDA binaries" OFF)
set(CYCLES_CUDA_BINARIES_ARCH sm_20 sm_21 sm_30 sm_35 sm_37 sm_50 sm_52 CACHE STRING "CUDA architectures to build binaries for")

Shouldn't be too hard to parse through. :)

Something like this in the ebuild:

local cuda_archs=""
for cuda_card in ${list_of_cuda_cards}; do cuda_archs+="${cuda_card};"; done
mycmakeargs+=( -DCUDA_ARCHS="${cuda_archs}" )

In the CMake file: set(CYCLES_CUDA_BINARIES_ARCH ${CUDA_ARCHS} CACHE......

I would need to get permission to add the USE_EXPAND variable to the master list.
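
For reference, a rough sketch of what that registration involves, assuming the standard profile layout in the main tree:

# profiles/base/make.defaults: add the variable to the master list
USE_EXPAND="... CUDA_KERNEL_VERSION"

# profiles/desc/cuda_kernel_version.desc: one line per accepted value
sm_20 - Build the Cycles CUDA kernel for compute capability 2.0 cards
sm_52 - Build the Cycles CUDA kernel for compute capability 5.2 cards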

But first, I want to know what the heck those kernels do! Is this even worth it? Can you select them in Blender, and do they give you better Cycles render speed?

redchillipadi commented 8 years ago

The CUDA kernels are the compiled Cycles code which is run on the GPU to do the rendering. Thus a CUDA kernel is required for all GPU rendering.

There are different versions of the GPU architecture. They appear to be backwards compatible: my card supports sm_30, but it runs fine with the sm_20 code we select in the fix-gpu-architecture patch in opensubdiv. I presume that later versions of the GPU architecture support additional features and are more efficient at rendering, but I have no data to support this.

Blender does not appear to allow selection of the architecture to use. It just compiles the version it wants if it does not already exist. For a single-machine user, this means they will need to wait 15 minutes on their first render following a blender version bump.

Having WITH_CYCLES_CUDA_BINARIES and the ability to set CYCLES_CUDA_BINARIES_ARCH would allow this ~15 minutes of compilation to occur when blender is compiled. I am imagining a render farm with many machines with the same GPU: it would be an advantage to compile the kernel along with blender when creating the system, and then put it on every machine. For home users this would be a disadvantage, as any time they updated the USE flags, recompiling the same kernel would waste another 15 minutes, and they have no second machine to give them the time and power savings of avoiding the recompile.

The same CYCLES_CUDA_BINARIES_ARCH could also be used to improve the opensubdiv fix-gpu-architecture patch, which currently hard-codes the architecture to sm_20, when some users may benefit from using later architectures. A sketch of that idea follows.
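
A minimal sketch of how the opensubdiv ebuild could pick up the user's choice, assuming the hard-coded architecture sits in the top-level CMakeLists.txt and using the proposed CUDA_KERNEL_VERSION variable:

# in src_prepare(), swap the hard-coded architecture for the user's choice
if [[ -n ${CUDA_KERNEL_VERSION} ]]; then
    sed -i "s/sm_20/${CUDA_KERNEL_VERSION}/" CMakeLists.txt || die
fi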

I think it is worth requesting the USE_EXPAND variable under a generic name like CUDA_KERNEL_VERSION or GPU_ARCHITECTURE, as several programs, such as blender and opensubdiv, may benefit from user selection of the kernel. It also makes things friendlier for cross compiling. However, I expect the number of users who will actually want the WITH_CYCLES_CUDA_BINARIES feature in blender to be a very small subset.

dracwyrm commented 8 years ago

It takes 15 minutes per kernel? Or is it total for all kernels? If only one is selected, then it should compile a lot faster.

I think that if a person has nVidia and the ability to use CUDA, they probably should. If WITH_CYCLES_CUDA_BINARIES is set to off, does it still compile a binary at runtime?

Also, if a person has the binary compiled at runtime, it's not tracked by portage, so if a user uninstalls Blender, that binary will remain. This is why autodetection must be disabled even after compiling. What happens when a user updates Blender to a completely new version: will the binary recompile?

For now, should we just patch that variable to the lowest setting to compile only one binary, and then figure out how to stop it from compiling at runtime? Then the cuda USE flag would truly turn off CUDA completely. Something like the sketch below.
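
A minimal sketch of that interim fix, overriding the cache variable from the snippet above at configure time instead of patching it:

# pin the Cycles kernel list to the lowest architecture only
mycmakeargs+=( -DCYCLES_CUDA_BINARIES_ARCH="sm_20" )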

This is a tough one. :)

redchillipadi commented 8 years ago

Last night, when it was compiling all the kernels, it made my system unresponsive for 30 minutes (due to swapping, as it requires around 25 GB of memory). I believe it compiled all seven kernels in this time. The total compile time for blender, including the kernels and documentation, was around 50 minutes.

If WITH_CYCLES_CUDA_BINARIES is set to off, it will still compile a binary at runtime if one is missing. For example, when I first installed blender 2.77a and rendered the default cube, blender needed to compile the cuda kernel, so it took 15 minutes for the render to appear while that one kernel was compiled.

Each card has one optimal version of cuda that it supports; the list is at https://developer.nvidia.com/cuda-gpus. There is no need to have additional kernels compiled (unless you are a distro creator needing to support several different nvidia cards within your render farm).

If I emerge --unmerge blender and then emerge blender, the binary is left behind, so it does not need to be recompiled. I feel that keeping the binary around is useful, as I do not want to wait another 15 minutes for my render each time I change a USE flag during testing. Besides, it is not compiled by our ebuild; it is created by the nvidia-cuda-toolkit, which blender runs if the user requests a GPU render. Can you point me to the reference on the Gentoo policy on autodetection?

I think that if the user doesn't want to use cuda, they will just set their User Preferences to use CPU rendering. If they do enable GPU rendering, then they want to use the cuda kernel and should let blender compile the most suitable version for their system, or use an existing one if present. If they are a distro creator or run a render farm, then we should try to give them the option to select which kernels they want precompiled, but it should still be up to the render farm user whether they render on the CPU or GPU. And if they installed the wrong kernel on their system, we should still let blender recompile the correct one for them.

I keep having additional thoughts about the name for the global USE flag. Nvidia refers to the version supported as Compute Capability, so perhaps that is a better name than CUDA_KERNEL_VERSION or GPU_ARCHITECTURE, especially when we consider that the flag may be useful for patching opensubdiv and other programs. And since Compute Capability does not cover amd cards, perhaps GPU_ARCHITECTURE is too general.

Also, I believe that opencl is used to compile either a monolithic or a modular kernel for amd cards, so we may need to set some options for your card as well. See https://wiki.blender.org/index.php/Dev:Source/Render/Cycles/OpenCL. How does GPU rendering work on your system?

redchillipadi commented 8 years ago

The compiled kernel is kept at /usr/share/blender/2.77/scripts/addons/cycles/kernel/kernels/cuda/kernel.cu

redchillipadi commented 8 years ago

On Wed, 15 Jun 2016 06:50:28 PM you wrote:

> The compiled kernel is kept at /usr/share/blender/2.77/scripts/addons/cycles/kernel/kernels/cuda/kernel.cu

Sorry, I read that the wrong way around. It uses the source in the above file to compile the kernel, which it stores as /home/adrian/.config/blender/2.77/cache/cycles_kernel_sm30_D52676E3623A29D029BCCDFB7AF5DED6.cubin

dracwyrm commented 8 years ago

About automagic, third paragraph: https://devmanual.gentoo.org/general-concepts/use-flags/

There is a workaround (there's always a workaround :-)). If it's a USE flag change or an rX update that doesn't affect the CUDA module (a patch fixing a bug in it would affect it), we can copy the kernel that was already built to the temporary install dir and disable building it in the ebuild. It is then re-installed by the install process, so portage tracks it. There are commands to detect whether it's a reinstall with USE changes or a rev bump.

My main concern is that on startup, users would think that Blender has stalled; since it's compiling at compile time, it's more expected. And we can keep the binary on USE flag changes and rX updates (depending), so that would be the same as compiling at runtime. Actual version updates, though, would need a new binary, and this would also make sure the old binary is removed and replaced. We may need a warning for users to make sure the old binary is removed from their home dir. The one from the ebuild may install globally for all users without each having to compile it in their home dir. Just a guess.
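
A very rough sketch of that reuse trick (the path is hypothetical; as noted later in this thread, Blender's runtime cache actually lives under the user's home directory):

src_install() {
    cmake-utils_src_install
    # reuse a kernel binary left behind by a previous install so that
    # portage tracks the file instead of leaving an orphan on unmerge
    local old_kernel="${EROOT}/usr/share/blender/cuda/kernel_sm_30.cubin"  # hypothetical path
    if [[ -f ${old_kernel} ]]; then
        insinto /usr/share/blender/cuda
        doins "${old_kernel}"
    fi
}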

I don't have OpenCL installed, as I didn't see a WITH_OPENCL option in the main make file. Maybe we need to add that as a USE flag? This may explain why I don't have GPU rendering at all.

I did a grep of all the files (grep -H -r "OPENCL" .) and there is apparently a WITH_OPENCL option in one of the sub-files, but I need to study what it does. I will try it with the flag set and pull in OpenCL to see what happens. This will take a bit of study.

Though, per the rules, we need to see if there is a way to make sure CUDA and OpenCL are disabled if the USE flags are unset, so there is no compiling at runtime. Perhaps something like the sketch below.
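
A sketch of what src_configure() could pass, assuming both CMake options behave as their names suggest:

src_configure() {
    local mycmakeargs=(
        # pass both options explicitly so nothing is autodetected
        -DWITH_CYCLES_CUDA_BINARIES=$(usex cuda ON OFF)
        -DWITH_OPENCL=$(usex opencl ON OFF)
    )
    cmake-utils_src_configure
}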

redchillipadi commented 8 years ago

> About automagic, third paragraph: https://devmanual.gentoo.org/general-concepts/use-flags/

Thanks for the reference. I want to make sure that my understanding is correct and that we are discussing the same thing.

I had interpreted that paragraph as meaning that it is not permissible for the configure stage to determine which dependencies to link against by searching for whatever packages happen to be installed.

So things like autotools doing dependency discovery with pkg-config, or a CMakeLists.txt doing something like

find_package(PACKAGE)
if(PACKAGE_FOUND)
    set(BUILD_PACKAGE TRUE)
else()
    set(BUILD_PACKAGE FALSE)
endif()

are prohibited when the discovered packages are not listed as dependencies in the ebuild. The rule ensures that all dynamically linked packages are included as dependencies in the ebuild. It also helps with cross compilation, since packages need to build what is desired for the target system, not link against whatever is present on the host at compile time.

However, I think that blender's use of nvidia-cuda-toolkit is different. It is not a compile-time or link-time dependency, but rather a system package that may be present at runtime.

This is more like dev-util/codeblocks, which compiles and links the IDE, but at runtime allows the user to select from the compilers available on the system, e.g. gcc/clang, and uses them via a system call. None of these compilers are listed as dependencies in the ebuild or dynamically linked to the software.

File Roller is another example: it uses many archive applications without its ebuild depending on any of them. None of them are required for compiling or linking, and they are not dynamically linked to the package, so it can run and display its GUI without any of them present. Obviously, unzipping a file will not work unless app-arch/unzip is present on the system.

Is this type of behaviour acceptable, or is it prohibited by the autodetect rule? I think the key question is whether using a package via a system call means that it needs to be an RDEPEND.

Similarly, blender is compiled and linked without any direct dependency upon the nvidia toolkit, but if the user requests a GPU render, it makes a system call to run nvcc. If that does make it an RDEPEND, a sketch of how the ebuild might express it is below.
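
A minimal sketch, assuming a system call is enough to warrant a runtime dependency (flag names as discussed above):

IUSE="cuda opencl"
# runtime-only dependency: blender invokes nvcc through a system call
RDEPEND="cuda? ( dev-util/nvidia-cuda-toolkit )"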

I hope I am explaining myself clearly. Please let me know if I can clarify something or if my understanding is inaccurate.

As far as the user thinking blender may have stalled: the compilation does not occur when blender is started, so the initial loading and use of blender is not affected. It is only when the user starts a render that the kernel is compiled. The user interface displays the message

Frame 1 | Time 00:00:00 | Mem 0.00M, Peak:0.00M | Scene, RenderLayer | Loading render kernels (may take a few minutes the first time)

and the console window shows:

Compiling CUDA kernel ...
"nvcc" -arch=sm_30 -m64 --cubin "/usr/share/blender/2.77/scripts/addons/cycles/kernel/kernels/cuda/kernel.cu" -o "/home/adrian/.config/blender/2.77/cache/cycles_kernel_sm30_311C6CB52A88568C18C3DA3897396C20.cubin" --ptxas-options="-v" --use_fast_math -I"/usr/share/blender/2.77/scripts/addons/cycles/kernel" -DNVCC -DKERNEL_CUDA_VERSION=75

This is the same behaviour that all the previous ebuilds of blender had, and also how blender works under Windows.

As you say, I think that if we do need a cuda flag, then we need to make sure that CUDA is enabled if and only if it is set.

I am starting another branch with my work so far on allowing conditional compilation of cuda kernels while emerging blender. Currently it does not detect the CUDA_KERNEL_VERSION I put into make.conf or /usr/portage/build/make.defaults for testing, and I am clobbering mycmakeargs somehow, so the ebuild fails; I suspect something like the pitfall sketched below.
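
The clobbering usually comes from assigning the array a second time instead of appending. A sketch of the safe pattern (the archs variable is illustrative, assembled elsewhere from CUDA_KERNEL_VERSION):

src_configure() {
    # build the array once...
    local mycmakeargs=(
        -DWITH_CYCLES_CUDA_BINARIES=$(usex cuda ON OFF)
    )
    # ...then append further entries with += so earlier ones are kept
    use cuda && mycmakeargs+=( -DCYCLES_CUDA_BINARIES_ARCH="${archs}" )
    cmake-utils_src_configure
}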

Once it is complete we can get permission for the USE_EXPAND flag and merge it into master.

dracwyrm commented 8 years ago

I think the main priority is stopping CUDA if the user has the flag disabled. Though I have no idea why a user would want to disable it if they have nVidia... But that is how Gentoo is supposed to behave.

Does setting WITH_CUDA=OFF during configure not keep that setting later at runtime? WITH_OPENCL is in the actual source code of Blender. It's just hard to see whether it actually stays off.

I think the cuda and opencl USE flags are handy, as most users don't read descriptions. They may not know that Blender can link against either of those. I didn't know it could use OpenCL for AMD-accelerated rendering, because it was hidden away outside the main make file. It's a big difference: I was contemplating forking over money for an nVidia card just for the CUDA enhancements.

The issue of the CUDA kernel compiling at runtime is not as important, as Blender always behaved this way before. ;) Users didn't know it then and they may not know it now. :P We should see if there are issues in the future, like whether the CUDA kernel recompiles on version upgrades. It's something for us to keep an eye out for.

OpenSubdiv could use the fix, but it's not optimised, as it will just compile one kernel for the lowest architecture.

If we do need to fix the kernel compiling at runtime, and the USE_EXPAND takes a while to be debated and added to the tree, then I think we could go the OpenSubdiv route and just temporarily patch it to use the lowest CUDA kernel, just to have something. We can always rev bump, or wait for the next version, to roll out custom CUDA kernels at build time.

redchillipadi commented 8 years ago

I have added cuda and opencl flags which allow the ebuild to enable or disable the creation of the cuda and opencl devices. The user cannot even select GPU Compute if they are disabled, so there will be no runtime compiling of the cuda kernel unless cuda is enabled; runtime compiling is still possible when it is.

I think this does what you want above. I have tested that this works as described with my nvidia card, but I am not able to test the opencl part. Could you let me know if you find any issues? If not, I will update the wiki page.

So now the remaining question is whether we want to let users compile the kernel at build time (i.e. the functionality being developed in the cudakernel branch). If we do include it, we must allow the user to select the desired kernels rather than patch it to the lowest supported version; they may need several versions if they have multiple systems or multiple cards. If we don't compile the required version, then blender will recompile the highest supported version on the first render at runtime, so all we have done is waste their time during the ebuild compilation.

If the USE_EXPAND takes some time to approve, then we could still include cycles_kernel_version_sm_20, cycles_kernel_version_sm_30, etc. as additional USE flags to allow setting them directly from package.use; see the sketch below.
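
A minimal sketch of that fallback with plain per-architecture USE flags (names follow the suggestion above):

IUSE="cuda cycles_kernel_version_sm_20 cycles_kernel_version_sm_30 cycles_kernel_version_sm_52"

src_configure() {
    # collect the architectures the user enabled into a CMake list
    local arch archs=""
    for arch in sm_20 sm_30 sm_52; do
        use cycles_kernel_version_${arch} && archs+="${arch};"
    done
    local mycmakeargs=( -DCYCLES_CUDA_BINARIES_ARCH="${archs%;}" )
    cmake-utils_src_configure
}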

dracwyrm commented 8 years ago

Very good! Completely disabling the building of kernels is Gentoo compliant. They may not like the kernel compiling at runtime rather than in the compile phase of the ebuild, the number one reason being that the compiled binary is not tracked by portage, so users may not know where the kernel is in order to remove it manually if they remove blender from their system and do not reinstall it. I can see their point of view on that.

For now, I'll add an ewarn message in the post-remove phase (the one that gets called on an emerge -C to remove the package completely) telling users where the kernel is and to remove it. I think this will be good for now. Something like the sketch below.
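
A sketch, pointing at the cache location found earlier in this thread (the exact wording is just a placeholder):

pkg_postrm() {
    ewarn "Blender may have compiled CUDA kernels at runtime into"
    ewarn "~/.config/blender/<version>/cache/. If you are removing"
    ewarn "Blender permanently, delete those files manually."
}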

Very valid point on multiple kernels. I didn't consider that users may wish to build more than one kernel; I was just thinking of a local install, but some users do use the binary package option to move compiled binaries across systems, so they can build once on one machine with the needed kernels and be ready on all systems.

Do you think we should try to get a version that doesn't have the selection into the tree first, then do a rev bump when the USE_EXPAND is approved and fix both packages? Meanwhile, we'll work on it in a separate branch like you have, so it's very clean and works well. I consider this a whole new feature, and one that will need to be in the wiki with a link to the page that lists which nVidia cards use which kernel.

PS. Amynka is waiting for you. :P