conda-forge / conda-forge.github.io

The conda-forge website.
https://conda-forge.org
BSD 3-Clause "New" or "Revised" License

GPU Support? #63

Closed: alexbw closed this 10 months ago

alexbw commented 8 years ago

I've got a couple packages I'm preparing for upload that rely on GPUs. I'm not up to speed on what open-source CI solutions offer, but would building against a VM w/ GPUs be supported? If it would require pitching in or donating to the project, I'm pretty sure I can figure some way to help.

pelson commented 8 years ago

I'm completely out of my depth on this one. @msarahan any knowledge on the subject?

jakirkham commented 8 years ago

I'm also interested in this. The trick is most CIs do not provide GPUs. However, if you are willing to work with OpenCL, which works with CPUs and GPUs, then we can work together on getting that support on CIs.

alexbw commented 8 years ago

Unfortunately, CUDA really is the de facto standard for machine learning. I think NVIDIA is generally interested in helping out the open-source community, so it might be worth starting a conversation with them about helping out. I'll test out conda-forge with non-GPU packages, and if things seem to work smoothly, then I can start talking with them.

Question -- are conda and conda-build updated regularly on this system? My packages are in Lua, and support for Lua was only added recently.


jakirkham commented 8 years ago

I thought you might say that. Unfortunately, without the CI support, we are kind of in a bind on this one. If NVIDIA is willing to work on CI support with GPUs, that would be great.

To be completely honest with you, I don't think we should be holding our breath on this. The real problem is that most CI services are leasing time on other infrastructure, primarily Google's and Amazon's. Unless someone has GPU infrastructure that they are willing to lease to a CI service for this purpose, we are kind of stuck. I think we can all imagine what they would prefer to do with that infrastructure, right? However, if you figure out something on this, please let us know and we can work on something.

I'm guessing you are using Torch then? At a bare minimum, let's work on getting Torch's dependencies in here. At least, that will make your job a little easier, right? For that matter any run-of-the-mill Lua packages that you have would be good to try to get in, as well. It should help you and others looking for more Lua support in conda. How does that sound?

jakirkham commented 8 years ago

This repo seems to use NVIDIA components for its CI.

jakirkham commented 8 years ago

Did you see the link above, @alexbw?

This might not be totally impossible after all, but I think we should do some research on how this works. What platforms are you trying to support? Just Linux? Mac also? I'm totally out of my depth on Windows. So we may need someone else to show us the ropes there.

alexbw commented 8 years ago

Saw the link. Looking more into this, but on the timescale of a few weeks.


jakirkham commented 8 years ago

So, I looked a little more closely at this, and it looks like one could add GPU libs to CentOS 6 (what we are currently using). There is Mac and Windows support too, but IMHO that is secondary to getting Linux up and working. However, I am not seeing any support for CentOS 5 (a platform we were debating switching to), which is something to keep in mind.

msarahan commented 8 years ago

Good to know. We are collecting data points on whether continuing with CentOS 5 is a good idea. If anyone knows of definitive reasons to stay with CentOS 5, please share them; it is currently preventing:

jakirkham commented 8 years ago

Glad you saw this, @msarahan. I was debating cross-referencing, but didn't want to make a mess of links. Are there still many customers using a CentOS 5-equivalent Linux? Could we maybe build the compiler on CentOS 5 and somehow add it to CentOS 6?

msarahan commented 8 years ago

Building the compiler on the older architecture doesn't help. What matters is the glibc version present on the build system when packages are built.

We don't have hard data on customers, outside of HTTP headers for downloads of packages. We're digging through that to see how many people have kernels older than the one corresponding to CentOS 6.

jakirkham commented 8 years ago

Right. I was just hoping there was a way we could somehow have both. I guess in the worst case some things could be CentOS 6 as needed. Will there be any consequences if we mix the two? Is that already done to some extent (noting that Qt5 was mentioned)?

Yeah, it seems like it would be good to run a survey. We might need an incentive to make sure it actually gets filled out.

jakirkham commented 8 years ago

Also, an interesting footnote (though I would appreciate it if other people check that I am reading this right, as IANAL): it appears that at least some of the CUDA libraries can be shipped. This means we could create a CUDA package that simply makes sure CUDA is installed in the CI and moves the libraries into the conda build prefix for packaging. The resulting package could then be added as a dependency of anything that requires them (e.g. Torch, Caffe, etc.). This would avoid us having to add these hacks in multiple places and risk losing them when we re-render. Furthermore, we would be able to guarantee that the libraries we used to build would be installed on the user's system.
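
For concreteness, here is a minimal sketch (Python, purely illustrative) of what such a repackaging build step might do. The `/usr/local/cuda` location and the library list are assumptions; the real list would have to come from Attachment A of the EULA:

```python
# Hypothetical staging step for a "cuda" repackaging recipe: copy the
# redistributable runtime libraries from the CI's toolkit install into
# the conda build prefix so they end up in the package.
import glob
import os
import shutil

CUDA_HOME = os.environ.get("CUDA_HOME", "/usr/local/cuda")  # assumed CI install location
PREFIX = os.environ["PREFIX"]  # set by conda-build during a build

# Illustrative subset of the redistributable libraries -- verify against
# the current EULA before relying on this.
REDISTRIBUTABLE = ["libcudart.so*", "libcublas.so*", "libcufft.so*", "libcurand.so*"]

lib_dir = os.path.join(PREFIX, "lib")
os.makedirs(lib_dir, exist_ok=True)

for pattern in REDISTRIBUTABLE:
    for path in glob.glob(os.path.join(CUDA_HOME, "lib64", pattern)):
        print("staging", path)
        shutil.copy(path, lib_dir)  # note: copies symlink targets as regular files
```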

jakirkham commented 8 years ago

We should verify whether a build of CUDA for one Linux distro and version can easily be used on other distros and versions, or whether we need multiple flavors. This survey will have to be extended to other OSes at some point, but starting with Linux makes the most sense to me.

jakirkham commented 8 years ago

So, I am trying something out in this PR ( https://github.com/conda-forge/caffe-feedstock/pull/1 ). In it, I am installing CUDA libraries into the Docker container and attempting to have Caffe build against them. It is possible some tests will fail since we don't have access to an NVIDIA GPU, so we will have to play with that. Also, we don't have cuDNN, as that appears to require some registration process that I have not looked into yet and may be a pain to download in batch mode.

In the long run, I expect the CUDA libraries will be wrapped in their own package for installation, and packages needing them will simply install that package. We may need to use features to differentiate the GPU variants (CUDA/OpenCL). However, that CUDA package will probably need to hack the CI script in a similar way.

Definitely am interested in feedback. So, please feel free to share.

jakirkham commented 8 years ago

Another thought might be that we don't ship the CUDA libraries at all. Instead, we have a package that merely checks that they are installed via a pre- or post-link step. If it fails to find them, the install fails. This would avoid figuring out where the libraries can or cannot be distributed safely. Hopefully, as we are linking against the CUDA API, all that will matter is that the user has an acceptable version of the CUDA libraries at runtime, regardless of which Linux distribution the package was initially built on.
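
A minimal sketch of the check such a package might perform (conda link steps are shell scripts, so logic like this would be invoked from one; the minimum version below is an assumed example, not a real requirement):

```python
# Fail the install if no usable CUDA driver is present on the system.
import ctypes
import ctypes.util

MIN_DRIVER_VERSION = 7050  # assumed example: 7050 corresponds to CUDA 7.5

def cuda_driver_version():
    """Return the CUDA driver version, or None if libcuda is unavailable."""
    name = ctypes.util.find_library("cuda") or "libcuda.so.1"
    try:
        libcuda = ctypes.CDLL(name)
    except OSError:
        return None
    version = ctypes.c_int(0)
    # cuDriverGetVersion returns 0 (CUDA_SUCCESS) on success.
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return version.value

version = cuda_driver_version()
if version is None or version < MIN_DRIVER_VERSION:
    raise SystemExit("No usable CUDA driver found; aborting install.")
```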

jakirkham commented 8 years ago

It appears Circle CI does provide GPU support, or at least that is what my testing suggests.

jakirkham commented 8 years ago

Also, as an FYI in case you didn't already know, @msarahan: CentOS 5 maintenance support ends March 2017. In other words, less than a year. That sounds like a pretty big negative to me. Given how many recipes have been added from conda-recipes and how many remain to be added at this point, trying to switch to CentOS 5 before then sounds challenging. Not to mention, we may find ourselves needing to migrate back to CentOS 6 by that point. Maybe it is just me, but I'm starting to feel a lot of friction in switching to CentOS 5. Is it reasonable to consider just accepting CentOS 6 as part of this transition?

kyleabeauchamp commented 8 years ago

FWIW, we have GPU support on Omnia. Might be worth reading over.

https://github.com/omnia-md/conda-recipes

https://github.com/omnia-md/omnia-build-box

jakirkham commented 8 years ago

Thanks for these links, @kyleabeauchamp. I'll certainly try to brush up on this.

Do you have any thoughts on this PR ( https://github.com/conda-forge/caffe-feedstock/pull/1 )? Also, how do you handle the GPU lib dependency? Is it packaged somehow, used from the system (possibly with some sort of check), or handled some other way?

kyleabeauchamp commented 8 years ago

So, AFAIK our main use of GPUs was building a simulation engine, OpenMM (openmm.org). OpenMM is a C++ library and can dynamically detect the presence of CUDA support (via shared libraries) at runtime. This means that we did not package or ship anything related to CUDA. We basically just needed CUDA on the build box to build the conda package, then let OpenMM handle things dynamically later.

kyleabeauchamp commented 8 years ago

Looks like our Dockerfile is somewhat similar to your CUDA handling:

https://github.com/omnia-md/omnia-build-box/blob/master/Dockerfile#L25

jakirkham commented 8 years ago

Ah, ok, thanks for clarifying.

The trick with Caffe, in particular, is that it can use the CPU, CUDA, or OpenCL. CPU support is always present; however, a BLAS is required, which involves a CPU choice (OpenBLAS, ATLAS, MKL, or possibly some hack to add other options) and, if any, a GPU choice (cuBLAS or ViennaCL). Thus, having this determined dynamically ends up not being as nice as it could be. Allowing actual selection will require feature support and possibly multiple rebuilds of Caffe.
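
To make those combinatorics concrete, a toy enumeration (the labels are made up for illustration; they are not real conda feature names):

```python
# Each BLAS choice crossed with each GPU option is a separate Caffe build.
from itertools import product

blas_options = ["openblas", "atlas", "mkl"]
gpu_options = ["none", "cuda", "opencl"]

for blas, gpu in product(blas_options, gpu_options):
    print("caffe variant: blas=%s gpu=%s" % (blas, gpu))
# Nine variants already, before compiler or Python versions multiply it further.
```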

One simple route might be to just always use ViennaCL, which can abstract over the difference between the OpenCL and CUDA options. Also, it can always fall back to the CPU if no GPU support is present. Though I expect this layer of abstraction comes at some penalty; the question is how severe that penalty is. Would a solution like this work with OpenMM? I don't know if its GPU support proceeds primarily through a GPU BLAS or some other mechanism. For instance, is it using FFTs?

If you have deep learning interests, this may be relevant. Caffe can optionally support cuDNN, and researchers will want that support out of the box. Not only is this tricky because it may be unavailable for hardware or software reasons, it is tricky because downloading cuDNN requires a registration step with unclear licensing restrictions. One way we might resolve this is to request that cuDNN be preloaded in an appropriate Docker container. NVIDIA does do this with Ubuntu 14. However, I don't see a similar container for CentOS 6 and am unclear on whether it would be a supported platform. Ultimately, we will need to communicate with NVIDIA at some point to see what we need to do here to stay above board while providing users state-of-the-art support.

Fortunately, NVIDIA is very clear, down to the file level, about which parts of the CUDA libraries can and cannot be distributed. So, the concerns with cuDNN do not affect this.

jakirkham commented 8 years ago

Another thought for more versatile support would be to use the clMath libraries.

kyleabeauchamp commented 8 years ago

OpenMM dynamically chooses the best platform at runtime, with options including CPU (SSE), CUDA, OpenCL, and CPU (no SSE / reference). It does use FFTs. The idea with OpenMM is to build and ship binaries that support all possible platforms, then select at runtime.
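
For anyone following along, a rough Python stand-in for that style of runtime detection (OpenMM does this in C++; the library names and preference order here are assumptions for illustration):

```python
# Probe for each backend's shared library and pick the first that loads,
# falling back to the always-available CPU path.
import ctypes
import ctypes.util

def backend_available(libname):
    """True if the named shared library can be found and loaded."""
    found = ctypes.util.find_library(libname)
    if found is None:
        return False
    try:
        ctypes.CDLL(found)
        return True
    except OSError:
        return False

if backend_available("cuda"):       # NVIDIA driver library (libcuda)
    platform = "CUDA"
elif backend_available("OpenCL"):   # vendor-neutral OpenCL ICD loader
    platform = "OpenCL"
else:
    platform = "CPU"
print("selected platform:", platform)
```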

Lnaden commented 7 years ago

Continuing the discussion from https://github.com/conda-forge/staged-recipes/issues/299#issuecomment-297404016: the problem we have is that OpenMM requires the CUDA toolkit libraries and the AMD SDK to build its GPU features against. Getting OpenMM onto conda-forge is the largest barrier to moving the remainder of omnia's packages over.

As @kyleabeauchamp mentioned here, OpenMM does not actually ship CUDA or the AMD SDK; it only uses those libraries to build its own GPU components, which then access the user's local GPU software. OpenMM requires CUDA Toolkit 8.0 and AMD SDK 3.0 for its current builds.

As discussed by @jakirkham and @jjhelmus on the linked issue, there are license problems if we tried to ship the toolkits with OpenMM, but I'm not sure that would be required.

Tagging @peastman, @jchodera xref: https://github.com/conda-forge/staged-recipes/issues/299#issuecomment-297404016 xref: https://github.com/conda-forge/staged-recipes/issues/299#issuecomment-297407579 xref: https://github.com/conda-forge/staged-recipes/issues/299#issuecomment-297409444

scopatz commented 6 years ago

Fortunately, NVIDIA is very clear, down to the file level, about which parts of the CUDA libraries can and cannot be distributed

@jakirkham - what do you mean by this? Is there a file listing somewhere?

jjhelmus commented 6 years ago

The files which NVIDIA lists as redistributable under their license are given in Attachment A of the EULA. The document moves around with new releases of CUDA, but it is currently available here.
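
As a practical safeguard, a hypothetical CI check could confirm that a built cudatoolkit package only ships files matching that whitelist. A sketch, using a truncated, illustrative subset of Attachment A names:

```python
# Scan a conda package tarball and flag any shipped library whose name is
# not on the (abbreviated, illustrative) redistributable whitelist.
import sys
import tarfile

WHITELIST_PREFIXES = ("libcudart", "libcublas", "libcufft", "libcurand",
                      "libcusolver", "libcusparse", "libnvrtc")

def non_whitelisted_files(path):
    bad = []
    with tarfile.open(path) as tar:
        for member in tar.getmembers():
            if not member.isfile() or not member.name.startswith("lib/"):
                continue  # only inspect shipped libraries
            base = member.name.rsplit("/", 1)[-1]
            if not base.startswith(WHITELIST_PREFIXES):
                bad.append(member.name)
    return bad

if __name__ == "__main__":
    offenders = non_whitelisted_files(sys.argv[1])  # path to a .tar.bz2 package
    if offenders:
        sys.exit("Non-redistributable files found: %s" % ", ".join(offenders))
```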

isuruf commented 6 years ago

You still have the line

You agree to defend, indemnify and hold harmless NVIDIA and its Affiliates, and their respective employees, contractors, agents, officers and directors, from and against any and all claims, damages, obligations, losses, liabilities, costs or debt, fines, restitutions and expenses (including but not limited to attorney’s fees and costs incident to establishing the right of indemnification) arising out of or related to you and your Enterprise, and their respective employees, contractors, agents, distributors, resellers, end users, officers and directors use of Licensed Software outside of the scope of the AGREEMENT or any other breach of the terms of the AGREEMENT.

in that link, which is the exact same reason we don't ship MKL and the Intel runtime libraries.

scopatz commented 6 years ago

@isuruf - I am not certain I understand the concern here. Isn't this just saying that we won't sue NVIDIA?

isuruf commented 6 years ago

I think it's saying we will defend and indemnify NVIDIA if a conda-forge user sues them. IANAL though

scopatz commented 6 years ago

Also, the sentence before that is specifically about share-alike linking: "you agree that you will not (nor authorize third parties to) [...] use the Licensed Software in any manner that would cause the Licensed Software to become subject to an Open Source License. Nothing in the AGREEMENT shall be construed to give you a right to use, or otherwise obtain access to, any source code from which the Software or any portion thereof is compiled or interpreted."

So this seems to be about if someone tries to GPL their code.

Since the redistributable whitelist only includes binaries and headers (not source code), I am not sure how we could be at fault, because there is no opportunity for a cudatoolkit package to include source in the first place.

scopatz commented 6 years ago

I thought MKL's license didn't even allow you to redistribute binaries.

scopatz commented 6 years ago

Thanks for the whitelist @jjhelmus!

djsutherland commented 6 years ago

Since the redistributable whitelist only includes binaries and headers (not source code), I am not sure how we could be at fault, because there is no opportunity for a cudatoolkit package to include source in the first place.

I think it's somewhat controversial whether distributing GPL software linked to non-free binaries is okay; that's probably what they mean.

scopatz commented 6 years ago

Yeah, @dougalsutherland - that's fair. I sent oss-requests@nvidia.com an email asking for clarifications. Worst case, I could request the source code and try to compile it.

jakirkham commented 6 years ago

I thought MKL's license didn't even allow you to redistribute binaries.

One is allowed to redistribute MKL as long as their license is honored.

ref: https://software.intel.com/en-us/license/intel-simplified-software-license

jchodera commented 6 years ago

Is the concern here (1) whether hosting linux-anvil derivatives that contain installed CUDA toolkits, to allow automated building of packages that link against CUDA or OpenCL, is permissible; (2) whether distribution of conda packages compiled against CUDA libraries is permissible; or (3) whether the CUDA toolkit could itself be redistributed as a conda-forge package?

scopatz commented 6 years ago

For me, the concern is (3)

jakirkham commented 6 years ago

Personally, I would say (2) and (3) are concerns; (3) less so, as we could just use the package from defaults.

djsutherland commented 6 years ago

Yeah, Anaconda has special permission (IIRC) from Nvidia to redistribute cudatoolkit on defaults and we can just rely on that. I think (1) and (2) are both potential concerns, along with the practical issue of not having GPUs on the CI services to test packages.

jchodera commented 6 years ago

Yeah, Anaconda has special permission (IIRC) from Nvidia to redistribute cudatoolkit on defaults and we can just rely on that.

It looks like those packages are not kept up to date. Both CUDA 9.1 and 9.2 have been released since (along with some patches), but those do not seem to be available there.

jakirkham commented 6 years ago

Please raise this with Anaconda. Thanks :)

jchodera commented 6 years ago

cc: https://github.com/ContinuumIO/anaconda-recipes/issues/140

scopatz commented 6 years ago

Yeah, Anaconda has special permission (IIRC) from Nvidia to redistribute cudatoolkit on defaults

So I don't think that is true, per se. As @jakirkham, @jjhelmus, and others have brought up, parts of cudatoolkit are freely redistributable according to the EULA. These freely redistributable portions are all that the defaults package contains.

and we can just rely on that

As @jchodera points out, these packages are not kept up to date. To that end, there is now a PR to add a recipe similar to what's on defaults (freely redistributable) to staged-recipes at conda-forge/staged-recipes#6240.

I think it's somewhat controversial whether distributing GPL software linked to non-free binaries is okay;

This seems like it would only be an issue for other packages (i.e. not cudatoolkit) that happen to be GPL and happen to link to CUDA. It should not affect the distribution of cudatoolkit itself.

isuruf commented 6 years ago

Also, the sentence before that is specifically about share-alike linking

So this seems to be about if someone tries to GPL their code.

I think the sentence I mentioned is separate from the whole "you agree that you will not (nor authorize third parties to):" part.

scopatz commented 6 years ago

@isuruf - so the sentence you mentioned ("You agree to defend, indemnify ...") is in the same clause (1.2.1) as the "you agree that you will not: ..." part. So I think it is fair to read them as related propositions.

Additionally, we offer conda-forge under BSD-3-Clause. This means that all of conda-forge's repos, packages, etc. are provided by us without any "express or implied warranties." So while we may be obligated to defend, indemnify, ... NVIDIA from our users, our users have no grounds to make a claim against us or NVIDIA. Our BSD license serves as protection for NVIDIA.

xmnlab commented 6 years ago

Should we try to invite someone from NVIDIA to help us with this issue?

jakirkham commented 6 years ago

I sent oss-requests@nvidia.com an email asking for clarifications. Worst case, I could request the source code and try to compile it.

@scopatz, did you ever hear back?

scopatz commented 6 years ago

@jakirkham - sadly not yet!