Support for Apple GPU builds

h-vetinari commented 3 years ago

So far, apple has not been mentioned at all in #63, so I thought this deserves a new issue.

I'm also aware that the CI situation for apple is not great (especially for osx-arm), and that apple+GPU is even worse, but I'm thinking that in an ideal world, it should still be possible eventually.

Apple has some builds in a conda channel, and here are their install instructions for the final product.

This leads to really contorted installation instructions in the wild (e.g. here), and consequently broken environments: https://github.com/conda-forge/tensorflow-feedstock/issues/154

BastianZim commented 3 years ago

Just to add, Apple recommends Miniforge3 and not miniconda so I guess they're aware of conda-forge. Maybe this is something we can use to our advantage?

isuruf commented 3 years ago

I don't think there's anything we have to do. If the package supports GPU builds, they can use the metal compiler to compile the GPU code.

h-vetinari commented 3 years ago

I don't think there's anything we have to do. If the package supports GPU builds, they can use the metal compiler to compile the GPU code.

Not sure I understand - "they" is the users? Why shouldn't the target be (as with everything else) that conda-forge has carefully pre-compiled everything?

I get that this would mean that there'd have to be a cf-compatible metal compiler (and probably other bits), but that could perhaps be a long-term goal (similar to how NVIDIA is now using conda-forge)?

isuruf commented 3 years ago

Not sure I understand - "they" is the users?

No, the maintainer of the package.

I get that this would mean that there'd have to be a cf-compatible metal compiler

Metal compiler already is. xcode compilers are compatible with c-f compilers.

h-vetinari commented 3 years ago

No, the maintainer of the package.

Metal compiler already is. xcode compilers are compatible with c-f compilers.

Sounds very good. Do we have any package in CF already that does GPU builds on macos?

isuruf commented 3 years ago

Nope. Metal is very macOS specific and I haven't seen packages supporting metal except games.

h-vetinari commented 3 years ago

Is it fair to say that we're in a situation where we believe it to be possible, but no-one has tried (and worked out the kinks) yet?

isuruf commented 3 years ago

I don't understand what you are trying to do here. Have you seen any package with Metal support that we package without Metal?

isuruf commented 3 years ago

FYI, Apple's tensorflow-metal is closed source, so we can't build it from source.

h-vetinari commented 3 years ago

I don't understand what you are trying to do here. Have you seen any package with Metal support that we package without Metal?

It's very possible that I'm wrong on this, but my understanding is that we're not building any GPU-code paths on macos for packages like tensorflow (cf. what kicked off this issue in the OP), pytorch, cupy, arrow, faiss, etc.

However, having found out that apple apparently provides GPU-enabled builds with tensorflow-metal, I was wondering what obstacles there would be to doing the same in CF.

h-vetinari commented 3 years ago

FYI, Apple's tensorflow-metal is closed source, so we can't build it from source.

Yeah, I get that it's not just flipping a switch. I was wondering (aside from source availability) what we'd need in terms of compilers, runtimes, etc. (and there are probably other license issues there as well). Still, I think the collaboration of conda-forge with NVIDIA (also not exactly a paragon of FOSS in the past) would be an interesting template to follow long-term.

isuruf commented 3 years ago

As I said, tensorflow-metal is closed source and none of the other packages you mentioned support metal. If you want to package tensorflow-metal as is in conda-forge, go ahead. The license of it allows redistribution.

Unless you find a package that needs to be built from source that needs metal, this discussion is pointless.

isuruf commented 3 years ago

I was wondering (aside from source availability) what we'd need in terms of compilers, runtimes, etc.

Nothing. Runtimes are available in the OS. Compilers are available via xcode.

h-vetinari commented 3 years ago

If you want to package tensorflow-metal as is in conda-forge, go ahead.

Is the philosophy not usually to avoid binary repackaging (with vanishingly few exceptions)...? If cf/core is fine with that, I'm sure mac users (of which I am not one, so I have no dog in this fight) would be happy to avoid the current ridiculous installation hoops to get GPU-supported tensorflow.

Unless you find a package that needs to be built from source that needs metal, this discussion is pointless.

How realistic is it that no other packages will ever gain metal-support in the future? It may be pointless now, granted. But at the very least it gives a place to refer to when someone (like on the tensorflow-feedstock) asks what conda-forge is thinking re:GPU support on macos.

isuruf commented 3 years ago

Is the philosophy not usually to avoid binary repackaging (with vanishingly few exceptions)...?

If the package is not open source, what other alternative is there?

How realistic is it that no other packages will ever gain metal-support in the future?

Let's leave this conversation at this: if there's a package that has metal support and needs to be built from source, comment on this issue. I don't want to spend my time discussing hypothetical scenarios.

h-vetinari commented 3 years ago

If the package is not open source, what other alternative is there?

Not packaging it (or talking with apple) comes to mind, though I'm OK with binary repackaging in this case.

I don't want to spend my time discussing hypothetical scenarios.

Completely fine, as I said above. There's no rush for this, though IMO it's good to have an issue to refer to.

leofang commented 3 years ago

Metal compiler already is. xcode compilers are compatible with c-f compilers.

@isuruf This is very interesting! Mind saying a few more words here for my own education? 🙂 Suppose a maintainer adds {{ compiler('c') }} in an osx recipe and also wants to use the Metal compiler: what are the steps? Does it need an xcodeproj file, or can it be done through the cmdline (e.g. build.sh)?

isuruf commented 3 years ago

Either is fine I guess, but prefer doing it through the cmdline.

isuruf commented 3 years ago

xcodeproj might override some env variables that we set, so it's better to do build.sh.
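
For readers wondering what the command-line route might look like: Apple's toolchain exposes the Metal compiler through xcrun, in two steps (compile a .metal file to an intermediate .air file, then pack it into a .metallib). The following is only a minimal sketch, written in Python so the commands can be printed and inspected without running them. The file names are hypothetical, and nothing in this thread confirms that a particular feedstock builds its shaders this way; the same two commands could equally be written directly in build.sh.

```python
import shlex


def metal_build_commands(source, stem):
    """Return the two xcrun invocations that turn a .metal file into a .metallib.

    `source` and `stem` are hypothetical names chosen for illustration.
    """
    air = f"{stem}.air"        # intermediate representation
    lib = f"{stem}.metallib"   # library loaded at runtime
    return [
        ["xcrun", "-sdk", "macosx", "metal", "-c", source, "-o", air],
        ["xcrun", "-sdk", "macosx", "metallib", air, "-o", lib],
    ]


commands = metal_build_commands("kernels.metal", "kernels")
for cmd in commands:
    # Print rather than execute, so the sketch is safe to run on any platform.
    print(shlex.join(cmd))
```

On a macOS build machine, each list could be passed to subprocess.run(cmd, check=True) instead of being printed.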

ngam commented 3 years ago

@h-vetinari fwiw: https://github.com/pytorch/pytorch/issues/47702#issuecomment-953074900 suggests a metal plugin is coming to pytorch soon

💯 agree with you on this issue

msarahan commented 2 years ago

For reference, I added a recipe that had this issue: https://github.com/conda-forge/staged-recipes/pull/17315

I just omitted the compiler for macOS, based on @isuruf's advice above. It seems to have worked fine.

ngam commented 2 years ago

If someone wants to help with cross-compiling what @msarahan did, please join me here:

https://github.com/conda-forge/libtvm-feedstock/issues/14

@isuruf off the top of your head, do you know the flags to pass to have native clang (i.e. clang from the VM, not c-f) cross-compile?

isuruf commented 2 years ago

On second thought, try using the compiler from conda-forge.

ngam commented 2 years ago

Okay, I will try that soon. If I remember correctly, I got an error when I enabled metal support, but that might just need a path or something.

ngam commented 2 years ago

Ah, maybe that's part of our problem: Metal headers only exist from 10.11 upwards, it seems: https://github.com/phracker/MacOSX-SDKs/search?q=%3CMetal%2FMTLBlitCommandEncoder.h%3E

Update1: since this issue is likely to be a good resource for people having issues with metal going forward: I found that I basically needed to follow https://conda-forge.org/docs/maintainer/knowledge_base.html#requiring-newer-macos-sdks and put both the deployment target and the SDK version at 11.0 for full compatibility. This will vary from one case to another. In the tvm case, only the SDK version needs to be 11.0, but that produced a lot of warnings about the target being lower than 11.0. For now, this compiles well for osx-64, so I'm moving on to see if I can get it to cross-compile.

Update2: It seems to be cross-compiling perfectly fine, judging by local tests of the eventual python bindings.

Update3: Yes, it is all good, except for one annoying issue with getting things to test correctly on the CI: it errors on osx-64 saying metal/accelerate symbols are not available (so I disabled the testing). This doesn't happen on my local machine (xcrun --show-sdk-version --> 12.3), so I assume we ought to tiptoe around it. I would assume that for macs it's significantly less important to offer old compatibility, since most mac users are pushed to upgrade frequently (going by Apple's numbers anyway), in sharp contrast to the linux landscape where people use 10-year-old OSes pretty regularly (me included). Reference messy implementation: https://github.com/conda-forge/libtvm-feedstock/pull/23 and https://github.com/conda-forge/tvm-py-feedstock/pull/22
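
For anyone landing here later, the override described in Update1 would look roughly like the fragment below. This is a sketch based on my reading of the linked knowledge base section, not a copy of what the libtvm feedstock actually contains: the 11.0 values come from Update1, and the file location and osx-64 selectors are assumptions that may need adjusting per feedstock.

```yaml
# recipe/conda_build_config.yaml (sketch; values taken from Update1 above)
MACOSX_SDK_VERSION:        # [osx and x86_64]
  - "11.0"                 # [osx and x86_64]
MACOSX_DEPLOYMENT_TARGET:  # [osx and x86_64]
  - "11.0"                 # [osx and x86_64]
```

Raising only MACOSX_SDK_VERSION while leaving the deployment target lower corresponds to the tvm case mentioned in Update1.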

ngam commented 2 years ago

@isuruf could you please review my updates above to ensure that this is a sane procedure in case people come here to find out how to compile for metal? Two main issues are overriding sdk version/target and testing afterwards.

ngam commented 2 years ago

Now that pytorch is releasing the next version (1.12) with metal support, I was thinking we follow the same playbook as cuda and name these mps builds (metal performance shaders; that's what the backend is called in pytorch). I have tested this multiple times in the pytorch feedstock and it works fine; the only issue is that it requires macos 12.3 (and we don't have those SDKs available) so I am proposing we target the builds to a separate branch (called mps) and we use the macos-12 images instead. See matrix in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/118

Nothing is finalized yet (I am not even a maintainer). The idea is that we will have mps like we have cuda builds. For those following, please let us know what you think in the PR.

h-vetinari commented 2 years ago

I was thinking we follow the same playbook as cuda and name these mps builds (metal performance shaders; that's what the backend is called in pytorch).

The pytorch precedent is relevant, but I'm also not in love with "mps". Why not just use "metal" as a string...? Tensorflow-metal does the same.

I have tested this multiple times in the pytorch feedstock and it works fine;

Thank you for your work on this!

ngam commented 2 years ago

Why not just use "metal" as a string...? Tensorflow-metal does the same

I am indifferent, but I tend to agree with you preferring metal to mps. I think it may be more accurate to call it mps, but well... mps has the word metal in it!

ngam commented 2 years ago

(On second thought, the tensorflow-metal analogy isn't perfect. Apple releases two things, tensorflow-macos and tensorflow-metal. The latter is just a simple wrapper package to enable metal support, but the former is the actual tensorflow package. Anyway, I think "pytorch-metal" is likely better than "pytorch-mps" but on the other hand, "metal" may be too generic compared to "mps". Perhaps "pytorch-metal" could be equivalent to "pytorch-gpu" and "pytorch-mps" is more aptly equivalent to "pytorch-cuda" ... 😰 ... 😅 ... )

isuruf commented 2 years ago

Before doing two builds, can't we just do one build with metal enabled? For CUDA, the reason to have a CPU build is because it requires a large dependency and is proprietary. Metal on the other hand has only OS dependency and even though proprietary it is part of the OS.

ngam commented 2 years ago

Before doing two builds, can't we just do one build with metal enabled? For CUDA, the reason to have a CPU build is because it requires a large dependency and is proprietary. Metal on the other hand has only OS dependency and even though proprietary it is part of the OS.

Yes, this should be doable. The current problem is that the metal build will require macos 12.3 and higher. However, there should be a way to enable it regardless (that's what they do upstream anyway). Still, I am not sure if we will need the SDK to be present or not :/ I couldn't get it to listen to the environment variable that forces it to build for the backend (they have mps.is_built() and mps.is_available() functions for this very purpose)

I will try this route again and see if I could get it to work... I will tag you for help if you don't mind if/when I get stuck
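
The two functions mentioned above live under torch.backends.mps in pytorch 1.12 and later. A minimal sketch of how one might check a finished build follows. The helper name mps_status is made up for illustration, and the guards for a missing torch or a torch without the mps backend are my own addition so the snippet runs anywhere.

```python
def mps_status():
    """Report whether the installed torch was built with, and can use, the MPS backend."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch_installed": False, "built": False, "available": False}

    mps = getattr(torch.backends, "mps", None)
    if mps is None:
        # Older torch releases have no mps backend module at all.
        return {"torch_installed": True, "built": False, "available": False}

    return {
        "torch_installed": True,
        "built": bool(mps.is_built()),          # compiled with MPS support
        "available": bool(mps.is_available()),  # usable on this machine right now
    }


status = mps_status()
print(status)
```

A build made with metal enabled but run on an older macOS would be expected to report built as true and available as false, which is the distinction the two functions exist for.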

isuruf commented 2 years ago

Using SDK 12.3 and higher is fine if the software supports building with a new SDK and targeting an older OSX version using MACOSX_DEPLOYMENT_TARGET.
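
Building against a new SDK while targeting an older OS implies that anything needing the newer OS has to be gated at runtime rather than at build time. As a rough illustration of such a gate (not how pytorch itself implements it), the sketch below compares the running macOS version with the 12.3 minimum mentioned earlier in this thread. The helper names are hypothetical.

```python
import platform


def parse_version(text):
    """Turn a dotted version string such as '12.3.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(current, minimum):
    """Return True if `current` is at least `minimum`; an empty string means not macOS."""
    if not current:
        return False
    return parse_version(current) >= parse_version(minimum)


# platform.mac_ver()[0] is an empty string on non-macOS systems.
running = platform.mac_ver()[0]
print(meets_minimum(running, "12.3"))
```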

ngam commented 2 years ago

Using SDK 12.3 and higher is fine if the software supports building with a new SDK and targeting an older OSX version using MACOSX_DEPLOYMENT_TARGET.

Okay, then we only need minimal edits and we can drop all the naming and differentiation since it will no longer be needed. I will tidy up the approach and then tag you to review it when we are ready (there is no pytorch release yet; it is likely on the order of a week away). Thanks!!

conda-forge / conda-forge.github.io

Support for Apple GPU builds #1537