h-vetinari opened 3 years ago
Just to add, Apple recommends Miniforge3 and not Miniconda, so I guess they're aware of conda-forge. Maybe this is something we can use to our advantage?
I don't think there's anything we have to do. If the package supports GPU builds, they can use the metal compiler to compile the GPU code.
> I don't think there's anything we have to do. If the package supports GPU builds, they can use the metal compiler to compile the GPU code.
Not sure I understand - "they" is the users? Why shouldn't the target be that conda-forge has pre-compiled everything carefully, like it does for everything else?
I get that this would mean that there'd have to be a cf-compatible metal compiler (and probably other bits), but that could perhaps be a long-term goal (similar to how NVIDIA is now using conda-forge)?
> Not sure I understand - "they" is the users?
No, the maintainer of the package.
> I get that this would mean that there'd have to be a cf-compatible metal compiler
Metal compiler already is. xcode compilers are compatible with c-f compilers.
> No, the maintainer of the package.
> Metal compiler already is. xcode compilers are compatible with c-f compilers.
Sounds very good. Do we have any package in CF already that does GPU builds on macos?
Nope. Metal is very macOS specific and I haven't seen packages supporting metal except games.
Is it fair to say that we're in a situation where we believe it to be possible, but no-one has tried (and worked out the kinks) yet?
I don't understand what you are trying to do here. Have you seen any package with Metal support that we package without Metal?
FYI, Apple's tensorflow-metal is closed source, so we can't build it from source.
> I don't understand what you are trying to do here. Have you seen any package with Metal support that we package without Metal?
It's very possible that I'm wrong on this, but my understanding is that we're not building any GPU-code paths on macos for packages like tensorflow (cf. what kicked off this issue in the OP), pytorch, cupy, arrow, faiss, etc.
However, having found out that apple apparently provides GPU-enabled builds with tensorflow-metal, I was wondering what obstacles there would be to doing the same in CF.
> FYI, Apple's tensorflow-metal is closed source, so we can't build it from source.
Yeah, I get that it's not just flipping a switch. I was wondering (aside from source availability) what we'd need to do in terms of compilers, runtimes, etc. (and there are probably other license issues there as well). Still, I think the collaboration of conda-forge with NVIDIA (also not exactly a paragon of FOSS in the past) would be an interesting template to follow long-term.
As I said, `tensorflow-metal` is closed source and none of the other packages you mentioned support metal.
If you want to package `tensorflow-metal` as is in conda-forge, go ahead. Its license allows redistribution.
Unless you find a package that needs metal and has to be built from source, this discussion is pointless.
> I was wondering (aside from source availability) what we'd need to do in terms of compilers, runtimes, etc.
Nothing. Runtimes are available in the OS. Compilers are available via xcode.
> If you want to package `tensorflow-metal` as is in conda-forge, go ahead.
Is the philosophy not usually to avoid binary repackaging (with vanishingly few exceptions)...? If cf/core is fine with that, I'm sure mac users (of which I am not one, so I have no dog in this fight) would be happy to avoid the current ridiculous installation hoops to get GPU-supported tensorflow.
> Unless you find a package that needs metal and has to be built from source, this discussion is pointless.
How realistic is it that no other packages will ever gain metal support in the future? It may be pointless now, granted. But at the very least it gives a place to refer to when someone (like on the tensorflow-feedstock) asks what conda-forge is thinking re: GPU support on macos.
> Is the philosophy not usually to avoid binary repackaging (with vanishingly few exceptions)...?
If the package is not open source, what other alternative is there?
> How realistic is it that no other packages will ever gain metal support in the future?
Let's leave this conversation at: if there's a package with metal support that needs to be built from source, comment on this issue. I don't want to spend my time discussing hypothetical scenarios.
> If the package is not open source, what other alternative is there?
Not packaging it (or talking with apple) comes to mind, though I'm OK with binary repackaging in this case.
> I don't want to spend my time discussing hypothetical scenarios.
Completely fine, as I said above. There's no rush for this, though IMO it's good to have an issue to refer to.
> Metal compiler already is. xcode compilers are compatible with c-f compilers.
@isuruf This is very interesting! Mind saying a few more words here for my own education? 🙂 Suppose a maintainer adds `{{ compiler('c') }}` in an osx recipe and also wants to use the Metal compiler, what are the steps? Does it need an `xcodeproj` file, or can it be done through the cmdline (e.g. `build.sh`)?
Either is fine I guess, but I'd prefer doing it through the cmdline.
`xcodeproj` might override some env variables that we set, so it's better to do `build.sh`.
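(For anyone landing here later: a minimal `build.sh` sketch of driving the Metal compiler from the command line might look like the following. The file names are placeholders, not taken from any actual feedstock.)

```bash
#!/bin/bash
set -euxo pipefail

# Compile the Metal shader source to AIR (Apple's intermediate representation).
# "kernels.metal" is a placeholder for whatever .metal sources the package ships.
xcrun -sdk macosx metal -c kernels.metal -o kernels.air

# Link the AIR object(s) into a .metallib that the host code loads at runtime.
xcrun -sdk macosx metallib kernels.air -o kernels.metallib

# The host-side C/C++ code still goes through the usual conda-forge toolchain,
# linking against the Metal framework from the OS/SDK, e.g.:
# $CXX $CXXFLAGS host.cpp -o host $LDFLAGS -framework Metal -framework Foundation
```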
@h-vetinari fwiw: https://github.com/pytorch/pytorch/issues/47702#issuecomment-953074900 suggests a metal plugin is coming to pytorch soon
💯 agree with you on this issue
For reference, I added a recipe that had this issue: https://github.com/conda-forge/staged-recipes/pull/17315
I just omitted the compiler for macOS, based on @isuruf's advice above. It seems to have worked fine.
If someone wants to help with cross-compiling what @msarahan did, please join me here:
https://github.com/conda-forge/libtvm-feedstock/issues/14
@isuruf off the top of your head, do you know the flags to pass to have native clang (i.e. clang from the VM, not c-f) cross-compile?
On second thought, try using the compiler from conda-forge.
Okay, I will try that soon. If I remember correctly, I got an error when I enabled metal support, but that might just need a path or something.
Ah, maybe that's part of our problem: Metal headers only exist from 10.11 upwards it seems: https://github.com/phracker/MacOSX-SDKs/search?q=%3CMetal%2FMTLBlitCommandEncoder.h%3E
Update1: since this issue is likely to be a good resource for people having issues with metal going forward: I found that I basically needed to follow https://conda-forge.org/docs/maintainer/knowledge_base.html#requiring-newer-macos-sdks and put both the target and the SDK version at 11.0 for full compatibility. This will vary from one case to another. In the tvm case, only the SDK version needs to be 11.0, but that produced a lot of warnings about the target being lower than 11.0. For now, this compiles well for osx-64, so I'm moving on to see if I can get it to cross-compile.
Update2: It seems to cross-compile perfectly fine; I tested the eventual python bindings locally.
Update3: Yes, it is all good, except one annoying issue with getting things to test correctly on the CI: basically it errors on osx-64 saying metal/accelerate symbols are not available (so I disabled the testing). This issue doesn't happen on my local machine (`xcrun --show-sdk-version` --> 12.3), so I assume we ought to tiptoe around this issue; I would assume that when it comes to macs, it's significantly less important to offer old compatibility, since most mac users are pushed to upgrade frequently (going by Apple's numbers anyway), in sharp contrast to the linux landscape where people use 10-year-old OSes pretty regularly (me included). Reference (messy) implementation: https://github.com/conda-forge/libtvm-feedstock/pull/23 and https://github.com/conda-forge/tvm-py-feedstock/pull/22
@isuruf could you please review my updates above to ensure that this is a sane procedure, in case people come here to find out how to compile for metal? The two main issues are overriding the SDK version/target (sketched below) and testing afterwards.
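(For reference, the SDK/target override described in Update1 boils down to a `conda_build_config.yaml` entry roughly like the sketch below; the exact versions and selectors will vary per package, and this is my reading of the linked docs rather than a snippet from the tvm feedstock.)

```yaml
# conda_build_config.yaml (sketch): request a newer macOS SDK so the Metal
# headers are available, and optionally raise the deployment target to match.
MACOSX_SDK_VERSION:             # [osx]
  - "11.0"                      # [osx]
MACOSX_DEPLOYMENT_TARGET:       # [osx]
  - "11.0"                      # [osx]
```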
Now that pytorch is releasing the next version (1.12) with metal support, I was thinking we follow the same playbook as cuda and name these `mps` builds (metal performance shaders; that's what the backend is called in pytorch). I have tested this multiple times in the pytorch feedstock and it works fine; the only issue is that it requires macos 12.3 (and we don't have those SDKs available), so I am proposing we target the builds to a separate branch (called mps) and use the macos-12 images instead. See the matrix in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/118
Nothing is finalized yet (I am not even a maintainer). The idea is that we will have `mps` builds like we have `cuda` builds. For those following, please let us know what you think in the PR.
> I was thinking we follow the same playbook as cuda and name these `mps` builds (metal performance shaders; that's what the backend is called in pytorch).
The pytorch precedent is relevant, but I'm also not in love with "mps". Why not just use "metal" as a string...? Tensorflow-metal does the same.
> I have tested this multiple times in the pytorch feedstock and it works fine;
Thank you for your work on this!
> Why not just use "metal" as a string...? Tensorflow-metal does the same.
I am indifferent, but I tend to agree with your preference for metal over mps. I think it may be more accurate to call it mps, but well... mps has the word metal in it!
(On second thought, the tensorflow-metal analogy isn't perfect. Apple releases two things, tensorflow-macos and tensorflow-metal. The latter is just a simple wrapper package to enable metal support, but the former is the actual tensorflow package. Anyway, I think "pytorch-metal" is likely better than "pytorch-mps", but on the other hand, "metal" may be too generic compared to "mps". Perhaps "pytorch-metal" could be equivalent to "pytorch-gpu" and "pytorch-mps" is more aptly equivalent to "pytorch-cuda" ... 😰 ... 😅 ... )
Before doing two builds, can't we just do one build with metal enabled? For CUDA, the reason to have a CPU build is that it requires a large dependency and is proprietary. Metal, on the other hand, has only an OS dependency, and even though it is proprietary, it is part of the OS.
> Before doing two builds, can't we just do one build with metal enabled? For CUDA, the reason to have a CPU build is that it requires a large dependency and is proprietary. Metal, on the other hand, has only an OS dependency, and even though it is proprietary, it is part of the OS.
Yes, this should be doable. The current problem is that the metal build will require macos 12.3 and higher. However, there should be a way to enable it regardless (that's what they do upstream anyway). Still, I am not sure if we will need the SDK to be present or not :/ I couldn't get it to listen to the environment variable that forces it to build for the backend (they have `mps.is_built()` and `mps.is_available()` functions for this very purpose).
I will try this route again and see if I can get it to work... I will tag you for help if you don't mind if/when I get stuck.
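(A quick way to check whether a given pytorch build actually carries the MPS backend, using the two functions mentioned above; this assumes pytorch >= 1.12, where they live under `torch.backends.mps`.)

```bash
# is_built():     the binary was compiled with MPS support
# is_available(): MPS is usable on the running machine (needs macOS 12.3+ and an Apple GPU)
python -c "import torch; print(torch.backends.mps.is_built(), torch.backends.mps.is_available())"
```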
Using SDK 12.3 or higher is fine if the software supports building with a new SDK and targeting an older OSX version using `MACOSX_DEPLOYMENT_TARGET`.
> Using SDK 12.3 or higher is fine if the software supports building with a new SDK and targeting an older OSX version using `MACOSX_DEPLOYMENT_TARGET`.
Okay, then we only need minimal edits and we can drop all the naming and differentiation since it will no longer be needed. I will tidy up the approach and then tag you to review it when we are ready (there is no pytorch release yet, likely on the order of a week away). Thanks!!
So far, apple has not been mentioned at all in #63, so I thought this deserves a new issue.
I'm also aware that the CI situation for apple is not great (especially for osx-arm), and that apple+GPU is even worse, but I'm thinking that in an ideal world, it should still be possible eventually.
Apple has some builds in a conda channel, and here are their install instructions for the final product.
This leads to really contorted installation instructions in the wild (e.g. here), and consequently broken environments: https://github.com/conda-forge/tensorflow-feedstock/issues/154