facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Add conda package #152

Open erip opened 2 years ago

erip commented 2 years ago

🚀 Feature

Motivation

Packaging PyTorch-based projects can be a bit messy because of their reliance on specific CUDA toolkits. Additionally, packages with native code often rely on a client-side compiler, which can be painful (e.g., on Windows). In the case of xformers, wheels are built, but having a condafied package would allow conda-only environments to consume xformers without relying on pip.

Pitch

Add a conda package which either lands in the conda-forge or pytorch channel.
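
For concreteness, here's a minimal sketch of what such a recipe could look like. This is purely illustrative, not a tested recipe: the version, checksum, and dependency pins are placeholders, and the CUDA variant handling is elided.

```yaml
# meta.yaml -- hypothetical sketch of a conda recipe for xformers;
# version, sha256, and dependency pins are placeholders
package:
  name: xformers
  version: "0.0.1"

source:
  url: https://github.com/facebookresearch/xformers/archive/refs/tags/v0.0.1.tar.gz
  sha256: <fill-in>

build:
  number: 0
  script: {{ PYTHON }} -m pip install . -vv

requirements:
  build:
    - {{ compiler('cxx') }}
    - {{ compiler('cuda') }}
  host:
    - python
    - pip
    - pytorch
  run:
    - python
    - pytorch

about:
  license: BSD-3-Clause
  license_file: LICENSE
```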

Alternatives

Continue with a wheel-only distribution. Conda users can still pip install xformers, which isn't ideal, but not awful.

Additional context

N/A

blefaudeux commented 2 years ago

This would be great, really, thanks for the issue. I have no know-how here, so I'm pinging @chrisburr and @h-vetinari (who nailed the fairscale release on conda) in case they could point us to some steps (or just have an educated opinion). To give some context, xformers needs to build some C++ and CUDA code (not that different from fairscale actually, so the same config would probably work). cc @fmassa @dianaml0

erip commented 2 years ago

I've handled simple conda packaging before, but complexity seems to increase with more targets (like CUDA toolkits). I'm happy to work with someone more experienced to push this through. It might require a bit of extra work since I think all dependencies need to be condafied (and triton is not). In any case, please let me know if I can help in some way!

h-vetinari commented 2 years ago

Hey all

I think this can be done. I've looked at the requirements, and I'm not sure what requirements-lra.txt does; it's not used in setup.py, AFAICS.

It might require a bit of extra work since I think all dependencies need to be condafied (and triton is not)

I cannot see the triton dependency that @erip is mentioning. Other than that, it seems sputnik should possibly be packaged separately...

I'm happy to work with someone more experienced to push this through

I can open a PR to staged-recipes and make you a collaborator on my fork (allowing you to push into the PR), but first I'd like to understand the components a bit better. The list at the bottom of the readme is quite long, and possibly affects packaging (and what we need to carry in terms of licenses).

blefaudeux commented 2 years ago

hey @h-vetinari, thank you so much for jumping in! Trying to unroll:

erip commented 2 years ago

Worth mentioning that triton is an optional dependency here, not that it makes a huge difference from the condafication perspective.

h-vetinari commented 2 years ago

@blefaudeux: Triton is used as an external library

OK, I found it now, but it's not under third-party, nor a submodule, etc. What exactly is the relation to upstream triton? Patched fork? Is it frozen or will it pull in upstream changes?

@blefaudeux: LRA is not needed here I think (requirements-lra can thus be forgotten), it's an ease of use integration but not part of the core library

It's pretty easy to do that in conda-forge as well, if you want. There could be a base package xformers and then xformers-lra that contains extra dependencies.
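
A rough sketch of how that split could look as a multi-output recipe (the output names and extra dependencies here are illustrative, not a settled design):

```yaml
# hypothetical multi-output layout for a base package plus an LRA add-on
outputs:
  - name: xformers
    requirements:
      run:
        - python
        - pytorch

  - name: xformers-lra
    requirements:
      run:
        # exact pin so both outputs come from the same build
        - {{ pin_subpackage('xformers', exact=True) }}
        # plus whatever requirements-lra.txt pulls in
```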

@erip: Worth mentioning that triton is an optional dependency here, not that it makes a huge difference from the condafication perspective.

Conda doesn't understand "optional" - that's why optional installs (like foo[extra]) are often modelled as separate packages. But, for that matter, I don't understand either. It's directly in the source directory, how can it be optional?

erip commented 2 years ago

See here: basically, if CUDA is available, triton is enabled for fusing layers. For CPU-only use, triton is not required.

h-vetinari commented 2 years ago

Well, I'd describe that as conditional on CUDA availability (at runtime, no less), rather than being optional. Edit: point taken, though: if we package triton separately, a putative CPU-only build of xformers wouldn't need to include it.
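
In recipe terms, that kind of conditionality is usually expressed with selectors on the CUDA variant. A sketch, assuming a triton package is available on the channel:

```yaml
# hypothetical variant-dependent requirements
requirements:
  build:
    - {{ compiler('cxx') }}
    - {{ compiler('cuda') }}  # [cuda_compiler_version != "None"]
  run:
    - pytorch
    - triton                  # [cuda_compiler_version != "None"]
```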

@blefaudeux: the only code reused mostly as-is was Sputnik and GE-SpMM

Where is the GE-SpMM code and (same question as above), what's the relation to "upstream" GE-SpMM?

blefaudeux commented 2 years ago

Well, I'd describe that as conditional on CUDA availability (at runtime, no less), rather than being optional. Edit: point taken, though: if we package triton separately, a putative CPU-only build of xformers wouldn't need to include it.

@blefaudeux: the only code reused mostly as-is was Sputnik and GE-SpMM

Where is the GE-SpMM code and (same question as above), what's the relation to "upstream" GE-SpMM?

Triton is just used as-is; it's a classic Python dependency (i.e., pip install triton is enough in the pip world).

Sputnik is in third_party/sputnik; GE-SpMM is in xformers/components/attention/csrc (sorry, on mobile so links are cumbersome)

h-vetinari commented 2 years ago

@h-vetinari: What exactly is the relation to upstream triton? Patched fork? Is it frozen or will it pull in upstream changes?

OK, I think I'm understanding better now. The code in xformers/triton is the code using upstream openai/triton, rather than being a fork etc.

h-vetinari commented 2 years ago

Triton is just used as-is; it's a classic Python dependency (i.e., pip install triton is enough in the pip world).

Yeah, I was just confused because it doesn't appear in requirements.txt...

erip commented 2 years ago

Yeah, I think the heavy lift is supporting the various CUDA toolkits, which I don't have experience with in conda recipes. I'm happy to be added as a maintainer or outside collaborator for a staged recipe, @h-vetinari. Hopefully it shouldn't be a big issue. Meanwhile I can try to condafy some of the other dependencies.

h-vetinari commented 2 years ago

GE-SpMM is in xformers/components/attention/csrc

The only thing I've found is:

// taken from
// https://github.com/hgyhungry/ge-spmm/blob/master/pytorch-custom/computeUtil.h
// with minor modifications

here and here. Sounds like this doesn't need separate packaging (also, I don't see a license beyond that note; at least upstream is MIT).

h-vetinari commented 2 years ago

Triton, on the other hand, looks a bit more involved. It looks like it's using static libraries (where conda-forge intentionally only has dynamic ones), and also demands LLVM 13 on Windows, which hasn't made it to conda-forge in its entirety yet.

blefaudeux commented 2 years ago

Triton, on the other hand, looks a bit more involved. It looks like it's using static libraries (where conda-forge intentionally only has dynamic ones), and also demands LLVM 13 on Windows, which hasn't made it to conda-forge in its entirety yet.

I think that not handling Windows is fine to be honest; at least having a conda package which is pre-compiled only for Linux is already strictly superior to anything we have (see #157 for instance). Let's not let better be the enemy of good :)

h-vetinari commented 2 years ago

I think that not handling Windows is fine to be honest; at least having a conda package which is pre-compiled only for Linux is already strictly superior to anything we have (see #157 for instance). Let's not let better be the enemy of good :)

Yeah, TBH, replacing the static libraries is a bigger unknown right now than just waiting for the availability of LLVM 13 (but even that should be tractable, though it'll most likely need patching).

In any case, I started with cutlass, which is needed by triton (well, perhaps not strictly needed, but unless I'm hitting a hard roadblock, I tend to try to build all bindings; otherwise people end up disappointed that the conda-forge version has fewer capabilities).

blefaudeux commented 2 years ago

thanks a lot for that work @h-vetinari, very much appreciated !

fmassa commented 2 years ago

Hi,

One complication with the current state of xformers is that we depend on a nightly version of PyTorch for the sparse self-attention.

This will be lifted once PyTorch 1.11 is released, but for now we would need to release a nightly package.

I know that @lw has recently done very similar work for fairring, so we could probably follow the exact same steps as he did.

lw commented 2 years ago

I set up a conda package build against PyTorch nightly, but in hindsight it might have been better to set up wheels: PyTorch's conda channel prunes the old packages daily, meaning that for a few hours a day it will be impossible to install my nightly. This is because otherwise PyTorch would hit the conda channel size limit. There are no such issues for wheels, because PyTorch self-hosts those. On the other hand, if you want to use wheels and support multiple versions of CUDA, you might have to self-host the wheels as well.

You can find my setup here and here.
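
For reference, the nightly approach boils down to building against the pytorch-nightly channel (e.g. `conda build recipe/ -c pytorch-nightly`) and pinning a dev version. A rough sketch; the exact dev version string is illustrative:

```yaml
# hypothetical requirements for a build against PyTorch nightlies
requirements:
  host:
    - pytorch >=1.11.0.dev0
  run:
    - {{ pin_compatible('pytorch') }}
```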

lw commented 2 years ago

I've skimmed the rest of the discussion and I've seen mentions of bundling static libraries and how that's hard because conda-forge only ships dynamic ones. I've been there with Fairring, and our solution is to rebuild those deps as part of our recipe and then combine the static archives with our code within the final shared object (and we even put extra care in hiding these deps' dynamic symbols). It's all very hacky and makes our conda packages bloated, but it works.

erip commented 2 years ago

Tracking triton condafication here

tgale96 commented 2 years ago

I believe xformers had some modifications to Sputnik kernels so it'd probably make sense to just package those up in the xformers conda package? I'd be open to adding a conda package for Sputnik but I don't know if this makes sense given it's all C++.

h-vetinari commented 2 years ago

I've been there with Fairring, and our solution is to rebuild those deps as part of our recipe and then combine the static archives with our code within the final shared object (and we even put extra care in hiding these deps' dynamic symbols). It's all very hacky and makes our conda packages bloated, but it works.

Yeah, I can see how that would work in principle, but if we're gonna get that hacky, I prefer to just patch the CMakeLists.txt to look for the dynamic libs in the right place...

I believe xformers had some modifications to Sputnik kernels so it'd probably make sense to just package those up in the xformers conda package? I'd be open to adding a conda package for Sputnik but I don't know if this makes sense given it's all C++.

Seeing that https://github.com/facebookresearch/xformers/tree/main/third_party/sputnik only has the initial commit publicly visible, I cannot really say (aside from diffing against commits of upstream sputnik), but as long as the divergences are not too great, I'd still package it separately. The fact that it's C++ doesn't matter; a lot of packages in conda-forge are non-Python. From a packaging POV, the ideal case would be if third_party/sputnik were an unmodified submodule following exact versions, or at least upstream commits (depending on the ABI stability of sputnik, this could be relaxed).
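
If sputnik stays vendored instead, conda-build can at least make the pin explicit by unpacking a second source into the submodule path. A sketch; the tag and commit are placeholders:

```yaml
# hypothetical two-source layout pinning the vendored sputnik
source:
  - url: https://github.com/facebookresearch/xformers/archive/<tag>.tar.gz
    sha256: <fill-in>
  - url: https://github.com/google-research/sputnik/archive/<commit>.tar.gz
    sha256: <fill-in>
    folder: third_party/sputnik
```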

h-vetinari commented 2 years ago

One complication with the current state of xformers is that we depend on a nightly version of PyTorch for the sparse self-attention.

This will be lifted once PyTorch 1.11 is released, but for now we would need to release a nightly package.

For conda-forge, this means we won't be able to package until pytorch 1.11 comes along. But at least that should give us some time to figure out all the other ingredients.

lw commented 2 years ago

Yeah, I can see how that would work in principle, but if we're gonna get that hacky, I prefer to just patch the CMakeLists.txt to look for the dynamic libs in the right place...

I don't know which libraries we're talking about here but it might not be that simple.

fmassa commented 2 years ago

@h-vetinari even if we were to package sputnik in a conda package, we would still need to have C++ / CUDA binaries compiled for xformers, as we have our own kernels implemented in the csrc folder as well. So I would maybe not go this route for now.

The simplest option for now is indeed to wait until PyTorch releases the 1.11 version (which should be sometime in March), so that we can rely on it (instead of building nightlies).

Then we can provide binaries for the combination of {Linux, OSX, Windows} x {CPU, CUDA} x {Python version} that we will want to support.
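
That matrix would typically be driven by a conda_build_config.yaml (on conda-forge, largely by the global pinnings). A sketch with illustrative versions:

```yaml
# hypothetical conda_build_config.yaml; versions are illustrative
python:
  - 3.8
  - 3.9
  - 3.10
cuda_compiler_version:
  - None    # CPU-only builds
  - 11.2    # CUDA builds
```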

I've started some of this work for automating the pip wheels (with the CUDA binaries) in https://github.com/facebookresearch/xformers/pull/84. Normally all we need now is to upload the wheels once they are built by the CI, but I have yet to get to it. I think we should do something similar for conda as well.

h-vetinari commented 2 years ago

Now that we have triton in conda-forge, I wanted to tackle this again, but noticed that I had wrongly assumed that pyre was already in conda-forge: https://github.com/conda-forge/staged-recipes/issues/18617.

I assume the pyre-extensions aren't optional here?

blefaudeux commented 2 years ago

Sorry for the delay @h-vetinari, and thanks for getting back to us! Nice, I didn't know that you had triton; do you track the recent releases by any chance (v2+), or is that "just" the stable version (1.x)? Also, pyre-extensions is not optional right now, if my memory serves me well; @pradeep90 would be a good resource if there are corners to iron out, and in case we can help in general.

h-vetinari commented 2 years ago

Nice, I didn't know that you had triton; do you track the recent releases by any chance (v2+), or is that "just" the stable version (1.x)?

Triton was added to conda-forge recently, though it's been a long time coming in https://github.com/conda-forge/staged-recipes/pull/17310 (we also did cutlass beforehand, before finding out that triton doesn't need/support it anymore).

I'm not aware of a v2; we're tracking the upstream repo, and all I see there is 1.x.

Also, pyre-extensions is not optional right now, if my memory serves me well

I'm a bit surprised about that, because pyre is about type checking, which is usually optional in Python. What portions depend essentially on this? I'm just asking because if(?) pyre-extensions depends on pyre, then we also need to build watchman first, and both pyre and watchman look like non-trivial builds to pull off. Not impossible, just more links in the chain to get out of the way first...

pradeep90 commented 2 years ago

I'm a bit surprised about that, because pyre is about type checking, which is usually optional in Python. What portions depend essentially on this? I'm just asking because if(?) pyre-extensions depends on pyre, then we also need to build watchman first, and both pyre and watchman look like non-trivial builds to pull off. Not impossible, just more links in the chain to get out of the way first...

Hi, pyre-extensions doesn't depend on Pyre or watchman. It's just a small utility library: https://pypi.org/project/pyre-extensions/

(We use it to dogfood shape types for Tensors (https://github.com/facebookresearch/xformers/issues/43))