conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License

GPU Support #7

Closed hmaarrfk closed 3 years ago

hmaarrfk commented 5 years ago

@jjhelmus it seems you were able to build pytorch with GPU support without needing variants:

https://anaconda.org/anaconda/pytorch/files?version=1.0.1

Is that true?

If so, what challenges do you see moving this work to conda-forge?

jakirkham commented 5 years ago

Probably a few things. Here are some thoughts based on things we have been working on to get GPU packages to build.

  1. Using different Docker images.
  2. Requiring the nvcc compiler in recipes (a sketch of how a recipe and build configuration might express this follows after this comment).
  3. Tying the nvcc compiler version to the Docker image.
  4. General reworking of conda-smithy, staged-recipes, and other infrastructure to handle this.

The next thing we need to figure out is how to test the packages. There has been some good discussion and some investigation into possible options. Still more to do here though.
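
For concreteness, here is a rough sketch of how points 2 and 3 might be expressed with conda-build's variant machinery. The image names and version numbers below are illustrative, not the actual conda-forge pinnings:

    # conda_build_config.yaml (illustrative sketch)
    cuda_compiler:
      - nvcc
    cuda_compiler_version:
      - None     # CPU-only build
      - 10.0
      - 10.1
    docker_image:
      - condaforge/linux-anvil-comp7         # plain CPU image
      - condaforge/linux-anvil-cuda:10.0     # image shipping the CUDA 10.0 toolkit
      - condaforge/linux-anvil-cuda:10.1     # image shipping the CUDA 10.1 toolkit
    zip_keys:
      -
        - cuda_compiler_version
        - docker_image

Zipping cuda_compiler_version with docker_image makes the two vary together, so each CUDA variant is built inside an image that actually contains the matching toolkit.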

jakirkham commented 5 years ago

@soumith, do you have any thoughts on this? 🙂

soumith commented 5 years ago

@jakirkham's plan sounds about right. The PyTorch official conda binaries in the pytorch channel have been built the same way, and the scripts are at https://github.com/pytorch/builder/tree/master/conda

jjhelmus commented 5 years ago

The recipes used to build the pytorch packages in the defaults channel can be found in the pytorch-feedstock directory of the aggregate repository. These are built using the conda-provided compilers, but need nvcc and the CUDA runtime library for testing, which are provided by the appropriate docker images.

My understanding is that PyTorch does dynamic loading of the CUDA libraries, and therefore the package built with GPU support will work on systems without a GPU. A CPU-only variant would still be a nice addition, since the GPU variant is a large download and requires the cudatoolkit and cudnn packages, which are also quite large.
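
(For context, one common way a single recipe distinguishes CPU and GPU variants is through the build string. The fragment below is an illustrative sketch, not this feedstock's actual recipe:

    # meta.yaml fragment (illustrative)
    build:
      number: 0
      string: cuda{{ cuda_compiler_version | replace('.', '') }}py{{ CONDA_PY }}h{{ PKG_HASH }}_{{ PKG_BUILDNUM }}  # [cuda_compiler_version != "None"]
      string: cpu_py{{ CONDA_PY }}h{{ PKG_HASH }}_{{ PKG_BUILDNUM }}  # [cuda_compiler_version == "None"]

With build strings like these, CPU and GPU builds of the same version can coexist in one channel without colliding.)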

jakirkham commented 4 years ago

I think we are now in a good place to try building a GPU enabled pytorch package in conda-forge. Happy to give this a go if that sounds reasonable. 🙂

hmaarrfk commented 4 years ago

The CPU builds are timing out on Windows :(


soumith commented 4 years ago

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

hmaarrfk commented 4 years ago

@soumith, thanks for keeping up with the conversation. XREF: https://github.com/conda-forge/torchvision-feedstock/issues/2

Adding pytorch to conda-forge has several advantages:

  1. It would allow developers to have packages that explicitly depend on pytorch.
  2. In theory, it would help streamline the installation of multiple different packages beyond those that exist in the default channel. The default channel has the basics, but for many things, I find it lacking. Pointing people to conda-forge (or pip) for software is something I find myself doing from time to time.
  3. It allows the use of a consistent set of compilers, which avoids ABI incompatibility.

I agree that uploading the torchvision package was likely a mistake before the pytorch package was in place. For users that depend on torchvision, I think the 0.2 package version is correct. That said, pytorch moves so quickly that users need to be mindful of what version they install. I think the particular user would likely benefit from having the dependency torchvision >=0.4 in their spec, fixing the current incompatibility.

As for being behind on the builds, part of that is that the conda-forge infrastructure isn't set up to automatically detect new tags on GitHub. The tarballs uploaded/generated by your team on GitHub do not contain the third-party libraries, so in order to build everything I had to use the git repo. I think I can add the tarball to trick the updater into automatically rebuilding, but at this point the Azure machines just ran out of RAM...

I might have to take inspiration from your work at https://github.com/pytorch/builder/tree/master/conda to find a solution. Honestly, help in building the package the right way would be appreciated, but I understand if you find that hard to justify.

Finally, the last advantage is that conda-forge is also looking beyond x86 architectures, and using the conda-forge platform would enable a pathway toward arm/ppc builds (though they are blocked on the graphics stack for now).

kkraus14 commented 4 years ago

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

From the perspective of another maintainer of GPU packages that reside in a different channel than conda-forge, it makes dependency management and end user experience much nicer / easier when they have a one stop shop to get their packages. From what I've seen users typically don't modify their .condarc file, and just add channels to individual install commands, and then things get unexpectedly downgraded / upgraded and the end user has a bad time.

hmaarrfk commented 4 years ago

They add channels to individual install commands because, as maintainers, we often want to give them a single command, on one line, to execute.

I'm definitely guilty of this, especially for pure python packages, when I have a feeling that users don't want global changes to their environments.


rgommers commented 4 years ago

This issue and the related discussion in https://github.com/conda-forge/torchvision-feedstock/issues/2 do point out some real issues with the conda/conda-forge model. Questions like:

  • Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?
  • Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?
  • Does everything need to be in conda-forge? If so, what's the point of channels? If not, why can't we have cross-channel dependencies?
  • Is "everything in conda-forge" even scalable (the last 1-1.5 years suggest not)?

Note, I know this is probably not the optimal place to discuss this, and neither is Twitter (Cc @mrocklin and @jph00). But what is?

I honestly don't know the answers to any of these questions, and that's pretty telling given that I've been involved in packaging for a long time and am a maintainer on the NumPy and SciPy feedstocks. I've just scanned through the conda-forge docs again, and it doesn't provide answers.

Adding pytorch to conda-forge has several advantages:

  1. It would allow developers to have packages that explicitly depend on pytorch.

This is a rule that's more social than a hard technical requirement.

  2. In theory, it would help streamline the installation of multiple different packages beyond those that exist in the default channel. The default channel has the basics, but for many things, I find it lacking. Pointing people to conda-forge (or pip) for software is something I find myself doing from time to time.

Again social. The pytorch channel has a well-maintained and complete set of packages that could be relied on.

  3. It allows the use of a consistent set of compilers, which avoids ABI incompatibility.

This is true, maybe, sometimes. Package maintainers, with a few notable exceptions (Arrow, Tensorflow 1.x - EDIT: not even true anymore for Arrow it looks like; 0.14.1 has binary wheels, and Tensorflow 2.0 also has compliant manylinux wheels now), do make this work on PyPI, so there's no real reason it couldn't be made to work cross-channel within conda given the right set of conventions/specs/tools.

h-vetinari commented 4 years ago

Mixing channels has not worked out so well historically, which is why we now have --strict-channel-priority and why so many packages are migrating to conda-forge (which obviously has many other reasons too).

It would be interesting to see what some "cross-compatible" channels would look like (or how that could ever be enforced in a way that gets the blessing of conda/conda-forge), but while it is mostly a social convention (as @rgommers mentions), channels have a big impact in corporate environments, where the rest of the internet is usually behind a proxy. Getting anything other than the main channels + conda-forge past IT / sysadmins / etc. is a hassle, both procedurally and technically, so every extra channel has a substantial incremental cost, while conda-forge just works (after the initial setup).

@rgommers: Note, I know this is probably not the optimal place to discuss this, and neither is Twitter (Cc @mrocklin and @jph00). But what is?

CC @conda-forge/core @mingwandroid

Edit: Probably best to start (at least) at Ralf's post.

isuruf commented 4 years ago

Package maintainers, with a few notable exceptions (Arrow, Tensorflow 1.x), do make this work on PyPI so there's no real reason it couldn't be made to work cross-channel within conda given the right set of conventions/specs/tools.

pip doesn't respect version constraints of already installed packages which makes it easy to break environments.

Who sets this right set of conventions? Even defaults and conda-forge can't agree on conventions. For example, in conda-forge we provide different BLAS implementations and users can decide at install time, but this is not the case with pytorch, which requires MKL.

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

It's up to the maintainers.

Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?

There are some people who do maintain their feedstocks outside of conda-forge. conda-smithy supports creating feedstocks and uploading to a custom channel.

Does everything need to be in conda-forge? If so, what's the point of channels? If not, why can't we have cross-channel dependencies?

No. Packages can be in other channels, but cross-channel dependencies mean we lose control. conda-forge's community does a lot of work to maintain ABI compatibility and the ability to create consistent environments. @conda-forge/core is called on for help very frequently to merge some PR in a feedstock that has been abandoned.

rgommers commented 4 years ago

pip doesn't respect version constraints of already installed packages which makes it easy to break environments.

That's not really relevant here (and will be solved at some point).

Who sets these right set of conventions? Even defaults and conda-forge can't agree on conventions.

This will need solving one way or another I think.

For example, in conda-forge, we provide different BLAS implementations and users can decide at install time, but this is not the case with pytorch which requires MKL.

PyTorch relies on MKL features beyond plain BLAS, like FFT-related functionality. So the BLAS-switching isn't relevant here. A conda-forge PyTorch package simply must depend on MKL directly.

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

It's up to the maintainers.

That's not a real answer. The point is: project maintainers may very well be willing to help, but it's not clear how. The conda-forge team/community needs to have a clear vision here. Right now the norm for releasing any project is: release on PyPI as wheels and an sdist, then let someone else worry about conda-forge (that someone else could be an individual project maintainer, or a user, or a conda-forge/core member - but it's normally not the project release manager). The norm could change to having releases to conda-forge, or to a custom conda channel, be part of the official project release procedure.

There are some people who do maintain their feedstocks outside of conda-forge. conda-smithy supports creating feedstocks and uploading to a custom channel.

This sounds like there could be part of an answer in here ....

No. Packages can be in other channels, but cross-channel dependencies means we lose control.

That's missing the point - you have "control" over this pytorch-cpu-feedstock, but for users it is unhelpful that it even exists, and from a maintainer point of view, why would anyone want to spend double the effort to maintain PyTorch builds in two channels (conda-forge and pytorch)?

isuruf commented 4 years ago

The point is: project maintainers may very well be willing to help, but it's not clear how.

and from a maintainer point of view why would anyone want to spend double effort to maintain PyTorch builds in two channels

I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge?

h-vetinari commented 4 years ago

I'm not sure I understand. What do the maintainers want to do? Custom channel or in conda-forge?

Probably best to start (at least) at Ralf's post. I've edited my post accordingly.

isuruf commented 4 years ago

PyTorch relies on MKL features beyond plain BLAS, like FFT-related functionality. So the BLAS-switching isn't relevant here. A conda-forge PyTorch package simply must depend on MKL directly.

Okay. How about the fact that the pytorch conda packages in the pytorch channel require GLIBC 2.17 (CentOS 7), while conda-forge uses GLIBC 2.12 (CentOS 6)?

Everything boils down to the set of conventions used. Would other channels like pytorch be open to using the set of conventions that conda-forge uses?

pearu commented 4 years ago

The fact that conda-forge uses GLIBC 2.12 (CentOS 6) is a show-stopper for recipes that want to install the latest cudatoolkit (10.1 currently), whose installer requires GLIBC 2.14 or newer. For instance, the PR https://github.com/conda-forge/cudatoolkit-dev-feedstock/pull/16 is held back because of this, and I would actually hope that conda-forge would start to use a newer GLIBC rather than other channels using the old GLIBC.

isuruf commented 4 years ago

@pearu, can you open a different issue for increasing GLIBC version?

(For others who are curious: the latest cudatoolkit does not require GLIBC 2.14; see the documentation at https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html. The linked PR was just using a wrong URL.)

(Off-topic: there are people interested in making manylinux2010 (centos6) wheels from conda packages and increasing the GLIBC version would stop that.)

bgruening commented 4 years ago

Hi all,

thanks for this discussion. Let me try to provide my experience as someone who has been contributing to the conda ecosystem for four years and has migrated (with a lot of help) more than 1000 packages to conda-forge.

Should project maintainers maintain their own conda-forge feedstocks, publish to their own channel, or both?

On conda-forge or a community channel (and here I mean broader scientific fields, not project-wide communities). I would always recommend conda-forge only, for multiple reasons, but the most obvious one is that you want your users to use as few channels as possible. It is never good to have too many channels/PPAs/... activated.

Why aren't projects maintaining their own conda-forge feedstocks (or any conda packages for that matter)? Should we want that and ask them to?

They do! I know a lot of upstream maintainers that do this, and I would always recommend it. What happens on staged-recipes is that upstream maintainers are very often pinged and asked if they want to co-maintain the feedstock. Imho this works very well.

Does everything need to be in conda-forge?

Everything that is stable, yes, why not. Keep in mind that you want to keep the number of channels as low as possible. There are other ways, like Bioconda is doing; I will get to this later.

If so, what's the point of channels?

For unstable stuff, for training, for beta versions, for people that don't like to play with communities or don't recognize the complexity of integration. It's a matter of providing freedom and trying different community models (bioconda vs. conda-forge etc.). It's good to have this choice. As a matter of fact, the conda-forge model (even if not perfect) is the most scalable approach we have seen yet. And that's a Bioconda core member speaking here.

Please keep in mind that multiple channels are always more complicated to maintain than just one. Another example is namespace clashes: you have these way better under control in one channel than in 10 channels.

If not, why can't we have cross-channel dependencies?

We can. But we need to play together. Bioconda, a channel with 7000 bioinformatics packages, depends on conda-forge. We recommend the channel order conda-forge > bioconda. We have linters in conda-forge and bioconda that prevent name clashes. Bioconda members are part of conda-forge, we agree on the same glibc version, we sync the pinnings - we (Bioconda) essentially follow conda-forge and discuss how conda-forge evolves together with them, as we depend on them. No redundancy, high-quality packages, and everyone is happy!
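
(For readers unfamiliar with how that channel ordering is expressed in practice, a minimal sketch of a user-side .condarc might look like the following; strict priority is the setting referenced earlier in this thread:

    # ~/.condarc (illustrative): conda-forge takes priority over bioconda
    channels:
      - conda-forge
      - bioconda
      - defaults
    channel_priority: strict

With strict priority, a package found in conda-forge is never silently replaced by a same-named package from a lower channel.)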

But, there needs to be the will to work together and invest time and effort to make this happen.

Is "everything in conda-forge" even scalable (the last 1-1.5 years suggest not)?

Not sure what you are referring to, but as someone who has run >1000 environments over four years on all kinds of infrastructure, from HPC to cloud, and as a maintainer of BioContainers (building containers out of conda packages)... conda is scalable. It's the most scalable package manager I have seen so far - yes, even more scalable than dpkg and such. But this also means it's way more complex.

If you acknowledge that, and if you have seen how this community can maintain >10 languages and >1000 R packages (which are rebuilt every 6 months), that we have rebuilt everything against new compilers, and that if boost or any other library (such as zlib) gets a new version, all dependent packages are rebuilt (also in Bioconda), then you would probably also say that conda-forge is scalable :)

@pearu:

The fact that conda-forge uses GLIBC 2.12 (CentOS 6) is a show-stopper for recipes that want to install the latest cudatoolkit (10.1 currently), whose installer requires GLIBC 2.14 or newer. For instance, the PR conda-forge/cudatoolkit-dev-feedstock#16 is held back because of this, and I would actually hope that conda-forge would start to use a newer GLIBC rather than other channels using the old GLIBC.

That is exactly what @isuruf was trying to say. It is a kind of agreement that many hundreds of people have made, and there are valid reasons for sticking to GLIBC 2.12 - for example, all HPC environments I know are running CentOS 6 (or similar systems), and this will not go away soon. Gosh, I have seen HPC systems with CentOS 5 a year ago :( So if you are proposing to just update GLIBC, you are breaking the workflows and the accessibility of many thousands of users. And the reason is some proprietary binary blob? However, if you like, let's start this discussion, get people's opinions, and ask the scientific community.

In the end my key points are:

Thanks again for starting this discussion; happy to answer any questions, also in regard to Bioconda and how we maintain a separate channel but stay compatible with conda-forge :heart:

rgommers commented 4 years ago

Everything boils down to the set of conventions used.

It's important, but not the right place to start. The needed conventions follow from goals/requirements that affect users/projects/maintainers/redistributors/etc., so I'll start there.

As a user, I want to be able to use conda to install the latest, feature-complete version of any package I need. I don't really care too much how; I just care that it's robust. So typing something like conda install numpy torch should get me numpy 1.17.2 and pytorch 1.2.0 (appending -c somechannel if needed). What I don't want is to pick up outdated or broken versions. I can learn that defaults may be slightly behind or not contain some packages, so then I go to conda-forge, or make that my default for everything, for example. But if conda install pytorch -c conda-forge works, I get unhappy if I have to find out the hard way that that got me PyTorch 1.1 without GPU support.

As a maintainer, I want to make my package available to all conda and pip users as soon as possible when I tag a new release. With the least work possible, following some standard procedure each release. Again, I don't really care too much how, happy to take guidance there; among the options:

  1. release sdist and wheels to PyPI, let other packaging teams (Debian, Homebrew, conda-forge, etc.) take it from there
  2. release to PyPI and conda-forge in parallel myself
  3. release to PyPI and my own conda channel in parallel myself; let conda-forge sync with my channel somehow
  4. release to one conda channel only, and let conda-press produce wheels from that that I then upload to PyPI (long-term even nicer, only one build toolchain)

Right now, NumPy, SciPy and the majority of packages do (1). PyTorch does the first part of (3), but conda-forge doesn't "sync" correctly.

As an ecosystem-wide contributor, I want to be able to tell users how to easily install and use large parts of the NumPy/PyData ecosystem. Ideally this is something like "download [Ana|Mini]conda, open your IDE of choice and work through these tutorials" followed by "if something is not in the defaults channel, do X". This is harder today than it was 3 years ago ...

Would other channels like pytorch be open to using the set of conventions that conda-forge uses?

Note that I'm not a PyTorch maintainer (although I am contributing) so I won't try to answer that for @soumith. I believe this problem isn't really PyTorch-specific though.

Some other thoughts:

  • Socially, how does the conda(-forge) community want to be seen and interacted with by projects and maintainers? Like PyPI/pip/wheels, or like Debian?
  • There are more issues than (1-4 above) with channels. I've brought some up before and @msarahan told me "you are not factoring channels into your thinking enough", but I've never seen a real answer to how channels are supposed to work and how the whole "conda design" fits together. Practical advice has changed over the years (with e.g. --strict-channel-priority and putting conda-forge first in .condarc yes/no).
  • But is there a holistic design or long-term vision that the conda and conda-forge teams share?
  • Packaging is hard, and there's limited energy/expertise. I want to package things once (like (4) above), and at most twice. With PyTorch it's done four times now: the PyTorch team does PyPI and their conda channel, then there's this feedstock, and pytorch in defaults. Also affects other hard-to-build projects - e.g. we still don't have a conda-forge SciPy package for Windows .....

I'll close by echoing @jph00's sentiment on Twitter: "anyway, I'll close by saying I really don't like bringing up negative issues like this, because it's stressful and tiring. I only do so about things I really care about and want to be successful. Like conda."

rgommers commented 4 years ago

Thanks for the insights @bgruening

We can. But we need to play together. Bioconda, a channel with 7000 bioinformatics packages, depends on conda-forge. We recommend the channel order conda-forge > bioconda.

You can only do that when it's one-way, right? I mean, nothing in conda-forge could depend on the pytorch channel or any other channel than defaults? If that could be made bi-directional, then this feedstock could disappear, which would be very helpful.

rgommers commented 4 years ago

all dependent packages are rebuilt (also in Bioconda), then you would probably also say that conda-forge is scalable :)

I meant specifically that the conda resolver's speed problems depend on the size of the graph. It's still quite easy to run into this even with the nice improvements in conda 4.7.x. So if everything needs to be in conda-forge and the number of packages becomes of the same order as that on PyPI, that may not work. So from that perspective, "everything in conda-forge" seems quite unhealthy. Having channels interact well, like bioconda and conda-forge apparently do, may be much better.

jakirkham commented 4 years ago

Thanks to everyone who jumped into this discussion and shared their thoughts. I think this has been extremely valuable. The next step would be to raise some well-scoped issues on the webpage repo for further discussion and resolution. @rgommers, are you happy to do this? 🙂

jakirkham commented 4 years ago

To your question @soumith (though others have offered some great answers too! 😄)

@jakirkham why would you like gpu-built pytorch in conda-forge? We already provide high-quality packages in the pytorch channel and I am really worried about support. Like, whenever a new pytorch version releases, the conda-forge one will be a bit behind and then there will be all kinds of conflicts. I'm putting into context the conversation on torchvision conda-forge repo that happened yesterday.

Sorry I read your previous comment as stating this plan was ok. Was this not what you meant? Or have you changed your mind?

In either case, it seems that various people have pushed to add the pytorch stack to conda-forge. Now we actually have several downstream packages in conda-forge that require pytorch. However, what I'm hearing is that the user experience is not very good due to the lack of GPU support. Removal would complicate the story for downstream packages that need pytorch. So it seems like the best course of action would be to make sure we have a fully featured pytorch package in conda-forge.

As to maintenance effort, I suspect (though maybe @jjhelmus can comment 😉) that defaults will try to rebase their current work on top of a pytorch package in conda-forge, which will make it easier for our communities to work together and improve the defaults and conda-forge ecosystems collectively.

If you have particular thoughts on how a conda-forge pytorch package can be built, we would appreciate hearing them and would happily incorporate this feedback. In turn if you'd like to continue doing your own build, you can use the recipe we work on together. Alternatively you could reuse the binaries we produce (after whatever validation seems appropriate to you) or encourage users to get the package from conda-forge. In any event, I'd hope you could benefit from this shared effort.

Thoughts? 🙂

rgommers commented 4 years ago

Thanks to everyone who jumped into this discussion and shared their thoughts. I think this has been extremely valuable. The next step would be to raise some well-scoped issues on the webpage repo for further discussion and resolution. @rgommers, are you happy to do this? 🙂

Thanks @jakirkham, yes I'll do my best to break this up and create actionable issues. It may take me a little while ....

hmaarrfk commented 4 years ago

Is anybody today depending on pytorch from conda-forge? This package is called pytorch-cpu explicitly to give us time to experiment with compiling such a large package, without giving users or maintainers the false sense that they are installing a GPU-compatible package.

msarahan commented 4 years ago

  1. release sdist and wheels to PyPI, let other packaging teams (Debian, Homebrew, conda-forge, etc.) take it from there

Let me reiterate that the work being done here by "other packaging teams" is integration. That integration is not done by PyPI in any meaningful way, and part of what results is the dependency clobbering that you dismiss as "will be fixed one day". Another part is the library-loading disasters that result from library load order. In an ideal world, auditwheel and machomachomangler take care of things like this, but is this an ideal world?

  2. release to PyPI and conda-forge in parallel myself

Given the bot, this is ideally little work once it is set up. I say "ideally" in the same sense as above with auditwheel and machomachomangler.

  3. release to PyPI and my own conda channel in parallel myself; let conda-forge sync with my channel somehow

This assumes "my own conda channel" and conda-forge are readily sync-able. The hitch here is that "my own conda channel" may take convenient shortcuts, such as using a newer base system (newer glibc) that makes things easier to build, but that also means conda-forge either needs to do massive work, or can't sync.

  4. release to one conda channel only, and let conda-press produce wheels from that that I then upload to PyPI (long-term even nicer, only one build toolchain)

conda-press is ignoring pretty large issues, like the C++ ABI difference and the fact that conda packages depend on a shipped libstdc++, not the system one. I'd love to see it bridge those gaps, but I fear the fundamental gap between the conda standard compiling approach and the PyPA standard compiling approach may be too much to bridge in all cases.

As an ecosystem-wide contributor, I want to be able to tell users how to easily install and use large parts of the NumPy/PyData ecosystem. Ideally this is something like "download [Ana|Mini]conda, open your IDE of choice and work through these tutorials" followed by "if something is not in the defaults channel, do X". This is harder today than it was 3 years ago ...

Seriously? In the vast majority of cases, conda-forge and defaults are plenty. It's not 100%, and I agree that the edge cases have gotten harder, but this is not an accurate statement.

Would other channels like pytorch be open to using the set of conventions that conda-forge uses?

Conventions include the toolchain, such as glibc. pytorch may not be able to adopt these conventions if they have a fundamental need that disagrees with the conda-forge stack. At that point, it becomes a push for changing the conda-forge toolchain stack (which in effect implicitly changes defaults' toolchain as well, because we try to stay compatible). This affects where conda-forge packages can safely be assumed to run, as you are well aware.

Socially, how does the conda(-forge) community want to be seen and interacted with by projects and maintainers? Like PyPI/pip/wheels, or like Debian?

Conda-forge is all about distributed control. We don't have a central team of integrators. If more feedstocks were maintained by official project leaders, I think conda-forge and the user community would be thrilled. I very much understand if project maintainers just don't want to deal with it, though, and that's where the fallback to a Debian-like model happens.

There are more issues than (1-4 above) with channels. I've brought some up before and @msarahan told me "you are not factoring channels into your thinking enough", but I've never seen a real answer to how channels are supposed to work and how the whole "conda design" fits together. Practical advice has changed over the years (with e.g. --strict-channel-priority and putting conda-forge first in .condarc yes/no).

Channels are the notion of spaces where a coherent team dictates behavior. That team ideally uses a consistent toolchain across all packages in that channel. Package names are consistent within that channel. It is a technical solution to an arguably social problem - lining up practices, versions, and toolchains.

But is there a holistic design or long-term vision that the conda and conda-forge teams share?

Try to maintain binary compatibility with each other, while providing both backwards compatibility for the large user base stuck on old enterprise OSes and support for current compute capabilities? We don't get to plan the best approach to that. Things like pytorch are forcing functions for re-evaluating our toolchain, but that is always in competition with dropping support for old-but-not-EoL platforms.

Conda 4.7's virtual package support might help with this, in that it allows conda to make solver decisions based on the system state. Currently it can be used for the cuda driver version installed on the system. It could also be used for the glibc present on the system, and then pytorch could require a particular value for that (or better, the toolchain used could impose that dependency automatically). This would allow pytorch to use a newer toolchain without imposing it on the rest of the channel. This kind of exception to the community standard would probably need some official process, though, or else the channel loses coherency quickly.
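
(To make that concrete: assuming a conda new enough to expose the __cuda and __glibc virtual packages, a recipe could constrain on system state roughly like this; the version floors are illustrative:

    # meta.yaml fragment (illustrative)
    requirements:
      run:
        # only installable where the NVIDIA driver supports at least CUDA 10.1:
        - __cuda >=10.1   # [linux]
        # only installable where the system glibc is at least 2.17:
        - __glibc >=2.17  # [linux]

The solver then refuses to install the package on machines that don't satisfy those constraints, instead of the package failing at runtime.)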

Packaging is hard, and there's limited energy/expertise. I want to package things once (like (4) above), and at most twice. With PyTorch it's done four times now: the PyTorch team does PyPI and their conda channel, then there's this feedstock, and pytorch in defaults. Also affects other hard-to-build projects - e.g. we still don't have a conda-forge SciPy package for Windows .....

Keep in mind, they are all targeting different runtime environments. The number of builds can't be any less than the number of different runtime environments.

jph00 commented 4 years ago

Sorry, I only just checked my email, so I didn't see this discussion was happening.

@rgommers thank you so, so much for your contributions in this thread. I absolutely couldn't agree more with everything you said, and I'm so grateful you said it. I was starting to wonder if I'm crazy or stupid, since it seemed like everyone else was saying I should keep quiet and be happy with the status quo...

Seriously? In the vast majority of cases, conda-forge and defaults are plenty. It's not 100%, and I agree that the edge cases have gotten harder, but this is not an accurate statement.

@msarahan I assume that @rgommers was just speaking of his personal experience. I will add that it's also been my experience. A dependency on RapidsAI is extremely challenging to maintain, for instance, since it relies on 5 channels - and that's before we even deal with our own deps.

We just want to let users say conda install fastai and have things work. For now, we are maintaining our own channel to make life easier.

conda-forge is doing a lot of great work and I'm very grateful for it. I'd also like to see recognized that it isn't solving the problem of how to integrate a complex ecosystem like PyTorch's just yet.

isuruf commented 4 years ago

I guess you could see conda-forge and defaults as big package repositories, like Ubuntu and Debian, and pytorch as a PPA.

release to PyPI and conda-forge in parallel myself

You can certainly do that. We'd love to have project maintainers become maintainers of the conda-forge feedstock, as you have.

Packaging is hard, and there's limited energy/expertise. I want to package things once (like (4) above), and at most twice. With PyTorch it's done four times now: the PyTorch team does PyPI and their conda channel, then there's this feedstock, and pytorch in defaults.

There were technical and legal issues with creating GPU packages in conda-forge, which have been mostly resolved by @jakirkham. When there are no such issues, conda-forge and defaults share recipes. We need channels like pytorch to be complementary to conda-forge. If they are, we can easily share recipes and the maintenance burden will not be much. However, when they take a different approach (e.g. on the GLIBC version), that's not possible.

So if everything needs to be in conda-forge and the number of packages becomes of the same order as that on PyPI, that may not work. So from that perspective, "everything in conda-forge" seems quite unhealthy. Having channels interact well, like bioconda and conda-forge apparently do, may be much better.

I don't understand what you are trying to say here. If there are x packages in conda-forge and y packages in bioconda, that's the same amount of work for the conda solver as x+y packages in conda-forge.

A dependency on RapidsAI is extremely challenging to maintain, for instance, since it relies on 5 channels - and that's before we even deal with our own deps.

I don't understand you. Here you are arguing that having multiple channels is challenging, while @rgommers is saying that we should use multiple channels instead of one channel.

rgommers commented 4 years ago

Seriously? In the vast majority of cases, conda-forge and defaults are plenty. It's not 100%, and I agree that the edge cases have gotten harder, but this is not an accurate statement.

@msarahan I assume that @rgommers was just speaking of his personal experience.

Indeed, my own end-user experience and the experience of working with lots of beginning-to-intermediate users. I do note that those users often use the geospatial stack or a deep learning stack, both of which are hard to package. But they're two very large sets of users, certainly not edge cases.

Also, we had a big internal discussion at Quansight on this a while ago. It turns out the majority of experienced users said "I never install anything into an existing environment; I just destroy and recreate". That is probably good advice right now (it avoids issues with quickly degrading envs), but it points at a major problem. The average user cannot work that way; they will think I'm crazy if I try to explain that.

Finally, I've had to completely remove Anaconda/Miniconda installs several times on more than one machine due to base breaking or becoming so slow that nothing (including conda) could be upgraded anymore.

In my experience, none of these things happened as much 3 years ago. Back then the discussion was only social when I pushed for install instructions that simply said "use Anaconda": can we recommend Anaconda alone, what will happen to pip, what about corporate control, etc. Now that worry has reduced (at least in my perception, largely due to conda-forge), but the usability and technical issues have grown. I think lots of good work is being done, but the conda ecosystem is a bit of a victim of its own success and growth.

jph00 commented 4 years ago

If you have particular thoughts on how a conda-forge pytorch package can be built, we would appreciate hearing them and would happily incorporate this feedback. In turn if you'd like to continue doing your own build, you can use the recipe we work on together. Alternatively you could reuse the binaries we produce (after whatever validation seems appropriate to you) or encourage users to get the package from conda-forge. In any event, I'd hope you could benefit from this shared effort.

Perhaps the right approach is to work with the vendor to ensure their packages are compatible, and then use them, rather than vice versa? No one understands PyTorch and its packaging needs better than @soumith and team. Also, the PyTorch ecosystem of packages that depend on it (such as ours, fastai) is in regular contact with that team and works closely with them; e.g. see: https://pytorch.org/ecosystem

jph00 commented 4 years ago

I don't understand you. Here you are arguing that having multiple channels is challenging, while @rgommers is saying that we should use multiple channels instead of one channel.

No, @rgommers said it would be great if there were ways for multiple channels to work well together. He didn't claim it's currently easy.

isuruf commented 4 years ago

Perhaps the right approach is to work with the vendor to ensure their packages are compatible, and then use them, rather than vice versa?

Any volunteers?

jph00 commented 4 years ago

I guess you could see conda-forge and defaults as big package repositories, like Ubuntu and Debian, and pytorch as a PPA.

Note that generally a PPA is not recommended for a package on which a large ecosystem depends. This is why distributions like Ubuntu are important. It would be great to see conda-forge take on a similar mantle.

jph00 commented 4 years ago

Any volunteers?

Is this something that the conda-forge project would accept? If so, it sounds like it could be a fantastic direction. But we wouldn't want people to start working on it, if it wouldn't actually get accepted once done.

jph00 commented 4 years ago

BTW I tested the latest PyTorch and fastai with conda-forge over the weekend, and I was able to get it working, at least on Linux (which is where nearly all PyTorch users are). So there mightn't be any technical hurdles to doing this.

isuruf commented 4 years ago

Note that generally a PPA is not recommended for a package on which a large ecosystem depends.

That's exactly the point I was trying to make to @rgommers. The Ubuntu repository doesn't have packages that depend on a PPA. When a PPA has good packages, the Ubuntu repository can easily copy those build scripts and bring them over.

jjhelmus commented 4 years ago

The recipe used by Anaconda, Inc. to build the pytorch packages in the defaults channel might be of help. It can be found in the pytorch-feedstock folder in the aggregate repository and is based on the recipe used by the PyTorch community. The packages in defaults are built using the "anaconda compilers" and work on distributions with glibc 2.12 (CentOS 6) or newer. If conda-forge had a similar recipe, we would be happy to use it as an upstream source, as we do for the majority of packages we build.

hmaarrfk commented 4 years ago

Pretty happy to pull in the recipe from defaults.

I think this feedstock predates anything from defaults, and predates me finding things from the official pytorch channel (I didn't look too hard at the time, TBH).

jakirkham commented 4 years ago

Thanks @jjhelmus! I'm happy to work with you on this @hmaarrfk. 🙂

jakirkham commented 4 years ago

@soumith, when you get a chance, I'd be curious to hear your thoughts on this comment. Thanks for your engagement here. 🙂

soumith commented 4 years ago

Catching up on the long but super valuable thread.

@soumith, when you get a chance, I'd be curious to hear your thoughts on this comment. Thanks for your engagement here.

So, when you asked for my "thoughts on this" on July 11th, I read the subject of the thread and only vaguely the first comment, and concluded that you folks were figuring out how to add GPU packages to conda-forge in general. I didn't realize it was about adding a pytorch GPU package to conda-forge.

@rgommers has captured PyTorch's point of view pretty accurately, but there are some things I'd like to clarify.

Deep Learning packaging is definitely not special, but it does have challenges. I'll explain, but I also want to get to a common working solution for some basic stuff -- "deep learning is special" is not a lazy cop-out for sure.

So, what's going on in deep learning stuff?

  1. A userland obsession with high performance. This brings us to the first challenge:
    • always shipping with the latest and greatest CUDA, CuDNN, NCCL
    • shipping with MKL (OpenBLAS is not sufficient)

    Working with our users and their patience levels, we haven't had any way of slipping on the above. I think conda-forge doesn't link numpy with MKL because of some licensing concerns (correct me if I am wrong). This makes me very nervous, because our users don't understand that PyTorch being slower than X is just a packaging issue. They simply assume the worst. I've had multiple release cycles where we were on a tight timeline and couldn't afford to wait for the official Anaconda channel to upgrade CuDNN or NCCL, so we do ugly and embarrassing things like statically linking to the most performant (and bug-free) CuDNN version ourselves.
  2. Exotic hardware -- NVIDIA GPUs are only the start.
    • We are shipping TPU support next week, and we have experimental AMD ROCm support on master. Plenty of other processors and hardware will have to be supported in the future.
    • With this future being evident, relying on conda-forge or defaults for officially supported packages (where we are on the hook to get every user's situation working) makes me very, very nervous. My worry is that once we switch over to conda-forge being official, if we ever need to turn back because of some exotic constraint, that'll create hard fragmentation in the packaging ecosystem. We've been dealing with fragile packaging as-is, and I don't see our packaging future for PyTorch getting more stable given all these hardware / driver stacks that we'll have to deal with in the near future.

All that being said, I don't think we face unique challenges in the ecosystem; we have friends now. The RAPIDS folks deal with almost the same stuff, and I think there are at least two points of reconciliation needed for the health of everyone.

  1. RAPIDS, PyTorch, etc. standardize on all of the common GPU library stack, and help push the latest dependency libraries to go live on conda within a week of release (cublas, cudnn, nccl, etc.).
  2. Figure out if we can mirror on conda-forge (and maintain that), and make reasonable changes to our official packaging to make that commitment.

I think (1) is easy and obvious: if we (pytorch packagers, rapids packagers, nvidia, and jjhelmus) work out a process, we can follow it.

(2) is a scale of things. I actually haven't figured out how feasible (2) would be. For example, would it be feasible to mirror to conda-forge until 1.5, but then say 1.6 will only appear in the pytorch channel for X months because of some hard-to-solve constraint that takes time and money and people? I am not sure.

hmaarrfk commented 4 years ago

I would say that we would be very open to a dual distribution model, where the software is simultaneously available on both conda-forge and the pytorch channel. Your concerns about version management and stability are very real.

I think that people moving to TPUs are very likely to be good at following instructions, so they would heed whatever warnings and documentation you include.

Static linking runs a little counter to the trend here. I wonder if it can be solved by setting the minimum required version of cudnn in the meta.yaml and rebuilding frequently.
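
A minimal sketch of that idea, as a hypothetical recipe fragment (the version numbers are illustrative):

    # meta.yaml fragment (illustrative): dynamic linking with a cudnn version floor
    requirements:
      host:
        - cudnn >=7.6                      # build against a recent cuDNN
      run:
        - {{ pin_compatible('cudnn') }}    # export a compatible runtime range instead of vendoring it

Rebuilding whenever a faster cuDNN lands would then be a version bump and rebuild rather than a re-vendoring exercise.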

isuruf commented 4 years ago

Working with our users and their patience levels, we haven't had any way of slipping on the above. I think conda-forge doesn't link numpy with MKL because of some licensing concerns (correct me if I am wrong). This makes me very nervous, because our users don't understand that PyTorch being slower than X is just a packaging issue. They simply assume the worst.

Recently conda-forge moved to a model like Debian's update-alternatives, and therefore numpy in conda-forge can use either MKL, OpenBLAS, or BLIS (or any other BLAS implementation we may decide to support). If you want to, you can force MKL as a dependency for a particular package. Note that we are not providing MKL ourselves; we just use the package that defaults provides.
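
For example, the implementation can be chosen at install time through the build string of the libblas metapackage (a sketch of the mechanism; exact build strings may differ):

    # pick the MKL variant of the conda-forge BLAS stack
    conda install "libblas=*=*mkl"

    # or the OpenBLAS variant
    conda install "libblas=*=*openblas"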

I've had multiple release cycles where we were on a tight timeline and couldn't afford to wait for the official Anaconda channel to upgrade CuDNN or NCCL, so we do ugly and embarrassing things like statically linking to the most performant (and bug-free) CuDNN version ourselves.

conda-forge doesn't ship CuDNN either; defaults has gotten permission to distribute it. If anybody wants to help get permission, they can talk with conda-forge/core and NVIDIA about doing this.

we have experimental AMD ROCm support on master.

If you are talking about ROCm OpenCL, we have good support for OpenCL in conda-forge with Pocl, Neo and Oclgrind shipped and the ability to link to nvidia opencl. I'm interested in shipping ROCm OpenCL (and other tools). Any help is welcome here.

ezyang commented 4 years ago

If you are talking about ROCm OpenCL, we have good support for OpenCL in conda-forge with Pocl, Neo and Oclgrind shipped and the ability to link to nvidia opencl. I'm interested in shipping ROCm OpenCL (and other tools).

Just a drive-by clarification: PyTorch's AMD support uses HIPify to cross-compile our CUDA code into HIP code. There is no OpenCL in our integration.

isuruf commented 4 years ago

Great. I'm mainly interested in the OpenCL part, but would like to see the entire ROCm stack on conda-forge.

isuruf commented 4 years ago

@soumith, @ezyang, ROCm core components and a few development tools are in conda-forge now: https://github.com/conda-forge/staged-recipes/issues/10123. You can just conda install these packages on a Linux system running a recent kernel version and a supported AMD GPU. They are still experimental, so any testing is appreciated.

henryiii commented 4 years ago

Is this affected by the recent {{ compiler("cuda") }} support that the cupy-feedstock uses? I'd like to see conda-forge PyTorch and TensorFlow for the GPU, along with the GPU-compatible CuPy (which is fantastic to have).
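
For reference, this is roughly how a conda-forge recipe requests the CUDA compiler, following the cupy-feedstock pattern (a sketch; the selector assumes a cuda_compiler_version variant where "None" means CPU-only, as in the earlier configuration sketch):

    # meta.yaml fragment (illustrative)
    requirements:
      build:
        - {{ compiler('c') }}
        - {{ compiler('cxx') }}
        - {{ compiler('cuda') }}    # [cuda_compiler_version != "None"]

On conda-forge the CUDA compiler resolves to an nvcc shim package tied to the Docker image's toolkit version.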

rgommers commented 4 years ago

@henryiii there is a plan now for a GPU enabled PyTorch package on conda-forge, supported by the PyTorch team. An update on that plan will follow soon.