conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.

GPU Support #7

Closed: hmaarrfk closed this issue 3 years ago

hmaarrfk commented 5 years ago

@jjhelmus it seems you were able to build a GPU-enabled pytorch without needing variants

https://anaconda.org/anaconda/pytorch/files?version=1.0.1

Is that true?

If so, what challenges do you see moving this work to conda-forge?

jph00 commented 4 years ago

@rgommers that's great - when you provide that update, could you please include information about how the issues you raised earlier in the thread will be addressed, and what outstanding issues you see? It would be really helpful for those of us packaging downstream libs to understand what recommended best practices are, including what we should tell our users about how to update downstream packages.

(In particular, we still haven't managed to create a package that depends on RapidsAI, and can be reliably installed and updated by users. So I guess our biggest unsolved issue is how to be a downstream lib of a downstream lib!...)

h-vetinari commented 4 years ago

@rgommers: @henryiii there is a plan now for a GPU enabled PyTorch package on conda-forge, supported by the PyTorch team. An update on that plan will follow soon.

Any update on this?

mattip commented 4 years ago

Pytorch is now using a conda toolchain to build and test from a docker image in CI. This was done in the PRs that closed pytorch/pytorch#37584.
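
For anyone curious about the shape of that setup: the build runs inside a docker image whose toolchain comes from conda packages rather than the system. The image name and command below are placeholders for illustration, not the actual PyTorch CI configuration:

```sh
# Placeholder sketch of a docker-based conda-toolchain build; the real CI
# configuration lives in the PRs that closed pytorch/pytorch#37584.
docker run --rm -v "$PWD":/pytorch -w /pytorch conda-toolchain-image \
    bash -c "conda install -y compilers cmake ninja && python setup.py develop"
```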

hmaarrfk commented 4 years ago

I'm trying to pull in the recipe from defaults in https://github.com/conda-forge/pytorch-cpu-feedstock/pull/20

Help would be appreciated.

If you need rights to my repo, please let me know.

I think a good plan would be to:

  1. Build the pytorch-cpu package correctly.
  2. Build the pytorch-gpu package correctly.
  3. Build the unified pytorch package (a rough sketch of what that could look like is below).
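
To make step 3 concrete, here is a loose sketch of a unified metapackage; the package names and the variant selector are assumptions for illustration, not the actual recipe:

```yaml
# Hypothetical meta.yaml fragment: a "pytorch" metapackage that resolves to
# either the CPU or the GPU build, depending on the variant being built.
package:
  name: pytorch
  version: "{{ version }}"

requirements:
  run:
    - pytorch-gpu ={{ version }}  # [cuda_compiler_version != "None"]
    - pytorch-cpu ={{ version }}  # [cuda_compiler_version == "None"]
```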

@soumith I really understand the concern you raised:

PyTorch is slower than X because of a packaging issue. They simply assume the worst.

However, recent anecdotal evidence suggests that even at a 50% performance level, users simply aren't concerned about this kind of penalty as long as they can get their stack installed. The network effect of conda-forge makes it extremely valuable for getting packages from subfields of machine learning installed, especially those that defaults and pytorch don't have time to package.

rgommers commented 4 years ago

I failed to summarize the long meeting and notes we had on this (apologies), and some things have changed in conda-forge in the meantime that alter the details of what we discussed on the call, but the right approach here is still syncing the binaries built by the PyTorch team in the pytorch channel to conda-forge. We're working on this (slowly). Adding CI to PyTorch to build with a conda toolchain, done by @mattip, was part of that. Then @scopatz is making the change to cpuonly and gpu mutex packages in https://github.com/pytorch/builder/pull/488, and will work on the next steps for getting PyTorch, including GPU support, onto conda-forge.
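
For context, the mutex approach is what drives the install commands PyTorch documents; roughly (channel and versions as historically advertised on pytorch.org, shown here for illustration):

```sh
# GPU build: the default variant from the pytorch channel.
conda install pytorch cudatoolkit=10.2 -c pytorch

# CPU-only build: the "cpuonly" mutex metapackage forces the CPU variant.
conda install pytorch cpuonly -c pytorch
```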

@hmaarrfk if you want to help move that forward, that seems healthier than continuing to push this package.

However, recent anecdotal evidence suggests that even at a 50% performance level, users simply aren't concerned about this kind of penalty as long as they can get their stack installed.

This claim is definitely not true. Some users aren't concerned, but there are a lot of PyTorch bug reports along the lines of "torch.<somefunc> is slower now than in older release 1.x.y".

The network effect of conda-forge makes it extremely valuable for getting packages from subfields of machine learning installed, especially those that defaults and pytorch don't have time to package.

If you have a set of those, maybe they should simply go in their own channel for the time being, one that depends on both the conda-forge and pytorch channels? Name it pytorch-contrib or something like that?

isuruf commented 4 years ago

but the right approach here is still syncing the binaries built by the PyTorch team in the pytorch channel to conda-forge

This is certainly not the right approach. I don't see why pytorch is special. We should just build them on conda-forge. The benefits of building on conda-forge are that we know the packages are compatible with the rest of the stack, bots handle maintenance and rebuilds when dependencies update, and conda-forge builds for more architectures.

@hmaarrfk, building them on conda-forge is totally fine with me.

rgommers commented 4 years ago

This is certainly not the right approach. I don't see why pytorch is special. We should just build them on conda-forge.

You don't even have GPU build hardware, right? There are more reasons; I hope @scopatz can summarize them when he gets back. He said the exact same thing as a "conda-forge first principles" type response, but I believe I managed to convince him.

isuruf commented 4 years ago

You don't even have GPU build hardware, right?

We have a docker image with the compilers. Hardware is not needed to build, AFAIK. After building the package, we can upload it to a testing label and then move it to main after testing on a local machine with the hardware.
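
In anaconda-client terms, that flow could look roughly like this (the spec string and label names are placeholders; treat the whole thing as a sketch of the workflow rather than a documented conda-forge process):

```sh
# Build without a GPU; only the CUDA toolchain is needed at compile time.
conda build recipe/

# Upload to a non-default label so users don't pick the package up yet.
anaconda upload --label testing ./pytorch-*.tar.bz2

# After testing on a machine with the hardware, promote the package to main.
anaconda copy --from-label testing --to-label main conda-forge/pytorch/1.4.0
```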

hmaarrfk commented 4 years ago

I mean, the fact that Anaconda published conda packages to https://anaconda.org/anaconda/pytorch about 6 months ago mostly invalidates whatever arguments we wish to have about control.

We are quite similar to defaults since a recent sync, so I think it is reasonable to ask that we collaborate rather than diverge.

hmaarrfk commented 4 years ago

And for reference, here is a pointer to the installation instructions for the pytorch-family package I was talking about: https://github.com/pytorch/fairseq#requirements-and-installation

I understand there isn't always an immediate business case (at Facebook or Continuum) to create a high-quality package for everything, which is where conda-forge comes in.

rgommers commented 4 years ago

We have a docker image with the compilers. Hardware is not needed to build, AFAIK. After building the package, we can upload it to a testing label and then move it to main after testing on a local machine with the hardware.

Just doing some manual testing seems like a recipe for broken packages. And you probably won't be able to test everything that way (e.g. multi-GPU stuff with torch.distributed). The battery of tests for PyTorch with various hardware and build configs is very large, and it's very common to have just some things break that you never saw locally.

And for reference, here is a pointer to the installation instructions for the pytorch-family package I was talking about: https://github.com/pytorch/fairseq#requirements-and-installation

That's one package, and it has a "help wanted" issue for a conda package: https://github.com/pytorch/fairseq/issues/1717. Contributing there and getting a first conda package into the pytorch channel seems like a much better idea than doing your own thing. You can then also use the CI system, so you can test the builds, and I'd imagine you get review/help from the fairseq maintainers.

isuruf commented 4 years ago

Just doing some manual testing seems like a recipe for broken packages. And you probably won't be able to test everything that way (e.g. multi-GPU stuff with torch.distributed). The battery of tests for PyTorch with various hardware and build configs is very large, and it's very common to have just some things break that you never saw locally.

How is this different from other packages like numpy, openblas, etc.?

rgommers commented 4 years ago

How is this different from other packages like numpy, openblas, etc.?

For NumPy you actually run the tests.

Plus, the number of ways to build NumPy is far smaller than for PyTorch (e.g., check the number of USE_xxx env vars in PyTorch's setup.py). So it's very different.
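
To make that concrete, here is an illustrative sample of those environment switches (a small subset, values arbitrary; the full list in setup.py is much longer):

```sh
# Each variable toggles a build configuration, so the space of plausible
# build variants is the product of all of these.
export USE_CUDA=0          # CPU-only build
export USE_CUDNN=0
export USE_DISTRIBUTED=1   # torch.distributed support
export USE_MKLDNN=1        # MKL-DNN/oneDNN CPU kernels
export USE_FBGEMM=1        # quantized CPU kernels
python setup.py bdist_wheel
```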

isuruf commented 4 years ago

I'm suggesting we run the tests on a local machine with GPU hardware.

We don't test all the code paths in numpy either. For example, there are AVX512 code paths that we don't test, and we don't test POWER9 code paths. It's impossible to test all code paths.

isuruf commented 4 years ago

Plus, the number of ways to build NumPy is far smaller than for PyTorch

There are lots of different ways to build openblas. See how many options we set in https://github.com/conda-forge/openblas-feedstock/blob/master/recipe/build.sh#L26-L50
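
For a flavor of what those options control (an illustrative subset; the linked build.sh is authoritative):

```sh
# DYNAMIC_ARCH=1 : runtime CPU dispatch, one binary for many microarchitectures
# USE_THREAD=1   : build the threaded BLAS
# NUM_THREADS=128: maximum thread count compiled in
# BINARY=64      : 64-bit interfaces
make DYNAMIC_ARCH=1 USE_THREAD=1 NUM_THREADS=128 BINARY=64
```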

h-vetinari commented 4 years ago

I have to agree that local testing is a poor substitute for a proper CI matrix, but of course that's not possible without a CI provider offering GPUs; see https://github.com/conda-forge/conda-forge.github.io/issues/1062. Considering the impact conda-forge is having on the scientific computing stack in Python, one would hope this is a tractable problem. (Note the OP of that issue: it might be possible to hook self-hosted machines into the regular Azure CI.)

With a concerted (and somewhat more high-level) effort, I believe it might realistically be possible to convince Microsoft to sponsor the Python packaging ecosystem with some GPU CI time on Azure. Admittedly, that's just in my head (based on some loose but very positive interactions I've had with their reps).
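
If that happened, routing a job to a self-hosted GPU pool is already supported by Azure Pipelines; a minimal sketch, assuming a registered self-hosted pool (name hypothetical):

```yaml
# azure-pipelines.yml fragment: one job targets a self-hosted agent pool
# with GPUs attached, while the rest of the matrix stays on hosted agents.
jobs:
  - job: gpu_smoke_test
    pool:
      name: SelfHostedGPU   # assumption: a registered self-hosted pool
    steps:
      - script: python -c "import torch; assert torch.cuda.is_available()"
        displayName: CUDA smoke test
```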

Re: "build coverage" - 100% might not be possible, but one can get pretty close, depending on the invested CI time. For example, even if we can now have 3-4 different CPU builds per platform/python-version/blas-version (via https://github.com/conda/conda/pull/9930), it's still "only" a question of CI time to multiply the matrix of (e.g.) conda-forge/numpy-feedstock#196 by 3-4. For packages as fundamental as numpy/scipy, this is IMO worth the effort. Pytorch could fall into that category as well.

isuruf commented 4 years ago

I have to agree that local testing is a poor substitute for a proper CI matrix

How is it different if we run the tests in CI or locally before uploading to main label?

h-vetinari commented 4 years ago

How is it different if we run the tests in CI or locally before uploading to main label?

Reproducing a full matrix of combinations (different arches/OSes/Python versions/GPUs/CPUs/etc.) is not fundamentally impossible to do locally (I did just say "poor substitute"), but it would take a huge amount of time (including a complicated virtualization setup for other OSes/arches) and be error-prone and opaque, compared to CI builds that run in parallel and can easily be inspected.

isuruf commented 4 years ago

Can we please stay on topic? @rgommers wants to copy binaries from the pytorch channel, which is definitely not transparent, nor can it be easily inspected.

isuruf commented 4 years ago

If anyone wants to talk more on this, please come to a core meeting.

h-vetinari commented 4 years ago

I'm all for building in conda-forge, BTW; I'm just saying that I can see the argument why this shouldn't come at the cost of reduced GPU CI coverage (hence bringing up the GPU-in-conda-forge-CI thing, which would kill both birds with one stone).

hadim commented 4 years ago

For what it's worth, I am also interested in pytorch on conda-forge (with CUDA and no-CUDA support). In addition to all the advantages cited above, it would allow conda packages to compile against pytorch.

Copying binaries is fine with me (I am being pragmatic here), but like probably everyone here I would much prefer to have those packages built directly on conda-forge.

hmaarrfk commented 3 years ago

For those of you involved in packaging pytorch: we are interested in pushing https://github.com/conda-forge/pytorch-cpu-feedstock/pull/22 through with the package name pytorch, creating direct competition with the pytorch package advertised on PyTorch's own website.

Now that conda-forge supports GPUs, I think it is safe for us to do so. (A rough sketch of what that support looks like at the recipe level follows.)
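
A heavily simplified sketch of what "conda-forge supports GPUs" means at the recipe level; the real wiring is in PR #22 and conda-forge's global pinning, so treat this as the general shape only:

```yaml
# meta.yaml fragment: request the CUDA toolchain via conda-forge's compiler
# machinery; the variant matrix (CPU vs. several CUDA versions) comes from
# conda_build_config.yaml, not from the recipe itself.
requirements:
  build:
    - {{ compiler('c') }}
    - {{ compiler('cxx') }}
    - {{ compiler('cuda') }}    # [cuda_compiler_version != "None"]
```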

If there are any other reasons that should be brought up at this stage, please let us know in the PR.

Thanks for all your input so far!

rgommers commented 3 years ago

Now that conda-forge supports GPUs

Is the current status documented somewhere? I found https://github.com/conda-forge/conda-forge.github.io/issues/901 as the TODO item to write docs, maybe there's something else?

hmaarrfk commented 3 years ago

The pull request #22 is probably the best current documentation on how to use it :D

seemethere commented 3 years ago

Hello! I'm from the release engineering team for PyTorch. Please let us know if there's any way we can assist in making the conda-forge installation experience for pytorch as smooth as possible.

cc @malfet

isuruf commented 3 years ago

@seemethere, thanks for the offer. One task you could help with is a way to collect the licenses/copyright notices of the third party dependencies to comply with their license terms.
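
One possible shape for that, assuming conda-build's list form of license_file (the specific third_party paths below are illustrative, not a vetted inventory of PyTorch's vendored code):

```yaml
# meta.yaml fragment: ship every vendored license alongside the package.
# The real third_party/ tree contains many more projects, each of which
# would need its notice collected.
about:
  license_file:
    - LICENSE
    - third_party/pybind11/LICENSE
    - third_party/eigen/COPYING.MPL2
```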

henryiii commented 3 years ago

There's not a lot of documentation AFAIK, but https://github.com/conda-forge/goofit-split-feedstock/blob/master/recipe/meta.yaml is an example of a split GPU/CPU package; the general shape of the pattern is sketched below.
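
The core of the pattern, loosely (package names hypothetical; the goofit recipe linked above is the real reference), is to drive CPU and CUDA builds from the variant config and name the outputs accordingly:

```yaml
# conda_build_config.yaml sketch: one build per entry, so the same recipe
# produces both a CPU package and a CUDA package.
cuda_compiler_version:
  - None    # CPU-only variant
  - 10.2    # CUDA variant

# meta.yaml sketch: derive the package name from the variant.
package:
  name: mypkg{{ "-gpu" if cuda_compiler_version != "None" else "-cpu" }}
```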

isuruf commented 3 years ago

@seemethere, thanks for the offer. One task you could help with is a way to collect the licenses/copyright notices of the third party dependencies to comply with their license terms.

Any updates on this?

hmaarrfk commented 3 years ago

Closing this issue as the original issue has been resolved.

I opened #34 to discuss licensing.