conda-forge / pytorch-cpu-feedstock

A conda-smithy repository for pytorch-cpu.
BSD 3-Clause "New" or "Revised" License

Update 2.1.0 again #203

Closed hmaarrfk closed 10 months ago

hmaarrfk commented 10 months ago

@jakirkham the same error happened again.

The CI should show it too:

  -- Found Threads: TRUE
  CMake Error at cmake/public/cuda.cmake:65 (message):
    Found two conflicting CUDA installs:

    V12.0.76 in
    '/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1699320420794/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placeh/targets/x86_64-linux/include'
    and

    V12.0.76 in
    '/home/conda/feedstock_root/build_artifacts/pytorch-recipe_1699320420794/_build_env/targets/x86_64-linux/include'
  Call Stack (most recent call first):
    cmake/Dependencies.cmake:44 (include)
    CMakeLists.txt:722 (include)


conda-forge-webservices[bot] commented 10 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jakirkham commented 10 months ago

Thanks Mark! πŸ™

It appears this is due to a change introduced in PyTorch 2.1.0's own CTK detection logic. Unfortunately, this check is problematic for us.

In the Conda case, we implement a splayed layout: build tools (like those listed in requirements/build) live in one path, and libraries that are linked against (like those listed in requirements/host) live in another. This is common when supporting cross-compilation, as we do with Conda. However, it means there are cases where we need to use things from both paths, for different reasons.
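A small sketch may help illustrate why a splayed layout trips this check. This is hypothetical code, not PyTorch's actual CMake logic; the `_h_env` / `_build_env` directory names mirror the paths in the error above, and `find_cuda_installs` is an invented helper:

```python
import re
import tempfile
from pathlib import Path

def find_cuda_installs(prefixes):
    """Return (version, include_dir) for each prefix that ships cuda.h."""
    installs = []
    for prefix in prefixes:
        header = Path(prefix) / "targets" / "x86_64-linux" / "include" / "cuda.h"
        if header.exists():
            match = re.search(r"#define CUDA_VERSION (\d+)", header.read_text())
            installs.append((match.group(1), str(header.parent)))
    return installs

# Simulate the host env and the build env, each shipping identical CUDA headers.
root = Path(tempfile.mkdtemp())
for env in ("_h_env", "_build_env"):
    inc = root / env / "targets" / "x86_64-linux" / "include"
    inc.mkdir(parents=True)
    (inc / "cuda.h").write_text("#define CUDA_VERSION 12000\n")

found = find_cuda_installs([root / "_h_env", root / "_build_env"])
print(len(found))  # 2 installs found, even though both are the same version
```

Both environments report the same CUDA version, yet a check that simply counts installs sees "two conflicting CUDA installs" and aborts.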

It looks like @ax3l ran into this issue in the HPC SDK use case and proposed a fix ( https://github.com/pytorch/pytorch/pull/108932 ). Not sure whether that will work for us.

In terms of our needs here, maybe just patching the check out altogether would be a reasonable way to get the build working.

Perhaps this would be a good opportunity to discuss with @peterbell10 whether we can come up with a better check in PyTorch that works for splayed-layout use cases.
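One possible shape for a friendlier check: fail only when the detected versions actually disagree, rather than whenever more than one install is visible. Illustrative sketch only, not upstream's code:

```python
def check_cuda_installs(installs):
    """installs: list of (version, path) pairs; raise only on a real conflict."""
    versions = {version for version, _path in installs}
    if len(versions) > 1:
        raise RuntimeError(f"Found conflicting CUDA installs: {installs}")
    # Multiple copies of the same version (build env + host env) are fine.
    return versions.pop() if versions else None

# Two copies of V12.0.76, as in the error above: no conflict.
print(check_cuda_installs([("12.0.76", "/_h_env/include"),
                           ("12.0.76", "/_build_env/include")]))  # 12.0.76
```

This would keep the protection against genuinely mixed toolkits while tolerating the duplicated-but-identical installs that a splayed layout produces.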

hmaarrfk commented 10 months ago

Ok, let's try again. I took a brute-force approach, since we can do a bit of meta-building.

hmaarrfk commented 10 months ago

It now fails with:

  CMake Error at cmake/public/cuda.cmake:64 (message):
    Failed to find nvToolsExt
  Call Stack (most recent call first):
    cmake/Dependencies.cmake:44 (include)
    CMakeLists.txt:722 (include)

jakirkham commented 10 months ago

Thanks Mark! πŸ™

I think that refers to a check added in the same PyTorch PR.

As noted in CMake's CUDA::nvToolsExt documentation, this target is deprecated by CMake (and NVIDIA), as it comes from NVTX 2, which NVTX 3 has superseded.

Am guessing PyTorch doesn't actually use NVTX 2 (as this check was new in that PR), meaning this was purely a build-configuration check.

So I think we can remove those lines as well.
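In other words, the probe could be made non-fatal rather than removed outright. A hypothetical sketch (not upstream's code; `check_nvtoolsext` and its parameters are invented for illustration):

```python
import warnings

def check_nvtoolsext(found: bool, required: bool = False) -> bool:
    """Treat a missing nvToolsExt as a warning unless it is truly required."""
    if found:
        return True
    if required:
        # Current behavior: hard failure during configuration.
        raise RuntimeError("Failed to find nvToolsExt")
    # Softer behavior: note it and continue, since NVTX 3 supersedes NVTX 2.
    warnings.warn("nvToolsExt not found; continuing without NVTX 2")
    return False

print(check_nvtoolsext(found=False))  # False, with a warning instead of an error
```

Either way, builds against a toolkit that only ships NVTX 3 would no longer be blocked by a check for a library PyTorch doesn't link.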

hmaarrfk commented 10 months ago

what about this: https://github.com/pytorch/pytorch/pull/82695/files#diff-8e5cb190cc46be808993381a31fe9c027705d356b6bc0460368c0310ae82b273R218

hmaarrfk commented 10 months ago

Can I remove it too?

hmaarrfk commented 10 months ago

Ok, well, I pushed my changes; feel free to push anything if you can. Going to slee....

jakirkham commented 10 months ago

Thanks Mark! πŸ™

Have a good night

jakirkham commented 10 months ago

It looks like Peter added a fix upstream ( https://github.com/pytorch/pytorch/pull/113174 ). Thanks Peter! πŸ™

Maybe we can give that a try

hmaarrfk commented 10 months ago

It doesn't address the nvToolsExt issue.

Tobias-Fischer commented 10 months ago

There are two upstream PRs that solve the nvToolsExt issue; I am not sure which one is preferable. See https://github.com/pytorch/pytorch/issues/101135 and PRs https://github.com/pytorch/pytorch/pull/97582 and https://github.com/pytorch/pytorch/pull/106763

hmaarrfk commented 10 months ago

Ok, builds are incoming:

My test was to use

  import torch

  # 1024**3 float32 elements, roughly 4 GiB allocated on the GPU
  a = torch.randn(1024 * 1024 * 1024, device='cuda')
  a + 1

and watch the memory on my CUDA device grow using nvtop.

jakirkham commented 10 months ago

Nice work Mark! πŸ₯³

@peterbell10 do the patches here look upstreamable to you? Or are there similar approaches that upstream could take that would alleviate the need for these?

hmaarrfk commented 10 months ago

log_files.zip

jakirkham commented 10 months ago

Were Linux ARM packages built as well?

Edit: Nvm I see them πŸ€¦β€β™‚οΈ

[Screenshot, 2023-11-29 at 12:29 PM]