Pytorch 2.2.1 and libabseil 20240116, libgrpc 1.61, libprotobuf 4.25.2

regro-cf-autotick-bot commented 8 months ago

This PR has been triggered in an effort to update libabseil20240116_libgrpc161_libprotobuf4252.

Notes and instructions for merging this PR:

Please merge the PR only after the tests have passed.
Feel free to push to the bot's branch to update this PR if needed.

Please note that if you close this PR we presume that the feedstock has been rebuilt, so if you are going to perform the rebuild yourself, don't close this PR until your rebuild has been merged.

If this PR was opened in error or needs to be updated, please add the bot-rerun label to this PR. The bot will close this PR and schedule another one. If you do not have permissions to add this label, you can use the phrase `@conda-forge-admin, please rerun bot` in a PR comment to have the conda-forge-admin add it for you.

This PR was created by the regro-cf-autotick-bot. The regro-cf-autotick-bot is a service to automatically track the dependency graph, migrate packages, and propose package version updates for conda-forge. Feel free to drop us a line if there are any issues! This PR was generated by https://github.com/regro/cf-scripts/actions/runs/8009280365, please use this URL for debugging.

conda-forge-webservices[bot] commented 8 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

It looks like the 'libtorch' output doesn't have any tests.

h-vetinari commented 8 months ago

Do we want to combine this with the just-released 2.2.1, or do them separately (and in what order)?

@conda-forge/pytorch-cpu

h-vetinari commented 8 months ago

Looks like the generic builds passed, and the MKL+CUDA builds failed, though for varying reasons.

For CUDA 11.2+MKL, holy macaroni, that's a lot of template spew - more than 7000 lines for one(!!!) CUDA object:

[5408/5646] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/group_norm_kernel.cu.o
[...]
$SRC_DIR/third_party/cutlass/include/cute/stride.hpp(299): error: template instantiation resulted in unexpected function type of "cute::C<32> (cute::C<1>, cute::C<32>)" (the meaning of a name may have changed since the template declaration -- the type of the template is "cute::C<<expression>> (cute::C<t>, cute::C<u>)")
          detected during:
            instantiation of "cute::operator*" based on template arguments <1, 32>
(299): here
            instantiation of "auto cute::detail::compact<Major,Shape,Current>(const Shape &, const Current &) [with Major=cute::LayoutLeft, Shape=cute::_32, Current=cute::Int<1>]"
(351): here
            instantiation of "auto cute::compact_major<Major,Shape,Current,<unnamed>>(const Shape &, const Current &) [with Major=cute::LayoutLeft, Shape=cute::_32, Current=cute::Int<1>, <unnamed>=(void *)nullptr]"
(363): here
            instantiation of type "cute::LayoutLeft::Apply<cute::_32>"
$SRC_DIR/third_party/cutlass/include/cute/layout.hpp(83): here
            processing of template argument list for "cute::Layout" based on template argument <cute::_32>
$SRC_DIR/third_party/cutlass/include/cute/atom/copy_traits_sm75.hpp(45): here
[...]
[5409/5646] Building CUDA object caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/LogcumsumexpKernel.cu.o
ninja: build stopped: subcommand failed.

For CUDA 11.8+MKL, I cannot seem to access the full logs, probably because it's still shown as yellow.

I downloaded the zip too, it doesn't contain the "Build on Linux" step...

CUDA 12+MKL failed due to some error (very likely flaky) during git checkout.

hmaarrfk commented 8 months ago

11.2 might need to be dropped, unless somebody wants to maintain it. I don't have a particular need for it.

11.8 builds locally for me.

12.0 might need a patch update (or just removing the patches, since they might have been merged upstream). Trying again locally.

h-vetinari commented 8 months ago

Seems the builds here just got OOM-killed (running into what looks like https://github.com/Quansight/open-gpu-server/issues/28)? Can we reduce parallelism somewhat?

hmaarrfk commented 8 months ago

I'm kind of worried we won't have pytorch and tensorflow co-installable. Is that a valid concern? Not sure how to get around the TF compilations.

hmaarrfk commented 8 months ago

We could change the line:

https://github.com/regro-cf-autotick-bot/pytorch-cpu-feedstock/blob/rebuild-libabseil20240116_libgrpc161_libprotobuf4252-0-1_h3fce2c/recipe/build.sh#L87

But I'm hoping there is a way in the job configuration to set the variable instead. I would like to retain the ability to use more than 4 cores locally.
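
One way this could look, as a rough sketch only: the content of the linked build.sh line is not shown in this thread, so the snippet below is not the feedstock's actual code. It assumes the build reads `MAX_JOBS` (the variable PyTorch's build uses to cap parallel compile jobs) and that conda-build provides `CPU_COUNT`. `CI_MAX_JOBS` is a hypothetical name for a value the CI job configuration could export; when it is unset, all local cores stay available.

```shell
#!/usr/bin/env bash
# Sketch only: not the feedstock's real build.sh.
# CI_MAX_JOBS is a hypothetical variable that a CI job could export to cap
# parallelism; CPU_COUNT is the core count provided by conda-build.
pick_max_jobs() {
  local ci_cap="${CI_MAX_JOBS:-}"
  local cpus="${CPU_COUNT:-1}"
  # Use the CI cap only when it is set and lower than the available cores.
  if [ -n "${ci_cap}" ] && [ "${ci_cap}" -lt "${cpus}" ]; then
    echo "${ci_cap}"
  else
    echo "${cpus}"
  fi
}

# MAX_JOBS is what the PyTorch build consults for the number of compile jobs.
export MAX_JOBS="$(pick_max_jobs)"
echo "building with MAX_JOBS=${MAX_JOBS}"
```

With this shape, a local build without `CI_MAX_JOBS` keeps using every core, while a memory-constrained CI runner can lower the job count without editing the recipe.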

h-vetinari commented 8 months ago

> I'm kind of worried we won't have pytorch and tensorflow co-installable. Is that a valid concern? Not sure how to get around the TF compilations.

Tensorflow has been lagging pytorch regularly w.r.t. migrations. I'm hoping things get better once we're fully on cirun also for TF. The current abseil problems don't help, but it's been like that for a while... 😑

conda-forge-webservices[bot] commented 8 months ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

It looks like the 'libtorch' output doesn't have any tests.
'Test On Native Only' is deprecated. This was used for disabling testing for cross-compiling.

This has been deprecated in favor of the top-level `test` field.
It is now mapped to `test: native_and_emulated`.

        Failed validating 'deprecated' in schema['properties']['test_on_native_only']:
            {'anyOf': [{'type': 'boolean'}, {'type': 'null'}],
             'default': False,
             'deprecated': True,
             'description': 'This was used for disabling testing for '
                            'cross-compiling.\n'
                            '\n'
                            '```warning\n'
                            'This has been deprecated in favor of the top-level '
                            '`test` field.\n'
                            'It is now mapped to `test: native_and_emulated`.\n'
                            '```',
             'title': 'Test On Native Only'}

        On instance['test_on_native_only']:
            True
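
The lint message above says `test_on_native_only` is now mapped to `test: native_and_emulated`. A minimal sketch of that substitution follows, assuming the deprecated key sits at the top level of the config file on its own line with the value `true`. The helper name `migrate_test_key` is made up for illustration, and the demo runs on a throwaway file rather than a real feedstock checkout.

```shell
#!/usr/bin/env bash
# Sketch: replace the deprecated key with the field the linter points to.
# Assumes a top-level line of the form `test_on_native_only: true`.
migrate_test_key() {
  local cfg="$1"
  local tmp
  tmp="$(mktemp)"
  sed 's/^test_on_native_only:[[:space:]]*[Tt]rue[[:space:]]*$/test: native_and_emulated/' "${cfg}" > "${tmp}"
  # Write back in place so the original file keeps its permissions.
  cat "${tmp}" > "${cfg}"
  rm -f "${tmp}"
}

# Demo on a temporary file; `other_key` is a placeholder, not a real setting.
demo="$(mktemp)"
printf 'test_on_native_only: true\nother_key: value\n' > "${demo}"
migrate_test_key "${demo}"
cat "${demo}"
```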

h-vetinari commented 8 months ago

@jaimergp @isuruf, it seems we're regularly blowing through the memory here even with only 2 workers. I guess it would make sense to teach smithy how to set up cirun to set up swap files (cf. swapfile_size for azure)?

That reminds me, we also never followed up on a unified design for swapfiles in smithy... 🤔
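
For context on what such a swap setup step usually amounts to on a Linux runner, here is a generic sketch. It is not taken from smithy or cirun, and `SWAP_PATH`, `SWAP_SIZE` and `APPLY` are hypothetical names. By default it only prints the commands; setting `APPLY=1` would run them through `sudo`, which needs root on the runner.

```shell
#!/usr/bin/env bash
# Generic Linux swap-file setup, shown as a dry run by default.
# SWAP_PATH, SWAP_SIZE and APPLY are hypothetical knobs for this sketch.
setup_swap() {
  local path="${SWAP_PATH:-/swapfile}"
  local size="${SWAP_SIZE:-10G}"
  local run="echo"
  if [ "${APPLY:-0}" = "1" ]; then
    run="sudo"
  fi
  ${run} fallocate -l "${size}" "${path}"  # reserve space for the swap file
  ${run} chmod 600 "${path}"               # swap files must not be world-readable
  ${run} mkswap "${path}"                  # write the swap signature
  ${run} swapon "${path}"                  # enable the swap area
}

setup_swap
```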

regro-cf-autotick-bot commented 6 months ago

Due to the bot-rerun label I'm closing this PR. I will make another one as appropriate. This message was generated by https://github.com/regro/cf-scripts/actions/runs/9010926933 - please use this URL for debugging.

conda-forge / pytorch-cpu-feedstock

Pytorch 2.2.1 and libabseil 20240116, libgrpc 1.61, libprotobuf 4.25.2 #221