build(deps): update torch requirement from <2.1.0,>=1.8.1 to >=1.8.1,<2.5.0

Updates the requirements on torch to permit the latest version.
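To check whether an installed version falls inside the new range, a minimal sketch follows. It uses only the standard library and a deliberately simplified parser: only the leading numeric release segment is compared, so pre-release tags and local suffixes such as `+cu118` are ignored. It is not a full PEP 440 implementation, and the helper names `release_tuple` and `satisfies_new_requirement` are hypothetical.

```python
import re

LOWER = (1, 8, 1)  # inclusive lower bound, from ">=1.8.1"
UPPER = (2, 5, 0)  # exclusive upper bound, from "<2.5.0"


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment, padded to three parts."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad so that "2.4" compares the same way as "2.4.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def satisfies_new_requirement(version: str) -> bool:
    """True if the release segment lies in [1.8.1, 2.5.0)."""
    return LOWER <= release_tuple(version) < UPPER
```

Under this simplification, `2.4.0` and `2.4.0+cu118` are accepted, while `2.5.0` and `1.8.0` are rejected.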

Release notes

PyTorch 2.4: Python 3.12, AOTInductor freezing, libuv backend for TCPStore

PyTorch 2.4 Release Notes

Highlights

Tracked Regressions

Backward incompatible changes

Deprecations

New features

Improvements

Bug Fixes

Performance

Documentation

Developers

Security

Highlights

We are excited to announce the release of PyTorch® 2.4! PyTorch 2.4 adds support for the latest version of Python (3.12) for torch.compile. AOTInductor freezing gives developers running AOTInductor more performance-based optimizations by allowing the serialization of MKLDNN weights. In addition, a new default TCPStore server backend utilizing libuv has been introduced, which should significantly reduce initialization times for users running large-scale jobs. Finally, a new Python Custom Operator API makes it easier than before to integrate custom kernels into PyTorch, especially for torch.compile.

This release is composed of 3661 commits from 475 contributors since PyTorch 2.3. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.4.

Tracked Regressions

Subproc exception with torch.compile and onnxruntime-training

An issue has been reported when using torch.compile while the onnxruntime-training library is installed. It will be fixed in v2.4.1. As a local workaround, set the environment variable TORCHINDUCTOR_WORKER_START=fork before executing the script.
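One way to apply this workaround without touching the shell profile is a small launcher that starts the script in a child process with the variable already set. The sketch below is illustrative only: the helper names `env_with_fork_workers` and `run_script` are hypothetical, and only the variable name and value come from the release notes.

```python
import os
import subprocess
import sys


def env_with_fork_workers(base_env=None) -> dict:
    """Copy the environment and add the workaround variable from the notes."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCHINDUCTOR_WORKER_START"] = "fork"
    return env


def run_script(script_path: str) -> int:
    """Run a Python script in a child process with the workaround applied."""
    completed = subprocess.run(
        [sys.executable, script_path],
        env=env_with_fork_workers(),
    )
    return completed.returncode
```

Because the variable is placed in the child's environment before the interpreter starts, it is in effect before the script executes, which is what the workaround asks for.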

cu118 wheels will not work with pre-cuda12 drivers

It was also reported that the new version of Triton uses CUDA features that are not compatible with pre-CUDA 12 drivers. In this case, the workaround is to set TRITON_PTXAS_PATH manually as follows (adapt the path to the local installation):
TRITON_PTXAS_PATH=/usr/local/lib/python3.10/site-packages/torch/bin/ptxas python script.py

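Since the path in the command above depends on the local installation, the following sketch derives it from a package directory instead of hard-coding it. It assumes the `bin/ptxas` layout shown in the example path. The helper names `bundled_ptxas_path` and `env_with_ptxas` are hypothetical, and the variable is only set when the file actually exists.

```python
import os


def bundled_ptxas_path(torch_package_dir: str) -> str:
    """Build the ptxas path, assuming the <torch>/bin/ptxas layout from the notes."""
    return os.path.join(torch_package_dir, "bin", "ptxas")


def env_with_ptxas(torch_package_dir: str, base_env=None) -> dict:
    """Copy the environment and set TRITON_PTXAS_PATH if the binary is present."""
    env = dict(os.environ if base_env is None else base_env)
    ptxas = bundled_ptxas_path(torch_package_dir)
    if os.path.isfile(ptxas):
        env["TRITON_PTXAS_PATH"] = ptxas
    return env
```

With torch installed, the package directory can be obtained as `os.path.dirname(torch.__file__)`, and the resulting dictionary can be passed as `env=` to `subprocess.run` when launching `script.py`.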
Backwards Incompatible Change

Python frontend

Default ThreadPool size to number of physical cores (#125963)

Changed the default number of threads used for intra-op parallelism from the number of logical cores to the number of

... (truncated)
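For code that relied on the earlier default, one option is to set the intra-op thread count explicitly instead of depending on the default. The sketch below is a hedged illustration: `logical_core_count` and `set_intra_op_threads` are hypothetical helper names, while `torch.set_num_threads` and `torch.get_num_threads` are existing PyTorch calls. Whether matching the logical core count is desirable depends on the workload.

```python
import os


def logical_core_count() -> int:
    """Number of logical CPUs reported by the OS (falls back to 1 if unknown)."""
    return os.cpu_count() or 1


def set_intra_op_threads(num_threads: int) -> int:
    """Set PyTorch's intra-op thread count and return the value now in effect."""
    import torch  # imported lazily so the helper above works without torch

    torch.set_num_threads(num_threads)
    return torch.get_num_threads()
```

Calling `set_intra_op_threads(logical_core_count())` early in a program would request one intra-op thread per logical core.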

Changelog

Sourced from torch's changelog.

Releasing PyTorch

Release Compatibility Matrix

Release Cadence

General Overview

Frequently Asked Questions

Cutting a release branch preparations

Cutting release branches

pytorch/pytorch

pytorch/builder / PyTorch domain libraries

Making release branch specific changes for PyTorch

Making release branch specific changes for domain libraries

Running Launch Execution team Core XFN sync

Drafting RCs (Release Candidates) for PyTorch and domain libraries

Release Candidate Storage

Release Candidate health validation

Cherry Picking Fixes

How to do Cherry Picking

Cherry Picking Reverts

Preparing and Creating Final Release candidate

Promoting RCs to Stable

Additional Steps to prepare for release day

Modify release matrix

Open Google Colab issue

Patch Releases

Patch Release Criteria

Patch Release Process

Patch Release Process Description

Triage

Issue Tracker for Patch releases

Building a release schedule / cherry picking

Building Binaries / Promotion to Stable

Hardware / Software Support in Binary Build Matrix

Python

Accelerator Software

Special support cases

Operating Systems

Submitting Tutorials

Special Topics

Updating submodules for a release

Triton dependency for the release

Release Compatibility Matrix

Following is the Release Compatibility Matrix for PyTorch releases:

... (truncated)

Commits

d990dad [CMAKE] Look for Development.Module instead of Development (#129729)
e4ee3be [Release only] use triton 3.0.x from pypi (#130336)
9afe4ec Update torchbench model expected accuracy values after pinning numpy (#129986)
499621e [CherryPick][FSDP2+TP] Disable 2D state_dict (#129519) (#129923)
e5bda62 [CherryPick][DCP] Fix Optimizer Learning Rate not being loaded correctly (#12...
705e3ae Improve error message for weights_only load (#129783)
b26cde4 [Windows] remove mkl shared library dependency. (#129740)
12ad767 [distributed] NCCL result code update (#129704)
1164d3c Add threadfence to 2-stage reduction for correct writes visibility (#129701)
9533637 Inductor to fail gracefully on Voltas for bf16 tensors (#129699)
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Lightning-AI / tutorials

build(deps): update torch requirement from <2.1.0,>=1.8.1 to >=1.8.1,<2.5.0 #347