Bump torch from 2.0.1 to 2.1.1

Bumps torch from 2.0.1 to 2.1.1.
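Projects that rely on fixes shipped in 2.1.1 may want to fail fast when an older torch is installed. The following is a minimal sketch using only the standard library; `MIN_TORCH`, `parse_release`, and `torch_is_new_enough` are hypothetical names chosen for illustration and are not part of this PR or of PyTorch.

```python
import re
from importlib import metadata

# Assumed minimum: the version this PR bumps to.
MIN_TORCH = (2, 1, 1)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release segment as a tuple of ints.

    Local suffixes such as '+cu118' are ignored, so '2.1.1+cu118' -> (2, 1, 1).
    Pre-release tags (e.g. '2.1.1a0') are not distinguished from the final release.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def torch_is_new_enough(minimum: tuple = MIN_TORCH) -> bool:
    """Return True if an installed torch distribution is at least `minimum`."""
    try:
        installed = metadata.version("torch")
    except metadata.PackageNotFoundError:
        # torch is not installed in this environment.
        return False
    return parse_release(installed) >= minimum
```

A caller could check `torch_is_new_enough()` at startup and raise a clear error instead of hitting one of the regressions listed below at runtime.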

Release notes

PyTorch 2.1.1 Release, bug fix release

This release is meant to fix the following issues (regressions / silent correctness):

Remove spurious warning in comparison ops (#112170)

Fix segfault in foreach_* operations when input list length does not match (#112349)

Fix cuda driver API to load the appropriate .so file (#112996)

Fix missing CUDA initialization when calling FFT operations (#110326)

Ignore beartype==0.16.0 within the onnx package as it is incompatible (#111861)

Fix the behavior of torch.new_zeros in onnx due to TorchScript behavior change (#111694)

Remove unnecessary slow code in torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111687)

Add planner argument to torch.distributed.checkpoint.optimizer.load_sharded_optimizer_state_dict (#111393)

Continue if a param does not exist in sharded load in torch.distributed.FSDP (#109116)

Fix handling of non-contiguous bias_mask in torch.nn.functional.scaled_dot_product_attention (#112673)

Fix the meta device implementation for nn.functional.scaled_dot_product_attention (#110893)

Fix copy from mps to cpu device when storage_offset is non-zero (#109557)

Fix segfault in torch.sparse.mm for non-contiguous inputs (#111742)

Fix circular import between Dynamo and einops (#110575)

Verify flatbuffer module fields are initialized for mobile deserialization (#109794)

pytorch/pytorch#110961 contains all relevant pull requests related to this release as well as links to related issues.

PyTorch 2.1: automatic dynamic shape compilation, distributed checkpointing

PyTorch 2.1 Release Notes

Highlights

Backwards Incompatible Change

Deprecations

New Features

Improvements

Bug fixes

Performance

Documentation

Developers

Security

Highlights

We are excited to announce the release of PyTorch® 2.1! PyTorch 2.1 offers automatic dynamic shape support in torch.compile, torch.distributed.checkpoint for saving/loading distributed training jobs on multiple ranks in parallel, and torch.compile support for the NumPy API.

In addition, this release offers numerous performance improvements (e.g. CPU inductor improvements, AVX512 support, scaled-dot-product-attention support) as well as a prototype release of torch.export, a sound full-graph capture mechanism, and torch.export-based quantization.

Along with 2.1, we are also releasing a series of updates to the PyTorch domain libraries. More details can be found in the library updates blog.

This release is composed of 6,682 commits and 784 contributors since 2.0. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.1. More information about how to get started with the PyTorch 2-series can be found at our Getting Started page.

Summary:

torch.compile now includes automatic support for detecting and minimizing recompilations due to tensor shape changes using automatic dynamic shapes.

torch.distributed.checkpoint enables saving and loading models from multiple ranks in parallel, as well as resharding due to changes in cluster topology.

torch.compile can now compile NumPy operations by translating them into PyTorch-equivalent operations.

torch.compile now includes improved support for Python 3.11.

New CPU performance features include inductor improvements (e.g. bfloat16 support and dynamic shapes), AVX512 kernel support, and scaled-dot-product-attention kernels.

torch.export, a sound full-graph capture mechanism, is introduced as a prototype feature, as well as torch.export-based quantization.
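As a rough illustration of the first summary item, the sketch below calls one compiled function with several batch sizes and compares each result against the uncompiled function. It assumes torch 2.1 or later is installed. The `backend="eager"` argument is a choice made here so the example does not need a C++ toolchain; the release notes do not prescribe it. The `normalize` and `max_abs_errors` helpers are hypothetical.

```python
import torch


def normalize(x: torch.Tensor) -> torch.Tensor:
    # Subtract the per-row mean and divide by the per-row standard deviation.
    mean = x.mean(dim=-1, keepdim=True)
    std = x.std(dim=-1, keepdim=True)
    return (x - mean) / (std + 1e-6)


# Assumption: the "eager" backend is used only to keep the sketch lightweight;
# the default backend (inductor) is what a real workload would typically use.
compiled_normalize = torch.compile(normalize, backend="eager")


def max_abs_errors(batch_sizes=(4, 6, 10), features=8):
    """Run the compiled function on inputs whose first dimension varies.

    Returns the maximum absolute difference from the uncompiled function
    for each batch size.
    """
    torch.manual_seed(0)
    errors = []
    for n in batch_sizes:
        x = torch.randn(n, features)
        diff = (compiled_normalize(x) - normalize(x)).abs().max().item()
        errors.append(diff)
    return errors
```

Calling `max_abs_errors()` exercises the changing-shape path described above; how many recompilations actually occur depends on the installed version and its configuration.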

... (truncated)

Commits

4c55dc5 remove _shard_tensor() call (#111687)
f58669b c10::DriverAPI Try opening libcuda.so.1 (#113096)
33106b7 [DCP] Add test for planner option for load_sharded_optimizer_state_dict (#11...
4b4c012 Enable planner to be used for loading sharded optimizer state dict (#112520)
47ac502 [DCP][test] Make dim_0 size of params scale with world_size in torch/distribu...
dc96ecb Fix mem eff bias bug (#112673) (#112796)
18a2ed1 Mirror of Xformers Fix (#112267) (#112795)
b2e1277 Fix the meta func for mem_eff_backward (#110893) (#112792)
b249946 [Release-only] Pin Docker images to 2.1 for release (#112665)
ee79fc8 Revert "Fix bug: not creating empty tensor with correct sizes and device. (#1...
Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

RenderKit / oidn

Bump torch from 2.0.1 to 2.1.1 #185
