This PR updates torch from 1.4.0 to 1.5.0.

Changelog
### 1.5.0
```python
import torch

# Custom autograd Function in the new style: forward and backward are
# @staticmethod and use `ctx` to stash tensors for the backward pass.
class Exp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, i):
        result = i.exp()
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result

Exp.apply(torch.tensor(1.))
```
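As a quick sanity check (a sketch, assuming the `Exp` class above), the static-method style backpropagates as expected:

```python
x = torch.tensor(1., requires_grad=True)
y = Exp.apply(x)
y.backward()
print(x.grad)  # tensor(2.7183), i.e. d/dx exp(x) at x = 1
```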
`torch.optim` optimizers changed to fix in-place checks for the changes made by the optimizer ([33640](https://github.com/pytorch/pytorch/pull/33640), [34211](https://github.com/pytorch/pytorch/pull/34211))
If this causes your code to fail, there are two possible reasons:
Reason 1: The value of that parameter was actually saved and used, and previous versions of PyTorch were computing incorrect gradients. This case shows up as an error message complaining about version counters. You should replace code that uses `self.my_param` with `self.my_param.clone()` to make sure the saved version is different from the one that is modified by the optimizer. For example:
Before 1.5.0, the following may have worked.
```python
import torch
from torch import optim

def model(input, target, param):
    return (input * param ** 2 - target).norm()

param = torch.randn(2, requires_grad=True)
input = torch.randn(2)
target = torch.randn(2)
sgd = optim.SGD([param], lr=0.001)
loss = model(input, target, param)
loss.backward(retain_graph=True)
sgd.step()
loss.backward()
param.grad
```
If, after upgrading to 1.5.0, the above fails with a version counter error, it means the gradient being computed was incorrect. To remedy this, clone `param` before using it in the model:
```python
import torch
from torch import optim

def model(input, target, param):
    return (input * param ** 2 - target).norm()

param = torch.randn(2, requires_grad=True)
input = torch.randn(2)
target = torch.randn(2)
sgd = optim.SGD([param], lr=0.001)
loss = model(input, target, param.clone())
loss.backward(retain_graph=True)
sgd.step()
loss.backward()
param.grad
```
Reason 2: You know what you are doing and restore the parameter values to the correct state before the next backward pass. However, you are running into an error because the version counter cannot be decremented. Open an issue with your particular use case and we will help you work around the version counter issue.
`torch.utils.cpp_extension` now uses `ninja` as the default compilation backend ([32495](https://github.com/pytorch/pytorch/pull/32495))
`ninja` enables parallel compilation of your C++ extension, greatly speeding up the build. This change will not break most user code; if you do not have `ninja` installed, we fall back to the old `distutils` backend.
However, if you do have `ninja` installed, it is possible that this change will cause your C++ extension build to fail by oversubscribing your system with too many worker processes. There are two potential workarounds.
Method 1: If a previously succeeding `python setup.py install` now fails, try setting the `MAX_JOBS` environment variable.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="sh">
python setup.py install
</pre></sub></td>
<td><sub><pre lang="sh">
MAX_JOBS=2 python setup.py install
</pre></sub></td>
</tr>
</table>
</p>
Method 2: Switch back to the old `distutils` backend inside your `setup.py`; a fuller `setup.py` sketch follows the table below.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
cmdclass={'clean': clean,
'build_ext': BuildExtension},
</pre></sub></td>
<td><sub><pre lang="python">
cmdclass={'clean': clean,
'build_ext': BuildExtension.with_options(use_ninja=False)},
</pre></sub></td>
</tr>
</table>
</p>
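For reference, a minimal `setup.py` sketch that opts out of `ninja` (the extension name `my_extension` and its source file are placeholders, not from the release notes):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='my_extension',
    # Placeholder extension name and source file.
    ext_modules=[CppExtension('my_extension', ['my_extension.cpp'])],
    # Force the pre-1.5.0 distutils-based build instead of ninja.
    cmdclass={'build_ext': BuildExtension.with_options(use_ninja=False)},
)
```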
`torch.optim.Adam`, `torch.optim.SGD` changed to not modify gradients in-place ([30257](https://github.com/pytorch/pytorch/pull/30257))
In previous versions of PyTorch, the Adam and SGD optimizers modified gradients (e.g. `param.grad`) in-place via the in-place addition `param.grad += weight_decay * param`. To make this consistent with the behavior of other optimizers and to avoid surprises, we've changed them to stop modifying gradients in-place.
This should not have an effect on most PyTorch programs unless they relied on this behavior. The easiest way to replicate the old behavior is to create a custom optimizer that implements it, as sketched below.
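A minimal sketch of such an optimizer, assuming the in-place behavior is only needed for weight decay (the class name `InPlaceWeightDecaySGD` is hypothetical):

```python
import torch

class InPlaceWeightDecaySGD(torch.optim.SGD):
    """Emulate the pre-1.5.0 behavior: fold weight decay into param.grad
    in-place, then step a plain SGD configured with weight_decay=0."""

    def __init__(self, params, lr, weight_decay=0.0, **kwargs):
        super().__init__(params, lr=lr, weight_decay=0.0, **kwargs)
        self.inplace_weight_decay = weight_decay

    def step(self, closure=None):
        wd = self.inplace_weight_decay
        if wd != 0:
            with torch.no_grad():
                for group in self.param_groups:
                    for p in group['params']:
                        if p.grad is not None:
                            p.grad.add_(p, alpha=wd)  # in-place, as in <= 1.4.0
        return super().step(closure)
```

Usage is the same as `optim.SGD`, e.g. `InPlaceWeightDecaySGD([param], lr=0.001, weight_decay=1e-4)`.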
`torch.masked_select` now always returns a 1D tensor ([29923](https://github.com/pytorch/pytorch/pull/29923))
The behavior of `torch.masked_select` when both "self" and "mask" are 0-dimensional was changed. In previous versions of PyTorch, this would return a 0-dimensional tensor. Now, we return a 1-dimensional tensor to be consistent with other input sizes and our documentation.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> torch.masked_select(torch.tensor(0), torch.tensor(True))
tensor(0)
</pre></sub></td>
<td><sub><pre lang="python">
>>> torch.masked_select(torch.tensor(0), torch.tensor(True))
tensor([0])
</pre></sub></td>
</tr>
</table>
</p>
`torch.index_select` on a 0-d tensor now returns a 0-d tensor. ([30790](https://github.com/pytorch/pytorch/pull/30790))
In previous versions of PyTorch, the output of `torch.index_select` on a 0D input tensor produced a 1D tensor. This was inconsistent with our documentation on it, which stated "The returned tensor has the same number of dimensions as the original tensor (input)." Now, we return a 0D tensor.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> torch.index_select(torch.tensor(5), 0, torch.tensor([0]))
tensor([5])
</pre></sub></td>
<td><sub><pre lang="python">
>>> torch.index_select(torch.tensor(5), 0, torch.tensor([0]))
tensor(5)
</pre></sub></td>
</tr>
</table>
</p>
`nn.MultiLabelMarginLoss`: 'none' reduction on 1D tensor now returns a 0D tensor ([30768](https://github.com/pytorch/pytorch/pull/30768))
In previous versions of PyTorch, the output of `nn.MultiLabelMarginLoss` on 1D and 0D tensors incorrectly produced 1D tensors. Now, those cases return a 0D tensor to be consistent with the 2D tensor case.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> nn.MultiLabelMarginLoss(reduction='none')(torch.randn(3), torch.zeros(3, dtype=torch.long))
tensor([0.2959])
</pre></sub></td>
<td><sub><pre lang="python">
>>> nn.MultiLabelMarginLoss(reduction='none')(torch.randn(3), torch.zeros(3, dtype=torch.long))
tensor(0.2959)
</pre></sub></td>
</tr>
</table>
</p>
`nn.MultiMarginLoss`: 'none' reduction on 1D target now returns a 1D tensor ([30826](https://github.com/pytorch/pytorch/pull/30826))
In previous versions of PyTorch, the output of `nn.MultiMarginLoss` on a 1D `target` tensor produced a 0D output. We changed this to return a 1D output tensor to make it consistent with other input sizes, which return an output that matches the target shape.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> nn.MultiMarginLoss(reduction='none')(torch.tensor([1.]), torch.tensor([0]))
tensor(0.)
</pre></sub></td>
<td><sub><pre lang="python">
>>> nn.MultiMarginLoss(reduction='none')(torch.tensor([1.]), torch.tensor([0]))
tensor([0.])
</pre></sub></td>
</tr>
</table>
</p>
`Tensor.exponential_(lambda)` no longer supports `lambda < 0` ([32501](https://github.com/pytorch/pytorch/pull/32501))
`lambda`, the rate parameter of the exponential distribution, must mathematically be greater than 0. We've disabled support for `lambda < 0` to be mathematically correct; most users will not have been using a lambda less than zero.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
tensor = torch.empty(3).exponential_(-1.5)
</pre></sub></td>
<td><sub><pre lang="python">
Negative lambda not supported!
</pre></sub></td>
</tr>
</table>
</p>
`nn.BCELoss`, `nn.functional.binary_cross_entropy` no longer accept inputs with the same number of elements that are not broadcastable ([31365](https://github.com/pytorch/pytorch/pull/31365))
Previously, input and target tensors with the same number of elements but mismatched shapes were accepted. This behavior was deprecated and has been removed in 1.5.0. To replicate the old behavior, explicitly `reshape` your input and target tensors to the same shape.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> input = torch.rand(3, 3)
>>> target = torch.randn(9)
>>> torch.nn.functional.binary_cross_entropy(input, target)
</pre></sub></td>
<td><sub><pre lang="python">
>>> input = torch.rand(3, 3)
>>> target = torch.randn(9)
>>> torch.nn.functional.binary_cross_entropy(input, target.reshape_as(input))
</pre></sub></td>
</tr>
</table>
</p>
`torch.normal` out argument is now required to have the same size as the computed output ([32031](https://github.com/pytorch/pytorch/pull/32031))
Previously, on CPU devices, `torch.normal(mean, std, out=out)` would resize `out` to the correct size. To be consistent with the CUDA implementation, we’ve changed it so that `out` must either already have the correct size, or be an empty tensor with size `[0]`. To work around this, please ensure that your `out` tensor has the correct size.
<p align="center">
<table align="center">
<tr><th>Version 1.4.0</th><th>Version 1.5.0</th></tr>
<tr valign="top">
<td><sub><pre lang="python">
>>> torch.normal(torch.zeros(3), torch.ones(3), out=torch.randn(2))
</pre></sub></td>
</tr>
</table>
</p>
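For illustration (a sketch, not taken from the release notes), the workaround is to pass an `out` tensor that already has the output's size:

```python
import torch

# In 1.5.0, `out` must already have the output's size (or be an empty tensor of size [0]).
out = torch.empty(3)
torch.normal(torch.zeros(3), torch.ones(3), out=out)
```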
Links
- PyPI: https://pypi.org/project/torch
- Changelog: https://pyup.io/changelogs/torch/
- Repo: https://github.com/pytorch/pytorch/tags
- Homepage: https://pytorch.org/