PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:
Introducing Intel's Habana Accelerator
Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC), with associated development tools and libraries, and a configurable Matrix Math engine.
You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:
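As a minimal sketch of the flag in question, the snippet below builds the Trainer arguments that select the Habana accelerator. The helper names `hpu_trainer_kwargs` and `build_hpu_trainer` are illustrative and not part of Lightning, and actually constructing the Trainer assumes a machine with Gaudi devices and the Habana software stack installed.

```python
from typing import Any, Dict


def hpu_trainer_kwargs(devices: int = 1) -> Dict[str, Any]:
    """Return Trainer arguments that select the Habana (HPU) accelerator.

    Hypothetical helper for illustration; only the "accelerator" and
    "devices" flags themselves come from Lightning.
    """
    if devices < 1:
        raise ValueError("devices must be a positive integer")
    return {"accelerator": "hpu", "devices": devices}


def build_hpu_trainer(devices: int = 1):
    """Construct a Trainer for Gaudi training (needs Lightning >= 1.6 and HPUs)."""
    # Imported lazily so the helper above can be used without Lightning installed.
    import pytorch_lightning as pl

    return pl.Trainer(**hpu_trainer_kwargs(devices))


if __name__ == "__main__":
    # Single-device and 8-device configurations.
    print(hpu_trainer_kwargs(1))
    print(hpu_trainer_kwargs(8))
```

Calling `build_hpu_trainer(devices=8)` would request distributed training across 8 Gaudi devices.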
The Bagua Strategy
The Bagua Strategy is a deep learning acceleration framework that supports multiple advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:
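A hedged sketch of enabling Bagua through the `strategy` flag follows. The function names are hypothetical, and running `build_bagua_trainer` assumes the separate `bagua` package is installed and that the requested number of GPUs is available.

```python
from typing import Any, Dict


def bagua_trainer_kwargs(gpus: int = 2) -> Dict[str, Any]:
    """Return Trainer arguments that enable the Bagua strategy on GPUs.

    Hypothetical helper for illustration; it only assembles keyword arguments.
    """
    if gpus < 1:
        raise ValueError("gpus must be a positive integer")
    return {"strategy": "bagua", "accelerator": "gpu", "devices": gpus}


def build_bagua_trainer(gpus: int = 2):
    """Construct the Trainer (needs Lightning >= 1.6, bagua, and GPUs)."""
    # Lazy import keeps the kwargs helper usable without Lightning installed.
    import pytorch_lightning as pl

    return pl.Trainer(**bagua_trainer_kwargs(gpus))


if __name__ == "__main__":
    print(bagua_trainer_kwargs(4))
```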
Towards stable Accelerator, Strategy, and Plugin APIs
The Accelerator, Strategy, and Plugin APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.
In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (TrainingTypePlugin) as well as certain Plugins. In particular, we want to highlight the following changes:
All TrainingTypePlugins have been renamed to Strategy (#11120). Strategy is a more appropriate name because it encompasses more than simply training communication. This change aligns with the changes we implemented in 1.5, which introduced the new strategy and devices flags to the Trainer.
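For users with custom code, the rename mostly means updating class names and imports. The sketch below is a partial, illustrative mapping of old names to new ones. The dictionary covers only a handful of classes and the helper names are hypothetical. Classes such as PrecisionPlugin are not TrainingTypePlugins and keep their names.

```python
# Partial mapping from pre-1.6 TrainingTypePlugin names to 1.6 Strategy names.
# Illustrative only: it lists a few well-known classes, not the full set.
RENAMED_STRATEGIES = {
    "TrainingTypePlugin": "Strategy",
    "DDPPlugin": "DDPStrategy",
    "DDPSpawnPlugin": "DDPSpawnStrategy",
    "DeepSpeedPlugin": "DeepSpeedStrategy",
}


def new_strategy_name(old_name: str) -> str:
    """Return the 1.6 class name for a known old name, else the name unchanged."""
    return RENAMED_STRATEGIES.get(old_name, old_name)


def load_strategy_class(old_name: str):
    """Look up the renamed class in pytorch_lightning.strategies (needs Lightning >= 1.6)."""
    # Lazy import so the mapping can be used without Lightning installed.
    import pytorch_lightning.strategies as strategies

    return getattr(strategies, new_strategy_name(old_name))


if __name__ == "__main__":
    for old in ("DDPPlugin", "TrainingTypePlugin", "PrecisionPlugin"):
        print(old, "->", new_strategy_name(old))
```

In practice this amounts to replacing an import such as `from pytorch_lightning.plugins import DDPPlugin` with `from pytorch_lightning.strategies import DDPStrategy`.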
Broadcast the _terminate_gracefully to all processes and add support for DDP (#10638)
Added support for re-instantiation of custom (subclasses of) DataLoaders returned in the *_dataloader() methods, i.e., automatic replacement of samplers now works with custom types of DataLoader (#10680)
Added a function to validate if fault tolerant training is supported. (#10465)
Added a private callback to manage the creation and deletion of fault-tolerance checkpoints (#11862)
Show a better error message when a custom DataLoader implementation is not well implemented and we need to reconstruct it (#10719)
Show a better error message when frozen dataclass is used as a batch (#10927)
Save the Loop's state by default in the checkpoint (#10784)
Added Loop.replace to easily switch one loop for another (#10324)
Added support for --lr_scheduler=ReduceLROnPlateau to the LightningCLI (#10860)
Added LightningCLI.configure_optimizers to override the configure_optimizers return value (#10860)
Added LightningCLI(auto_registry) flag to register all subclasses of the registerable components automatically (#12108)
Added a warning that shows when max_epochs in the Trainer is not set (#10700)
Added support for returning a single Callback from LightningModule.configure_callbacks without wrapping it into a list (#11060)
Added console_kwargs for RichProgressBar to initialize inner Console (#10875)
Added support for shorthand notation to instantiate loggers with the LightningCLI (#11533)
Added a LOGGER_REGISTRY instance to register custom loggers to the LightningCLI (#11533)
Added info message when the Trainer arguments limit_*_batches, overfit_batches, or val_check_interval are set to 1 or 1.0 (#11950)
Added support for no pre-fetching to DataFetcher (#11606)
Added support for optimizer step progress tracking with manual optimization (#11848)
Return the output of the optimizer.step. This can be useful for LightningLite users, manual optimization users, or users overriding LightningModule.optimizer_step (#11711)
Teardown the active loop and strategy on exception (#11620)
Added a MisconfigurationException if the user-provided opt_idx in the scheduler config doesn't match the actual index of its respective optimizer (#11247)
Added a loggers property to Trainer which returns a list of loggers provided by the user (#11683)
Added a loggers property to LightningModule which retrieves the loggers property from Trainer (#11683)
Added support for DDP when using a CombinedLoader for the training data (#11648)
Added a warning when using DistributedSampler during validation/testing (#11479)
Added support for Bagua training strategy (#11146)
Added support for manually returning a poptorch.DataLoader in a *_dataloader hook (#12116)
Added rank_zero module to centralize utilities (#11747)
Added _Stateful support for LightningDataModule (#11637)
Added _Stateful support for PrecisionPlugin (#11638)
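One entry above, the info message for `limit_*_batches` set to 1 or 1.0 (#11950), exists because the integer and float forms mean different things: per the Trainer documentation, an int is a number of batches and a float is a fraction of the dataset. The function below is a simplified sketch of that rule, not Lightning's actual implementation, and `resolve_batch_limit` is a hypothetical name.

```python
from typing import Union


def resolve_batch_limit(limit: Union[int, float], num_batches: int) -> int:
    """Return how many batches would run for a given limit_*_batches value.

    Simplified illustration: an int caps the batch count, a float in
    [0.0, 1.0] selects that fraction of the available batches.
    """
    if isinstance(limit, bool):
        raise TypeError("limit must be an int or a float, not a bool")
    if isinstance(limit, int):
        if limit < 0:
            raise ValueError("an integer limit must be non-negative")
        return min(limit, num_batches)
    if isinstance(limit, float):
        if not 0.0 <= limit <= 1.0:
            raise ValueError("a float limit must be between 0.0 and 1.0")
        return int(num_batches * limit)
    raise TypeError("limit must be an int or a float")


if __name__ == "__main__":
    print(resolve_batch_limit(1, 200))    # int: a single batch
    print(resolve_batch_limit(1.0, 200))  # float: the full set of batches
```

With 200 batches available, `1` selects one batch while `1.0` selects all 200, which is the distinction the new message points out.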
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.
Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/ethanweber/sitcoms3D/network/alerts).
Bumps pytorch-lightning from 1.1.5 to 1.6.0.
Release notes
Sourced from pytorch-lightning's releases.
... (truncated)
Changelog
Sourced from pytorch-lightning's changelog.
... (truncated)
Commits
- `44e3edb` Cleanup CHANGELOG (#12507)
- `e3893b9` Merge pull request #12509 from RobertLaurella/patch-1
- `041da41` Remove TPU Availability check from parse devices (#12326)
- `4fe0076` Prepare for the 1.6.0 release
- `17215ed` Fix titles capitalization in docs
- `a775804` Update Plugins doc (#12440)
- `71e25f3` Update CI in README.md (#12495)
- `c6cb634` Add usage of Jupyter magic command for loggers (#12333)
- `42169a2` Add typing to LightningModule.trainer (#12345)
- `2de6a9b` Fix warning message formatting in save_hyperparameters (#12498)