borchero / pycave

Traditional Machine Learning Models for Large-Scale Datasets in PyTorch.
https://pycave.borchero.com
MIT License
126 stars 13 forks source link

Bump pytorch-lightning from 1.5.10 to 1.6.0 #16

Closed dependabot[bot] closed 2 years ago

dependabot[bot] commented 2 years ago

Bumps pytorch-lightning from 1.5.10 to 1.6.0.

Release notes

Sourced from pytorch-lightning's releases.

PyTorch Lightning 1.6: Support Intel's Habana Accelerator, New efficient DDP strategy (Bagua), Manual Fault-tolerance, Stability and Reliability.

The core team is excited to announce the PyTorch Lightning 1.6 release ⚡

Highlights

PyTorch Lightning 1.6 is the work of 99 contributors who have worked on features, bug-fixes, and documentation for a total of over 750 commits since 1.5. This is our most active release yet. Here are some highlights:

Introducing Intel's Habana Accelerator

Lightning 1.6 now supports the Habana® framework, which includes Gaudi® AI training processors. Their heterogeneous architecture includes a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries and a configurable Matrix Math engine.

You can leverage the Habana hardware to accelerate your Deep Learning training workloads simply by passing:

trainer = pl.Trainer(accelerator="hpu")

single Gaudi training

trainer = pl.Trainer(accelerator="hpu", devices=1)

distributed training with 8 Gaudi

trainer = pl.Trainer(accelerator="hpu", devices=8)

The Bagua Strategy

The Bagua Strategy is a deep learning acceleration framework that supports multiple, advanced distributed training algorithms with state-of-the-art system relaxation techniques. Enabling Bagua, which can be considerably faster than vanilla PyTorch DDP, is as simple as:

trainer = pl.Trainer(strategy="bagua")

or to choose a custom algorithm

trainer = pl.Trainer(strategy=BaguaStrategy(algorithm="gradient_allreduce") # default

Towards stable Accelerator, Strategy, and Plugin APIs

The Accelerator, Strategy, and Plugin APIs are a core part of PyTorch Lightning. They're where all the distributed boilerplate lives, and we're constantly working to improve both them and the overall PyTorch Lightning platform experience.

In this release, we've made some large changes to achieve that goal. Not to worry, though! The only users affected by these changes are those who use custom implementations of Accelerator and Strategy (TrainingTypePlugin) as well as certain Plugins. In particular, we want to highlight the following changes:

  • All TrainingTypePlugins have been renamed to Strategy (#11120). Strategy is a more appropriate name because it encompasses more than simply training communcation. This change is now aligned with the changes we implemented in 1.5, which introduced the new strategy and devices flags to the Trainer.

... (truncated)

Changelog

Sourced from pytorch-lightning's changelog.

[1.6.0] - 2022-03-29

Added

  • Allow logging to an existing run ID in MLflow with MLFlowLogger (#12290)
  • Enable gradient accumulation using Horovod's backward_passes_per_step (#11911)
  • Add new DETAIL log level to provide useful logs for improving monitoring and debugging of batch jobs (#11008)
  • Added a flag SLURMEnvironment(auto_requeue=True|False) to control whether Lightning handles the requeuing (#10601)
  • Fault Tolerant Manual
    • Add _Stateful protocol to detect if classes are stateful (#10646)
    • Add _FaultTolerantMode enum used to track different supported fault tolerant modes (#10645)
    • Add a _rotate_worker_indices utility to reload the state according the latest worker (#10647)
    • Add stateful workers (#10674)
    • Add an utility to collect the states across processes (#10639)
    • Add logic to reload the states across data loading components (#10699)
    • Cleanup some fault tolerant utilities (#10703)
    • Enable Fault Tolerant Manual Training (#10707)
    • Broadcast the _terminate_gracefully to all processes and add support for DDP (#10638)
  • Added support for re-instantiation of custom (subclasses of) DataLoaders returned in the *_dataloader() methods, i.e., automatic replacement of samplers now works with custom types of DataLoader (#10680)
  • Added a function to validate if fault tolerant training is supported. (#10465)
  • Added a private callback to manage the creation and deletion of fault-tolerance checkpoints (#11862)
  • Show a better error message when a custom DataLoader implementation is not well implemented and we need to reconstruct it (#10719)
  • Show a better error message when frozen dataclass is used as a batch (#10927)
  • Save the Loop's state by default in the checkpoint (#10784)
  • Added Loop.replace to easily switch one loop for another (#10324)
  • Added support for --lr_scheduler=ReduceLROnPlateau to the LightningCLI (#10860)
  • Added LightningCLI.configure_optimizers to override the configure_optimizers return value (#10860)
  • Added LightningCLI(auto_registry) flag to register all subclasses of the registerable components automatically (#12108)
  • Added a warning that shows when max_epochs in the Trainer is not set (#10700)
  • Added support for returning a single Callback from LightningModule.configure_callbacks without wrapping it into a list (#11060)
  • Added console_kwargs for RichProgressBar to initialize inner Console (#10875)
  • Added support for shorthand notation to instantiate loggers with the LightningCLI (#11533)
  • Added a LOGGER_REGISTRY instance to register custom loggers to the LightningCLI (#11533)
  • Added info message when the Trainer arguments limit_*_batches, overfit_batches, or val_check_interval are set to 1 or 1.0 (#11950)
  • Added a PrecisionPlugin.teardown method (#10990)
  • Added LightningModule.lr_scheduler_step (#10249)
  • Added support for no pre-fetching to DataFetcher (#11606)
  • Added support for optimizer step progress tracking with manual optimization (#11848)
  • Return the output of the optimizer.step. This can be useful for LightningLite users, manual optimization users, or users overriding LightningModule.optimizer_step (#11711)
  • Teardown the active loop and strategy on exception (#11620)
  • Added a MisconfigurationException if user provided opt_idx in scheduler config doesn't match with actual optimizer index of its respective optimizer (#11247)
  • Added a loggers property to Trainer which returns a list of loggers provided by the user (#11683)
  • Added a loggers property to LightningModule which retrieves the loggers property from Trainer (#11683)
  • Added support for DDP when using a CombinedLoader for the training data (#11648)
  • Added a warning when using DistributedSampler during validation/testing (#11479)
  • Added support for Bagua training strategy (#11146)
  • Added support for manually returning a poptorch.DataLoader in a *_dataloader hook (#12116)
  • Added rank_zero module to centralize utilities (#11747)
  • Added a _Stateful support for LightningDataModule (#11637)
  • Added _Stateful support for PrecisionPlugin (#11638)

... (truncated)

Commits


Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) - `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language - `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language - `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language - `@dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/borchero/pycave/network/alerts).
dependabot[bot] commented 2 years ago

OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting @dependabot ignore this major version or @dependabot ignore this minor version.

If you change your mind, just re-open this PR and I'll resolve any conflicts on it.