LoicGrobol / zeldarose

Train transformer-based models.
https://zeldarose.readthedocs.io

Update pytorch-lightning requirement from <1.5.0,>=1.4.7 to >=1.4.7,<1.6.0 #11

Closed. dependabot[bot] closed this pull request 2 years ago.

dependabot[bot] commented 3 years ago

Updates the requirements on pytorch-lightning to permit the latest version.

Release notes

Sourced from pytorch-lightning's releases.

Lightning 1.5: LightningLite, Fault-Tolerant Training, Loop Customization, Lightning Tutorials, LightningCLI v2, RichProgressBar, CheckpointIO Plugin, and Trainer Strategy Flag

The PyTorch Lightning team and its community are excited to announce Lightning 1.5, introducing support for LightningLite, Fault-tolerant Training, Loop Customization, Lightning Tutorials, LightningCLI V2, RichProgressBar, CheckpointIO Plugin, Trainer Strategy flag, and more!

Highlights

Lightning 1.5 marks our biggest release yet. Over 60 contributors have worked on features, bug fixes, and documentation improvements, for a total of 640 commits since v1.4. Here are some highlights:

Fault-tolerant Training

Fault-tolerant Training is a new internal mechanism that enables PyTorch Lightning to recover from a hardware or software failure. This is particularly useful when training in the cloud on preemptible instances, which can shut down at any time. When a Lightning experiment exits unexpectedly, a temporary checkpoint is saved that contains the exact state of all loops and the model. With this new experimental feature, you can restore your training mid-epoch at the exact batch and continue training as if it had never been interrupted.

PL_FAULT_TOLERANT_TRAINING=1 python train.py
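
For reference, the same flag can also be set from Python before the Trainer runs. This is a minimal sketch, not part of the release notes: the model class, dataloader, and max_epochs value are hypothetical placeholders.

import os

# Opt in to the experimental fault-tolerant mode before the Trainer starts.
os.environ["PL_FAULT_TOLERANT_TRAINING"] = "1"

import pytorch_lightning as pl

# `MyModel` and `train_loader` are hypothetical placeholders defined elsewhere.
trainer = pl.Trainer(max_epochs=5)
trainer.fit(MyModel(), train_loader)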

LightningLite

LightningLite enables pure PyTorch users to scale their existing code to any kind of hardware while retaining full control over their own loops and optimization logic.

With just a few lines of code and no large refactoring, you get support for multi-device and multi-node training, different accelerators (CPU, GPU, TPU), native automatic mixed precision (half and bfloat16), and double precision, with no special launcher required. Check out our documentation to find out how you can get one step closer to boilerplate-free research!

import torch
import torch.nn.functional as F
from torch import optim
from pytorch_lightning.lite import LightningLite


class Lite(LightningLite):
    def run(self):
        # Let Lite set up your dataloader(s)
        train_loader = self.setup_dataloaders(torch.utils.data.DataLoader(...))

        model = Net()  # a user-defined torch.nn.Module; .to() not needed
        optimizer = optim.Adam(model.parameters())
        # Let Lite set up your model and optimizer
        model, optimizer = self.setup(model, optimizer)

        for epoch in range(5):
            for data, target in train_loader:
                optimizer.zero_grad()
                output = model(data)  # data is already on the device
                loss = F.nll_loss(output, target)
                self.backward(loss)  # instead of loss.backward()
                optimizer.step()


Lite(accelerator="gpu", devices="auto").run()

Loop Customization

The new Loop API lets advanced users swap out the default gradient descent optimization loop at the core of Lightning with a different optimization paradigm. This is part of our effort to make Lightning the simplest, most flexible framework to take any kind of deep learning research to production.

Read our comprehensive introduction to loops
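
To give a flavour of the interface, the sketch below subclasses the pytorch_lightning.loops.Loop base class and implements its three required hooks (done, reset, advance). It is a toy, standalone illustration under the assumption that Loop.run() can be invoked directly, as in the 1.5 loop-customization docs; the class name and the work done in advance() are made up, and a real custom loop would be connected to the Trainer as those docs describe.

from pytorch_lightning.loops import Loop


class ListLoop(Loop):
    """Toy loop that just walks over a list, showing where reset/advance/done fit in."""

    def __init__(self, items):
        super().__init__()
        self.items = items
        self.index = 0

    @property
    def done(self) -> bool:
        # run() keeps calling advance() until this returns True
        return self.index >= len(self.items)

    def reset(self) -> None:
        # called once at the start of run() (and again when restarting)
        self.index = 0

    def advance(self) -> None:
        # one iteration of the loop; a real loop would run a training step here
        print("processing", self.items[self.index])
        self.index += 1


ListLoop(["a", "b", "c"]).run()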

... (truncated)

Changelog

Sourced from pytorch-lightning's changelog.

[1.5.0] - 2021-11-02

Added

  • Added support for monitoring the learning rate without schedulers in LearningRateMonitor (#9786)
  • Added registration of ShardedTensor state dict hooks in LightningModule.__init__ if the PyTorch version supports ShardedTensor (#8944)
  • Added error handling including calling of on_keyboard_interrupt() and on_exception() for all entrypoints (fit, validate, test, predict) (#8819)
  • Added a flavor of training_step that takes dataloader_iter as an argument (#8807)
  • Added a state_key property to the Callback base class (#6886)
  • Added progress tracking to loops:
    • Integrated TrainingEpochLoop.total_batch_idx (#8598)
    • Added BatchProgress and integrated TrainingEpochLoop.is_last_batch (#9657)
    • Avoid optional Tracker attributes (#9320)
    • Reset current progress counters when restarting an epoch loop that had already finished (#9371)
    • Call reset_on_restart in the loop's reset hook instead of when loading a checkpoint (#9561)
    • Use completed over processed in reset_on_restart (#9656)
    • Renamed reset_on_epoch to reset_on_run (#9658)
  • Added batch_size and rank_zero_only arguments for log_dict to match log (#8628)
  • Added a check for unique GPU ids (#8666)
  • Added ResultCollection state_dict to the Loop state_dict and added support for distributed reload (#8641)
  • Added DeepSpeed collate checkpoint utility function (#8701)
  • Added a handles_accumulate_grad_batches property to the training type plugins (#8856)
  • Added a warning to WandbLogger when reusing a wandb run (#8714)
  • Added log_graph argument for watch method of WandbLogger (#8662)
  • LightningCLI additions:
    • Added LightningCLI(run=False|True) to choose whether to run a Trainer subcommand (#8751)
    • Added support to call any trainer function from the LightningCLI via subcommands (#7508)
    • Allow easy trainer re-instantiation (#7508)
    • Automatically register all optimizers and learning rate schedulers (#9565)
    • Allow registering custom optimizers and learning rate schedulers without subclassing the CLI (#9565)
    • Support shorthand notation to instantiate optimizers and learning rate schedulers (#9565)
    • Support passing lists of callbacks via command line (#8815)
    • Support shorthand notation to instantiate models (#9588)
    • Support shorthand notation to instantiate datamodules (#10011)
    • Added multifile option to LightningCLI to enable/disable config saving to preserve multiple files structure (#9073)
  • Fault-tolerant training:
    • Added FastForwardSampler and CaptureIterableDataset injection to data loading utilities (#8366)
    • Added DataFetcher to control fetching flow (#8890)
    • Added SharedCycleIteratorState to prevent infinite loop (#8889)
    • Added CaptureMapDataset for state management in map-style datasets (#8891)
    • Added Fault Tolerant Training to DataFetcher (#8891)
    • Replaced old prefetch iterator with new DataFetcher in training loop (#8953)
    • Added partial support for global random state fault-tolerance in map-style datasets (#8950)
    • Converted state to tuple explicitly when setting Python random state (#9401)
    • Added support for restarting an optimizer loop (multiple optimizers) (#9537)
    • Added support for restarting within Evaluation Loop (#9563)
    • Added mechanism to detect that a signal has been sent so the Trainer can gracefully exit (#9566)
    • Added support for skipping ahead to validation during the auto-restart of fitting (#9681)
    • Added support for auto-restart if a fault-tolerant checkpoint is available (#9722)
  • Checkpoint saving and loading extensibility:

... (truncated)

Commits


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:

  • `@dependabot rebase` will rebase this PR
  • `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
  • `@dependabot merge` will merge this PR after your CI passes on it
  • `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
  • `@dependabot cancel merge` will cancel a previously requested merge and block automerging
  • `@dependabot reopen` will reopen this PR if it is closed
  • `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
dependabot[bot] commented 2 years ago

A newer version of pytorch-lightning exists, but since this PR has been edited by someone other than Dependabot I haven't updated it. You'll get a PR for the updated version as normal once this PR is merged.