🚀 Feature
Integrate https://github.com/pytorch/torchsnapshot
Motivation
The library is designed with composition in mind and is very modular. Its distributed training benchmarks look very promising, so it would be a great addition to the project.
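From their maintainers:
"Despite the experimental status, we’ve already been taking backward compatibility very seriously as there are already some early adopter from external companies. In terms of API, the surface area is very small and we do not have any plans for BC-breaking changes. In terms of storage format, we are already committed to being backward compatible. FWIW, the project will go to beta stage late September or early October."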
In the future, it could also cover other features, such as snapshotting the DataLoader state (for both V1 and V2 DataLoaders).
More resources:
Pitch
At this point, it looks like a SnapshotCheckpointIO plugin would be the right mechanism to do it.
Alternatives
Do not integrate it.
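To make the pitch more concrete, here is a rough, self-contained sketch of the shape a SnapshotCheckpointIO plugin could take. It is only an illustration under stated assumptions: `CheckpointIOSketch` is a hypothetical stand-in for the plugin interface, the method names are assumed, and `pickle` is a placeholder for the actual torchsnapshot storage backend, whose API is not used here.

```python
import pickle
import shutil
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class CheckpointIOSketch(ABC):
    """Hypothetical stand-in for a checkpoint I/O plugin interface."""

    @abstractmethod
    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        ...

    @abstractmethod
    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        ...

    @abstractmethod
    def remove_checkpoint(self, path: str) -> None:
        ...


class SnapshotCheckpointIO(CheckpointIOSketch):
    """Stores each top-level checkpoint entry as its own file in a directory.

    Assumption: keys are strings that are valid file names. pickle is only
    a placeholder for the real snapshot backend.
    """

    def save_checkpoint(self, checkpoint: Dict[str, Any], path: str) -> None:
        root = Path(path)
        root.mkdir(parents=True, exist_ok=True)
        for key, value in checkpoint.items():
            with open(root / f"{key}.pkl", "wb") as f:
                pickle.dump(value, f)

    def load_checkpoint(self, path: str) -> Dict[str, Any]:
        checkpoint: Dict[str, Any] = {}
        for entry in sorted(Path(path).glob("*.pkl")):
            with open(entry, "rb") as f:
                checkpoint[entry.stem] = pickle.load(f)
        return checkpoint

    def remove_checkpoint(self, path: str) -> None:
        # A snapshot is a directory here, so remove the whole tree.
        shutil.rmtree(path, ignore_errors=True)
```

The point of the sketch is the separation of concerns: the training loop only sees the three plugin methods, so the storage backend could be swapped without touching the rest of the code.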
If you enjoy Lightning, check out our other projects! ⚡
Metrics: Machine learning metrics for distributed, scalable PyTorch applications.
Lite: enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.
Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.
Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.
Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.
cc @borda @awaelchli @ananthsub @ninginthecloud @rohitgr7 @otaj @akihironitta