Open · zcczhang opened 2 years ago
One solution I suggested on Slack is to create checkpoints and run another script that monitors these checkpoints and runs validation separately.
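A minimal sketch of that approach, assuming a standalone watcher script; `MyModule`, `MyDataModule`, and the checkpoint directory are hypothetical placeholders for your own code:

```python
# Hypothetical standalone script: poll a checkpoint directory and validate
# each new checkpoint as it appears, independently of the training process.
import time
from pathlib import Path

import pytorch_lightning as pl

from my_project import MyModule, MyDataModule  # hypothetical user code

CKPT_DIR = Path("checkpoints")  # directory the training run writes to
seen = set()
datamodule = MyDataModule()

while True:
    for ckpt in sorted(CKPT_DIR.glob("*.ckpt")):
        if ckpt in seen:
            continue
        seen.add(ckpt)
        model = MyModule.load_from_checkpoint(str(ckpt))
        trainer = pl.Trainer(devices=1, logger=False)
        trainer.validate(model, datamodule=datamodule)
    time.sleep(30)  # poll interval
```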
I think you can do it with Flow/Work? cc: @lantiga
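A rough sketch of that Flow/Work idea, assuming the Lightning Apps API; the `run` bodies and the checkpoint path are hypothetical placeholders. A `LightningWork` created with `parallel=True` does not block the flow, so evaluation can run alongside training:

```python
from lightning.app import LightningApp, LightningFlow, LightningWork


class TrainWork(LightningWork):
    def run(self) -> None:
        """Hypothetical: run the training loop, writing checkpoints to disk."""


class EvalWork(LightningWork):
    def run(self, ckpt_path: str) -> None:
        """Hypothetical: load ckpt_path and run rollout evaluation."""


class RootFlow(LightningFlow):
    def __init__(self) -> None:
        super().__init__()
        self.trainer = TrainWork(parallel=True)
        self.evaluator = EvalWork(parallel=True)  # parallel -> non-blocking

    def run(self) -> None:
        self.trainer.run()
        # non-blocking; by default a work re-runs only when its arguments change
        self.evaluator.run("checkpoints/latest.ckpt")  # hypothetical path


app = LightningApp(RootFlow())
```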
This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions - the Lightning Team!
🚀 Feature
I am wondering if it is possible to run evaluation asynchronously during the training process.
Motivation
For RL projects (or imitation learning + online rollout evaluation), evaluation is a real bottleneck during training even when the environments are vectorized, especially when evaluating many long-horizon episodes. Currently, training can only continue after evaluation finishes, but this seems unnecessary: evaluation could run on a snapshot of the weights at that timestep and has no effect on future training.
Pitch
An option to run evaluation asynchronously during training.
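A minimal sketch of what such an option could look like from the user side: a `Callback` that snapshots the weights at the end of each training epoch and hands evaluation to a background process, so the training loop is not blocked. `evaluate_checkpoint` and the checkpoint directory are hypothetical placeholders; only standard Lightning hooks are used:

```python
import multiprocessing as mp
from pathlib import Path

import pytorch_lightning as pl


def evaluate_checkpoint(ckpt_path: str) -> None:
    """Hypothetical: load the checkpoint and run rollout evaluation here."""


class AsyncEvalCallback(pl.Callback):
    def __init__(self, ckpt_dir: str = "async_eval_ckpts") -> None:
        self.ckpt_dir = Path(ckpt_dir)
        self.ckpt_dir.mkdir(exist_ok=True)
        # "spawn" avoids forking CUDA state into the worker process
        self.pool = mp.get_context("spawn").Pool(processes=1)

    def on_train_epoch_end(self, trainer, pl_module) -> None:
        ckpt_path = self.ckpt_dir / f"epoch_{trainer.current_epoch}.ckpt"
        trainer.save_checkpoint(str(ckpt_path))
        # returns immediately; evaluation happens in the worker process
        self.pool.apply_async(evaluate_checkpoint, (str(ckpt_path),))

    def teardown(self, trainer, pl_module, stage) -> None:
        self.pool.close()
        self.pool.join()
```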
Alternatives
Using a separate script that loads saved checkpoints and runs evaluation outside the training process.
Additional context
None.
cc @borda @awaelchli @rohitgr7