@rohitgr7 Good observation.
Yes, this was indeed a somewhat breaking change, and it was not fully intentional. However, one could also argue this was a bug or at least undefined behavior. Previously, if we called train_dataloader(), e.g., through the trainer, it would still have led to the Trainer calling it a second time later on when it got reset, if I recall correctly.
For the purpose of documentation, here is the sequence of hooks we execute around the configure_optimizers and *_dataloader calls:
With this issue request, either the configure_optimizers call would have to move several hooks later or the dataloader reset call would have to move earlier.
What would make this configuration more favorable than the current one? Isn't it equally valid to request having the optimizers available in the dataloader methods? If both need to be the case, then this is hitting the fundamental limits of the hook system in Lightning. A similar issue was faced in #8485.
All I'm saying is, if the hook order changes then we need to be 100% sure that the order we choose is more favorable than the other in most cases, and we need to be aware of the limitations.
Hi! @rohitgr7 @awaelchli
I have a use case where I need to access the training dataloader in configure_optimizers(), not vice versa.
I'm working on integrating Lightning with Opacus (it enables training with differential privacy). This pull request demonstrates how a PL user can add privacy by just wrapping the optimizer in configure_optimizers().
The problem is, we need to access the training data to configure noise generation and privacy accounting properly. We can pass some statistics about the training data as model parameters, but this can easily lead to an inconsistent optimizer configuration and, as a result, incorrect privacy accounting.
My current solution touches _data_connector, which is a dirty hack:
```python
def configure_optimizers(self):
    optimizer = optim.SGD(self.parameters(), lr=self.lr, momentum=0)
    if self.enable_dp:
        # dirty introspection of the trainer instance to get the training data
        data_loader = (
            self.trainer._data_connector._train_dataloader_source.dataloader()
        )
        # transform (model, optimizer, dataloader) to DP-versions
        model, optimizer, dataloader = self.privacy_engine.make_private(
            self,
            optimizer,
            data_loader,
            ...
        )
    return optimizer
```
cc @ananthsub (To our discussion about using TrainingTypePlugin: I think this way of enabling DP by modifying configure_optimizers can be useful for advanced users and DP researchers; I would prefer to have both options on the table)
> With this issue request, either the configure_optimizers call would have to move several hooks later or the dataloader reset call would have to move earlier.
I think moving configure_optimizers later would be better, since moving the dataloader initialization call outside the FitLoop is inconvenient, plus we only initialize optimizers during .fit, so moving this initialization inside the FitLoop might be better. Although that will require a lot of changes around sharded/DeepSpeed, since they do their initialization and wrapping differently. cc @SeanNaren
> Isn't it equally valid to request having the optimizers available in the dataloader methods?

Do we have any use case/example for it? @awaelchli
@rohitgr7 Would you like to try to invert the order in a draft? Failing tests should then reveal potential challenges we have not considered yet, if there are any.
yep! we do have some failing tests.
Any update?
hey @andreimargeloiu!
the current workaround is that you can initialize the dataloader before Lightning initializes it, using:

```python
def configure_optimizers(self):
    self.trainer.reset_train_dataloader()
    self.trainer.train_dataloader.loaders  # access it here
```
A bit confused here. So in the current PyTorch Lightning version, if I want to access the train dataloader, for example in my model's forward method, I do train_loader = self.trainer.datamodule.train_dataloader()?

Looks like if I do this in the model's setup(self), the train dataloader will be loaded because it was not loaded before. But if I do this in the model's forward method, will the dataloader be reloaded, or is it just using a cached one?
Would be happy to see if there is a more elegant way of doing this.
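For what it's worth, here is a minimal sketch of the access pattern described above (the class name is hypothetical, and it assumes a LightningDataModule is attached to the Trainer). Note that calling the hook directly typically builds a fresh DataLoader rather than reusing whatever the Trainer has already prepared:

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):  # hypothetical module, for illustration only
    def setup(self, stage=None):
        # Calls the datamodule hook directly. This typically constructs a new
        # DataLoader each time rather than returning a cached one.
        train_loader = self.trainer.datamodule.train_dataloader()
        self.num_training_batches = len(train_loader)
```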
The above workaround by @rohitgr7 does not seem to work anymore in pytorch-lightning >= 2.0.0, since the reset_train_dataloader() function no longer exists. I receive the error AttributeError: 'Trainer' object has no attribute 'reset_train_dataloader'.

Is there another workaround or any plans to fix this? I think it is quite common to access the dataset in configure_optimizers in order to easily get access to data-loading parameters like the batch size, the total number of samples, etc. This information is often necessary for initializing learning rate schedulers.
In 2.0, you can do

```python
def configure_optimizers(self):
    self.trainer.fit_loop.setup_data()
    dataloader = self.trainer.train_dataloader
    ...
```
You could also call self.train_dataloader() if your LightningModule defines it.
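Putting that together with the learning rate scheduler use case mentioned earlier in the thread, a rough 2.x sketch could look like this (the class name, optimizer, learning rate, and OneCycleLR choice are illustrative assumptions, and the step count ignores gradient accumulation):

```python
from torch import optim
import pytorch_lightning as pl


class MyModel(pl.LightningModule):  # hypothetical module, for illustration only
    def configure_optimizers(self):
        # Ask the fit loop to build the dataloaders early (PL >= 2.0),
        # then read the prepared train dataloader back from the trainer.
        self.trainer.fit_loop.setup_data()
        train_loader = self.trainer.train_dataloader

        optimizer = optim.AdamW(self.parameters(), lr=1e-3)
        # Size the schedule from the dataloader length and the number of epochs.
        total_steps = len(train_loader) * self.trainer.max_epochs
        scheduler = optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=1e-3, total_steps=total_steps
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```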
Closing as I don't think there's anything actionable. Feel free to post further questions though
Proposed refactoring or deprecation
Before v1.5, the dataloader hooks were patched onto the model and were easily accessible within configure_optimizers to set up the total training steps for a scheduler. But now, since they are no longer patched, https://github.com/PyTorchLightning/pytorch-lightning/blob/0ed5e3dc8abcec40aacd64cc9175590bb1409759/pytorch_lightning/trainer/connectors/data_connector.py#L213-L224 they are no longer available directly if a datamodule is used or the dataloaders are passed directly to .fit. Nor can they be accessed using self.trainer.train_dataloaders, because the dataloaders are loaded within the FitLoop. https://github.com/PyTorchLightning/pytorch-lightning/blob/0ed5e3dc8abcec40aacd64cc9175590bb1409759/pytorch_lightning/loops/fit_loop.py#L194-L197
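For context, the pre-1.5 pattern referred to here looked roughly like the sketch below (assuming the old behavior where the dataloader hooks were patched onto the model; the class name, optimizer, and scheduler choices are illustrative assumptions):

```python
from torch import optim
import pytorch_lightning as pl


class MyModel(pl.LightningModule):  # hypothetical module, for illustration only
    def configure_optimizers(self):
        optimizer = optim.SGD(self.parameters(), lr=0.1)
        # Before v1.5 the dataloader hooks were patched onto the model, so the
        # training dataloader could be called here to size the schedule.
        total_steps = len(self.train_dataloader()) * self.trainer.max_epochs
        scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_steps)
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```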
Motivation

I'd suggest these dataloaders should be available to users, no matter how they are passed during .fit.

Pitch
If possible, we should call configure_optimizers after loading the dataloaders for the first time within the fit loop. Not sure if it will bring some complications and failures, because we load the optimizers differently for DeepSpeed.

Additional context
As always, alternative suggestions/thoughts would be appreciated :)
cc: @karthikrangasai @awaelchli