Closed carmocca closed 4 years ago
I wonder how the monitor
can be changed in ModelCheckpoint and EarlyStopping callback since both of them have states like best_k_scores
or best_score
or more.
Yeah, checkpoint_on
and early_stop_on
is a bit tricky I guess, I am still trying to understand how they work with ModelCheckpoint
and EarlyStopping
. Not sure if it's completely integrated with these callbacks yet.
I wonder how the
monitor
can be changed in ModelCheckpoint and EarlyStopping callback since both of them have states likebest_k_scores
orbest_score
or more.
How it currently works, these callbacks cannot know when checkpoint_on
and early_stop_on
input metric has changed.
After my proposed changes, we could choose between:
--monitor
used.--monitor
key is detected.keep a dict of the state for each --monitor used. reset the state whenever a new --monitor key is detected.
sounds like a new ModelCheckpoint or EarlyStopping
every time monitor is changed since we have to set mode
accordingly. A dict of ModelCheckpoints maybe where keys can be the monitor., and trigger the checkpoint according to the key marked as checkpoint
or callback
(as @carmocca suggested). Although callback
seems to be better choice here IMO since it will avoid different params for different callbacks in Result obj.
May I also suggest to keep the checkpoint strategies separate from the model? I think what @carmocca is suggesting goes towards that direction. I think we should compute metrics within the model, but we should let the Trainer()
take care of the checkpoint callbacks. By doing so, I could share my model and let you decide how to tweak the training procedure.
@williamFalcon thoughts?
In my opinion, would it be better to use Result
for everything else (logging, reduce_fx, ...) except for checkpoint_on
and early_stop_on
?
And we would use the callbacks as before with the Trainer
with the keys logged from Result
.
For using different monitor
, could we allow list of respective callbacks (multiple checkpoint callbacks, early stopping callbacks, ...) to pass to the Trainer
?
ok guys, tracking this. sorry for the delay. focused on refactors today but will propose something tomorrow am (NY time). A lot of good ideas here that i need to think about overnight :)
Adding a use case here which hopefully fits into the discussion. It was clear how to achieve it in v0.8.4, now it's not anymore:
I want to use the validation accuracy on the whole validation set for checkpointing. Since I have a different number of samples in each batch, I can't just compute the accuracy in each step and average on epoch end. So I compute the number of correct and total samples in each step, log both to eval_result
under the key _valacc and wrote a custom reduce function.
As mentioned in one of the comments above, the monitor
key of the checkpoint callback is now ignored. At the same time, I don't understand how to use checkpoint_on
as an alternative to achieve checkpointing on the aggregated validation accuracy.
Hi all I've just started trying lightning. Been long time Chainer user, where btw these extensions are pretty modular in a sense that everything is in one log, you choose which value to monitor / what trigger to use /which extension to call on trigger. I guess my issue #3338 is a bit of a testimony to things being confusing in that "monitor" not working by itself. I would guess the motivation behind checkpoint_on was to make it automated without explicit callback :-\ Anyways, the bottom line is that I'd like to support the idea that using results monitor for callbacks is way more intuitive :)
thanks for the patience! almost done with the first pass at refactors, so i will likely tackle this over the weekend.
Here's what i'm thinking:
result = EvalResult()
result.log('giraffe', acc)
# then you need to monitor giraffe
es= EarlyStopping(monitor='giraffe')
When you log at both step and epoch, your metrics adds 'step_' or '_epoch' (this is true today).
result.log('giraffe', x, on_step=True, on_epoch=True)
# results in 'step_giraffe', 'epoch_giraffe'
Since without that, the value is ambiguous.... Now, if you want to monitor this metric you can pick which one you want:
es= EarlyStopping(monitor='step_giraffe')
es= EarlyStopping(monitor='epoch_giraffe')
How does this sound to everyone? Does this not cover a use case that's been mentioned?
this won't allow changing the value you want to monitor on, right? I thought early_stop_on
and checkpoint_on
was added to add this functionality.
? yeah, you can still change it... you just have to specify it now in the callback. The problem is that apparently this is causing a ton of confusion.
I personally think it's cleaner to set early_stop_on and checkpoint_on... this proposed new approach seems a bit more annoying to me but it's something i think everyone is asking for?
It still can have early_stop_on
and checkpoint_on
but I think adding another flag callback=True/False
in result.log()
is a good suggestion here to allow other metrics to be added in callback_metrics.
Also, I think EarlyStopping
and ModelCheckpoint
are not completely compatible with the new Result object, right?? For eg, even if I change the early_stop_on
or checkpoint_on
at some point during training, let's say from loss
to accuracy
, these callbacks defined in the Trainer have some state variables like mode
and patience
which I don't think will work in this case. Also even if the user wants to use checkpoint_on
, to update the checkpoint_filename accordingly, one has to log it too with logger=True
.
If you go with this new flag callback
then I suggest changing result.log
to something else, maybe result.add
?
A healthy discussion is required here to cover-up all the possible use-cases.
I personally think it's cleaner to set early_stop_on and checkpoint_on... this proposed new approach seems a bit more annoying to me but it's something i think everyone is asking for?
Mind explaining why do you find the proposed approach more annoying? It uses the already existing parameter monitor
and would avoid having multiple *_on
parameters in the Result (instead adding the callback
parameter to it).
Right now there is only two (checkpoint_on
and early_stop_on
) but with time, there might be more which could clutter lightnings code.
maybe result.add?
what about result.register
?
because not everyone has a "val_loss" keyword... and then it somehow makes val_loss "magical"... but in reality, i want to early stop or checkpoint on some other thing like accuracy and it's annoying to have to init the callback and set the monitor there.
because if i change my mind tomorrow, now i have to remember to change it in multiple places... and it also means that the trainer is no longer model agnostic... instead i have to if statement for every model i want to train to pick a different monitor value.
it is very inconvenient lol.
Without keyword:
# model A
result.log('X', x)
# model B
result.log('Y', y)
if model A:
es = EarlyStopping(monitor='X')
else:
es = EarlyStopping(monitor='Y')
With keyword:
# model A
EvalResult(early_stop_on=X)
# model B
EvalResult(early_stop_on=Y)
What if i want to change the key for early stopping during training?
if epoch % 2 == 0:
EvalResult(checkpoint_on=X)
else:
EvalResult(checkpoint_on=Y)
But without the keyword?? there's no way to do this...
What if you don't know the checkpoint/early_stop metric in the LightningModule? you are forced to do this:
# user_script.py
module = MyLightningModule(monitor=args.monitor)
# my_lightning_module.py
def training_step(...):
...
if self.monitor == "train_loss"
result = pl.TrainResult(minimize=train_loss, early_stop_on=train_loss)
elif self.monitor == "train_acc":
result = pl.TrainResult(minimize=train_loss, early_stop_on=train_acc)
elif:
...
else:
result = pl.TrainResult(minimize=train_loss)
result.log("train_loss", train_loss)
result.log("train_acc", train_acc)
...
def validation_step(...):
....
if self.monitor == "val_loss":
result = pl.EvalResult(early_stop_on=val_loss)
elif self.monitor == "val_acc":
result = pl.EvalResult(early_stop_on=val_acc)
elif:
...
else:
result = pl.EvalResult()
result.log("val_loss", val_loss)
result.log("val_acc", val_acc)
...
The number of if statements would get much worse if I wanted to separate self.monitor
into self.early_stop_monitor
and self.checkpoint_monitor
.
Alternatively:
# user_script.py
module = LightningModule()
es = EarlyStopping(monitor=args.monitor)
# my_lightning_module.py
def training_step(...):
...
result = pl.TrainResult(minimize=train_loss)
result.log("train_loss", train_loss, callback=True)
result.log("train_acc", train_acc, callback=True)
...
def validation_step(...):
...
result = pl.EvalResult()
result.log("val_loss", val_loss, callback=True)
result.log("val_acc", val_acc, callback=True)
...
it's annoying to have to init the callback and set the monitor there
But I believe this is the most common case. You often want to change the monitor without having to modify the module code. It is also how other libraries (e.g. Keras) do it afaik
What if i want to change the key for early stopping during training?
This is why I proposed letting monitor be a Callable, see original issue comment
but this is exactly why the results makes more sense... because you CAN change it while training. This is an exact example of that.
Otherwise you'd have to set the monitor at the callback level and that's now disconnected from the model...
ie: your model can't be moved around and is not modular now because it has this other external dependency where you must also remember to set X value in the callback.
Currently with results you can say: "import my model and use your own trainer"
with your approach you'd say "import my model and use your trainer. And oh by the way, don't forget to init this callback and set this magic word so it'll save weights"
it's not very modular...
This is redundant...
result.log("train_loss", train_loss, callback=True)
we can just make anything that is logged available as a callback (ie: always true) but you don't need to add another keyword
Currently with results you can say: "import my model and use your own trainer"
with your approach you'd say "import my model and use your trainer. And oh by the way, don't forget to init this callback and set this magic word so it'll save weights"
If the issue is about providing callbacks with the module, why not add configure_callbacks
function to LightningModule
?
we can just make anything that is logged available as a callback (ie: always true)
Sure, I don't mind that :smiley:
class MyLightningModule(pl.core.LightningModule):
def __init__(
self,
early_stop_monitor,
checkpoint_monitor,
):
super().__init__()
self.early_stop_monitor = early_stop_monitor
self.checkpoint_monitor = checkpoint_monitor
def configure_callbacks(self):
callbacks = []
if self.early_stop_monitor:
callbacks.append(EarlyStopping(monitor=self.early_stop_monitor))
if self.checkpoint_monitor:
# We have a reference to the trainer here so the following is possible
# if we choose to allow Callables as callback monitors:
monitor = lambda: self.checkpoint_monitor if self.trainer.current_epoch % 2 == 0 else "val_loss",
callbacks.append(ModelCheckpoint(monitor=monitor))
return callbacks
def training_step(self, ...):
...
result = pl.TrainResult(minimize=train_loss)
result.log("train_loss", train_loss)
result.log("train_acc", train_acc)
def validation_step(self, ...):
...
result = pl.EvalResult()
result.log("val_loss", val_loss)
result.log("val_acc", val_acc)
right, i think that's what i was originally hoping for:
but that means we have 2 ways of doing something. so it's why i then thought to just drop the keywords (ie: not do 1 and only 2)
I love this callbacks hook idea... let's bring it up for vote from the community? (nice suggestion @carmocca!)
What do we think?
My take is that some models to philosophically NEED certain callbacks. And to keep models self-contained we need to make sure that the required callbacks are packaged as well.
i’ve seen this myself in self supervised as well… for instance:
but it doesn’t make sense for people to run these models without those callbacks. so now we have this disconnect between stand-alone modules and training
I would have both model.callbacks
and trainer.callbacks
. As you mentioned, some callbacks are inherently related to the model function and others to the training procedure (i.e. checkpointing). The first come packaged with the LightningModule
and the second can be set by whoever is training the model.
💬 Discussion
Since I started using Lightning, I have noticed many users, as well as myself, having trouble with the way metrics are saved for callbacks by the
Result
class.The current design of having only
{early_stop_on,checkpoint_on}
forces different callbacks to use the same monitor. I believe that wanting different monitors forEarlyStopping
,ModelCheckpoint
, andLearningRateLogger
is a very common use case. This is something we will also need to address if we ever want to support multipleModelCheckpoint
callbacks.This is also unintuitive for new users since most callbacks include a
monitor
parameter which becomes completely useless when theResult
object is used. On the other hand, using aResult
object is the new recommended way to structure{training,validation,test}_step()
.Additionally, the approach is not future-proof since we would need to update lightning to include, say,
batchnorm_on
or whatever new technique that becomes widespread.I believe this is an important issue that needs a solution.
Requirements:
We could use this issue to have a productive discussion about how the final design should look and what are the necessary requirements to fit most use cases:
To start, I would say:
monitor
inCallback
s and{early_stop_on,checkpoint_on}
inResult
Own proposal
{early_stop_on,checkpoint_on}
callback
toresult.log()
function (or addresult.callback()
- both would have the same functionality) for users to save metrics on-the-run--monitor
--monitor
type to beOptional[Union[str, Callable[..., str]]
so the following is possible:Can change monitor on-the-run
Note: maybe trainer should be injected into the callback monitor function
pl.callbacks.EarlyStopping( monitor=lambda: "train_loss" if trainer.current_epoch <= 20 else "val_loss", mode="min" )
Could even have
complex_monitor = VeryComplexClassWhichDoesSomethingVeryFancyWithState() assert callable(complex_monitor) pl.callbacks.LearningRateLogger(monitor=complex_monitor)
this would cover a big chunk of usecases