Closed Borda closed 4 years ago
@ethanwharris let's split the discussion and talk about callbacks here...
I don't plan to incorporate the checkpoint_callback and early_stop_callback callbacks in the new system, to avoid making this (#889) too big, but that should be done in a future PR.
Same for the loggers logic, they should use the callback system IMO (all the custom callbacks I use are loggers...).
interesting idea, and true that a logger is a special case of a callback; just not sure if it won't be too confusing for a user...
For the user, it's easy to make this change invisible. Just make the Logger class inherit from Callback and modify the appropriate loggers to use the new mechanism instead of the custom logger functions. Everything should be backward compatible.
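As a minimal sketch of the idea (all class and attribute names here — Callback, LightningLoggerBase, PrintLogger, callback_metrics — are hypothetical stand-ins, not the actual PL API):

```python
class Callback:
    """Base class: hooks the trainer invokes at fixed points in the loop."""
    def on_epoch_end(self, trainer):
        pass


class LightningLoggerBase(Callback):
    """A logger is just a callback that forwards epoch-end metrics."""
    def log_metrics(self, metrics):
        raise NotImplementedError

    def on_epoch_end(self, trainer):
        # Route the generic hook to the logger-specific API, so existing
        # log_metrics implementations keep working unchanged.
        self.log_metrics(trainer.callback_metrics)


class PrintLogger(LightningLoggerBase):
    def __init__(self):
        self.logged = []

    def log_metrics(self, metrics):
        self.logged.append(metrics)
```

Existing loggers would not need to change their log_metrics implementations; only the dispatch point moves into the callback hook.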
sure, making Callback the parent of loggers does not disturb any user; I was more or less thinking about passing callbacks and loggers in the same argument (list) instead of having separate arguments for callbacks and loggers
But loggers and callbacks are the same thing, so I don't see any issue, and new users should not be disturbed IMO.
About existing users and backward compatibilities, if the documentation is clear about it, everything should be fine.
I'm not convinced that there is an advantage to making Loggers into Callbacks. Currently there is a lot of inter-dependency between checkpointing and logging (the checkpointer directory depends on logger name / version for example), I can see that getting quite messy if everything is of the same class and invoked with the same command. There's also the problem of callback hell, where everything can become a callback if the API is flexible enough, which we should probably avoid. That said, I'm not against the idea if it doesn't disturb the user, just not sure what value it would bring in that case.
@PyTorchLightning/core-contributors what do you think? is it reasonable/beneficial? :robot:
my two cents:
sure, making Callback the parent of loggers does not disturb any user; I was more or less thinking about passing callbacks and loggers in the same argument (list) instead of having separate arguments for callbacks and loggers
I think it will be more user friendly to have the trainer keep these separate (eg. Trainer(..., logger=True, callbacks=[]))
There's also the problem of callback hell, where everything can become a callback if the API is flexible enough, which we should probably avoid.
+1
as I mentioned in #595, I think we should allow user defined callbacks passed in as a list and execute the callbacks in the order which they're provided in the list.
Trainer(..., callbacks=[custom_callback_1, custom_callback_2, model_checkpoint_callback])
one thing that we'll want to consider is what information we'll expose to the callbacks. do we want to expose all model attributes?
@Borda i'm not sure i love this idea. this means you wouldn't get early stopping for free or checkpointing for free. most of the time users don't care about the details of those and a simple True is enough. This means now you'd have to specify all the details.
i think this clutters the API and makes it unnecessarily complex. We moved the callbacks to default arguments after a ton of feedback on people being annoyed they had to init them EVERY time.
and frankly, the keras callbacks are super unintuitive and confusing.
So, unless there's a clear API win, i'm going to push back on this.
I think keeping a True argument for things like a logger, early stopping, checkpointing, etc. is important, as it allows one to have good default behavior without having to define everything manually. Adding a list of callbacks is just an extra argument that gives more control and flexibility.
The two are not mutually exclusive.
but then we have multiple ways of doing something which is confusing. ie: do i pass in the callback to early stopping or the callbacks list?
having many choices for something is bad and against our principles. We want to force the user to make a single decision rooted in best practices and sensible defaults.
I was thinking of not breaking the API as it is now, so there would still be an option to set checkpoint_callback=True, but inside the Trainer it would create a callback instance which would be added to the list with the others passed via the callbacks argument...
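That normalization could look something like this (a sketch; configure_callbacks and the stub classes are hypothetical names, not the actual Trainer internals):

```python
class EarlyStopping:
    """Stub standing in for the real early-stopping callback."""


class ModelCheckpoint:
    """Stub standing in for the real checkpointing callback."""


def configure_callbacks(checkpoint_callback=True, early_stop_callback=True,
                        callbacks=None):
    """Keep the boolean API, but internally fold everything into one list.

    True  -> create a default instance (the 'for free' behavior)
    False -> omit the feature entirely
    an instance -> use it as given
    """
    resolved = list(callbacks or [])
    if early_stop_callback is True:
        resolved.append(EarlyStopping())
    elif early_stop_callback:  # a user-provided instance
        resolved.append(early_stop_callback)
    if checkpoint_callback is True:
        resolved.append(ModelCheckpoint())
    elif checkpoint_callback:
        resolved.append(checkpoint_callback)
    return resolved
```

Users keep the simple boolean flags, while the Trainer internally works with a single ordered list.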
@Borda I was thinking something similar, either appending the early stopping and model checkpointing callbacks at the beginning or end of the list.
callbacks = [*user_provided, model_checkpoint, early_stopping]
I agree that losing early stopping and model checkpointing "for free" would be an undesirable regression.
Moreover, as @williamFalcon pointed out there are still some confusing cases that need to be ironed out before considering moving forward with this.
As an example, right now you can construct your own ModelCheckpoint object and pass it in to the checkpoint_callback arg. If we also have a callbacks arg in the Trainer, how will the user know where to provide their ModelCheckpoint object? Should the checkpoint_callback arg only accept True/False for setting default modes, while user-provided objects go into the callbacks arg?
To quote the Zen of Python: "there should be one-- and preferably only one --obvious way to do it."
What if we started by simplifying how the internal callbacks are executed, making it easier for contributors to add callbacks to the official library, and revisit user-defined callbacks at a later date?
Right now, the callbacks must be manually invoked across the Trainer code. Here's a proposal:
1. The user provides callback-related arguments to the Trainer (checkpoint_callback, early_stop_callback, accumulate_grad_batches, ...).
2. The Trainer constructs a callbacks attribute based on the above argument values (eg. callbacks = [early_stop_callback, checkpoint_callback, ...]).
3. We introduce a CallbackHandler object which invokes these callbacks, in the order provided, at each model hook.

This way, if a new user wants to contribute a Callback to the library, they simply have to write the Callback class and determine where in the list of callbacks it should be placed (order of execution).
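A minimal sketch of such a CallbackHandler (class names and the fire method are illustrative, not an actual implementation):

```python
class CallbackHandler:
    """Invokes each callback's hook, in list order, if the callback defines one."""
    def __init__(self, callbacks):
        self.callbacks = callbacks

    def fire(self, hook_name, trainer):
        for cb in self.callbacks:
            hook = getattr(cb, hook_name, None)
            if callable(hook):
                hook(trainer)


class Recorder:
    """Toy callback that records its name when its hook runs."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def on_epoch_end(self, trainer):
        self.log.append(self.name)


calls = []
handler = CallbackHandler([Recorder("early_stop", calls), Recorder("checkpoint", calls)])
handler.fire("on_epoch_end", trainer=None)
# calls is now ["early_stop", "checkpoint"]: list order preserved
```

The Trainer would then call handler.fire("on_epoch_end", self) at each hook point instead of invoking each built-in callback by hand.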
What if we started by simplifying how the internal callbacks are executed, make it easier for contributors to add callbacks to the official library, and revisit user-defined callbacks at a later date?
I like that, although it would be incredibly useful to go all the way and expose the default callback stack in a way that could be overridden (ie defined in LightningModule).
Anecdotally, I noticed some users are already hacking their own callbacks in by editing Trainer code. Ideally this solution could get us to a point where the number of users overriding Trainer is at a minimum.
@jeffling could u give an example of such a hack?
I think there are some issues with the idea of a CallbackHandler that we should try to avoid. If all of the callbacks can be handled in the same object, they must all use the same method names. This leads to things like on_forward, on_backward etc. It's surely easier for the user to extend LoggerBase, or CheckpointBase... and just implement log_metrics or save_checkpoint and so on.

I would suggest that there should be well defined and easy to implement abstractions for standard operations (logging, checkpointing, ...), which are called separately in the codebase via their own Collection classes. If we need more flexibility than that, then we can have callbacks that are more general, although I'm not sure what the use case is.
My concern is that trying to be too flexible is what led to an unusable callback or event handler system in: torchbearer, ignite, catalyst, keras or a host of other libraries. We should be aware that all of those libraries got it wrong, and try to avoid making the same mistakes.
Such use cases definitely exist; mine, for example, is when one wants to add a test which checks something at runtime. Here I had to hack a module instead of passing a callback. Also, if you look into these libraries you'll find some examples that you cannot use in lightning now as a callback. I believe many users have their own cases.
My proposal is to add a new class, named for example CallbackTrainer, which does not have logger, early_stop_callback etc. arguments, but instead has a callbacks argument. It should give the user the desired flexibility in exchange for the free features.
Yeah, I can see that there are good use cases for custom callbacks, I just don't think we should bundle them all into one class along with loggers and checkpoints and the rest. My proposal would be that we have a LoggerCollection type thing (see #903), an equivalent for checkpoints etc., and then an additional object for custom callbacks that don't fit one of the other archetypes. Loggers, Checkpointers, Callbacks etc. can then be given to the Trainer as lists or on their own or as booleans. Internally, we should either have a CallbackHandler as suggested above (but with clear separate arguments for loggers, checkpointers, callbacks etc. that are then called separately in the code), or just invoke each of the collections when needed in the Trainer. The idea of a CallbackTrainer might be a nice thing to do, but shouldn't be strictly needed to get the right outcome.
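The LoggerCollection idea could be sketched like this (names are illustrative; the real #903 implementation may differ): a collection that fans a single logging call out to every logger it holds, so the Trainer only ever talks to one object.

```python
class LoggerCollection:
    """Fans a single logging call out to every logger it holds."""
    def __init__(self, loggers):
        self.loggers = loggers

    def log_metrics(self, metrics, step=None):
        for logger in self.loggers:
            logger.log_metrics(metrics, step)


class ListLogger:
    """Toy logger that keeps every logged row in memory."""
    def __init__(self):
        self.rows = []

    def log_metrics(self, metrics, step=None):
        self.rows.append((step, metrics))


a, b = ListLogger(), ListLogger()
collection = LoggerCollection([a, b])
collection.log_metrics({"loss": 0.1}, step=3)
```

An equivalent CheckpointerCollection would expose save_checkpoint the same way, keeping each archetype's interface narrow instead of forcing everything through generic hook names.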
Personally, I like your idea; it should also be easier to implement and maintain than a new Trainer class. My proposal was in response to

but then we have multiple ways of doing something which is confusing. ie: do i pass in the callback to early stopping or the callbacks list?

which does not possess such ambiguity.
The limitation of the current system is that it does not give control over what should be logged and when, as in a callback system where a hook is called during the different steps of the loop, such as on_train_begin, on_test_end, etc. Here the only method is log_metrics and we do not have control over when it is called.
For example, I have currently a callback system embedded in my module (which is bad but hopefully temporary...) with the following loggers:
- one that logs at on_fit_begin and then at each on_epoch_end
- one that logs at every x epoch interval on on_epoch_end
- one that logs at on_epoch_end and on_test_end but in different files (because test metrics only have one epoch)
- one that logs at on_fit_begin, on_fit_end and on_epoch_end

I insist on the fact that defaults should be kept as they are now, because they are great and make the training loop setup easy and quick, but a certain level of flexibility should be given in order to hook into the different steps of the loop (on_fit_begin, on_epoch_end, etc.).

Here is a humble proposal: https://github.com/PyTorchLightning/pytorch-lightning/pull/889
As for callback hell, I don't really understand what it means; could you give a specific example? IMO, it's just a matter of designing a good API, and then you obviously don't have control over how the user uses it...
About the confusion between the checkpoint_callback argument and a checkpoint_callback in a callbacks list, we could either raise an error or print a warning if such a case arises.
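That check could be as simple as the following sketch (function name and precedence rule are hypothetical, not a real PL behavior):

```python
import warnings


class ModelCheckpoint:
    """Stub standing in for the real checkpoint callback."""


def check_duplicate_checkpoint(checkpoint_callback, callbacks):
    """Warn when a ModelCheckpoint is passed both via the dedicated
    checkpoint_callback arg and inside the generic callbacks list."""
    in_list = any(isinstance(cb, ModelCheckpoint) for cb in callbacks)
    if isinstance(checkpoint_callback, ModelCheckpoint) and in_list:
        warnings.warn(
            "ModelCheckpoint passed via both checkpoint_callback and "
            "callbacks; the checkpoint_callback instance takes precedence."
        )
```

Raising an error instead of warning is the stricter variant of the same check.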
If all of the callbacks can be handled in the same object - they must all use the same method names. This leads to things like on_forward, on_backward etc. It's surely easier for the user to extend LoggerBase, or CheckpointBase... and just implement log_metrics or save_checkpoint and so on
I think we may be talking about different things here. If we want to expose customization for logging and checkpointing, then I agree it's absolutely simpler to have users just implement log_metrics or save_checkpoint. However, I don't anticipate customization for logging and checkpointing to be a common use case for custom callbacks.
@hadim does provide a good example, though, with the desire to have control over multiple loggers and their respective behavior. It's a good example in the sense that he's not calling each logger uniformly (ie. do the same action at the same time across all the loggers), which doesn't look like it'd be supported by a LoggerCollection. It seems like you could achieve this by logging through the model hooks (correct me if I'm wrong, please!) provided that we had a way to access multiple loggers.
```python
class ExampleModel(pl.LightningModule):
    def __init__(self, ...):
        pass

    def on_train_start(self):
        self.logger.log_hyperparams(self.hparams)  # by default, use root logger
        self.logger['FileLogger'].save(x)  # get logger by name

    def on_epoch_end(self):
        self.logger['MatplotlibLogger'].save(x)
        self.logger['CSVLogger'].log_metrics(metrics)
```
I would imagine that the desire for a Callback would be a situation where we want to have more distinct separation of code logic. Is this what other people would expect as well?
My concern is that trying to be too flexible is what led to an unusable callback or event handler system in: torchbearer, ignite, catalyst, keras or a host of other libraries. We should be aware that all of those libraries got it wrong, and try to avoid making the same mistakes.
Could you elaborate on what those libraries got wrong? Where were the pain points? I've used callbacks in Keras (sparingly) and found it to be useful. I think the most common way to enter callback hell is to have a rigid framework which leads to callbacks being the only place for flexibility. Fortunately, PyTorch Lightning is quite flexible so I don't worry about this case so much.
In my opinion, we should collect more information on the following two points:
Haha, yeah I could have been more clear there. Ok, so I'll try to summarise my thoughts on this and then bow out. At least for now, the callback system in #889 can be merged as it's definitely an improvement on what is there currently. I suppose my point is that there are several design patterns we could opt for here that aren't the same as Keras or #889, so some discussion on which is best might benefit us in the long run.
From what I remember of Keras, you have callbacks with defined hooks (e.g. on_start etc.) which also take arguments (e.g. batch, metrics etc.). So you might have the following:
```python
class MyCallback(Callback):
    def on_end(self, model, metrics):
        model.do_thing(metrics)
```
The pattern used in #889 and FastAI-V2, and that we adopted in torchbearer, is based on the above but with improved flexibility. To achieve this you swap arguments for access to some global object which has everything. It looks something like this:
```python
class MyCallback(Callback):
    def on_end(self):
        self.mega_object.model.do_thing(self.mega_object.metrics)
```
- Pro: if everything can be accessed from the mega_object, then there is really nothing you can't achieve.
- Con: the user has to learn what is available on the mega_object. This was a problem in torchbearer - I can achieve anything I want to - but no one else can without massive investment in learning the library. I can see this being a problem with lightning: every added Mixin brings a number of new properties; how many before these become impossible to keep track of? This wouldn't be such a problem if only advanced users were using custom callbacks, but in my experience people usually opt for the most flexible option (e.g. implementing a callback), so I expect it won't be long until every lightning project has one or two custom callbacks.

Ignite uses event handlers. I don't remember all of the details but you could do something like this:
```python
class MyCallback(Callback):
    def __call__(self):
        self.mega_object.model.do_thing(self.mega_object.metrics)

Trainer.add_callback(MyCallback(), Events.END)
```
It's also possible to suggest some things that haven't been done before. Personally I like the results you get from employing a fluent interface. It's a bit like a builder pattern without the .build() and with an emphasis on human readable code. It would look something like this:
```python
class MyCallback(Callback):
    def __call__(self):
        self.mega_object.model.do_thing(self.mega_object.metrics)

Trainer(callbacks=[MyCallback().call_on_train_end()])
```
- You could have special sub-classes (e.g. Logger) which can unpack the right stuff from mega_object and pass it on.

Apologies for the long post and apologies for being difficult. My concern is just that we should be very deliberate in the API decisions that we make. There are downsides to every option, but we should try to avoid being saddled with them by accident. There's also potential to do something new here, that breaks the norm established by keras. Ignite has definite issues, but it's a refreshing take on the problem, which in itself has merit. I guess we continue down the current path (i.e. merging #889), at least until we have callbacks in lightning, and then think about ways that they can be improved long term.
i like this discussion! i think we're on to something here. I'd rather debate this some more until we get to a new system that doesn't have those limitations and strictly improves flexibility without sacrificing readability (in fact we should strive for improving both).
I suggest we rephrase this issue from "let's add callbacks" to "it's hard to do xyz in lightning. How can we solve for that?" The answer may not be callbacks :). This way we focus on solving the problem instead of finding a use case for an old solution.
These kinds of major decisions take time to think about (API design is not easy lol), and we shouldn't rush this decision.
I think the way to move forward is:
1. describe 5 use cases that are complex or impossible in lightning.
2. agree on those.
3. brainstorm solutions not limited by what others have done. ios, web and android paradigms can give good inspiration.
I think callbacks in these libraries are bugs and not features, added to make up for poorly thought out APIs. It's the catch-all for not wanting to think hard about this (like the utils folder in most projects).
I really like @ethanwharris's point that just because those libraries did it doesn't make it correct. I also think they got it wrong. Going back to a callback system defeats the point of lightning and we regress into having a bunch of unstructured files that break readability and reproducibility.
I don't want to get in the habit of copying other libraries because we assume they did it right... I posit that libraries fall out of favor because they do the same thing as other libraries. We need to think differently and make our own independent decisions based on what we think the future of frameworks looks like.
I'm still not convinced about callbacks because i don't see a huge pain point that they solve. please illustrate a few so I can change my mind :)
Why wouldn't i define some random function i want to execute in the lightning module and call it during one of the hooks?
Isn't this equivalent?
```python
class LightningModule:
    def my_function_to_do_something_crazy(self):
        # do crazy stuff
        ...

    def on_epoch_end(self):
        self.my_function_to_do_something_crazy()
```
vs
```python
class Callback:
    def on_epoch_end(self):
        # do crazy stuff
        ...
```
However we solve this, keep in mind readability for reproducibility purposes! We've made huge headway in making DL code easier to understand... a callback system as proposed in the other libraries now means i have to dig through files and unrelated functions to figure out what's going on. You can imagine a GAN defined over a sequence of callbacks, which would make it extremely difficult to see what's happening.
Remember that part of our role is to force a style guide for non-engineers as well, one that is as flexible as possible for professional researchers.
@tullie @srush @thomwolf @luiscape any thoughts?
I don't want a callback system done the same way as Keras or another ML library did it. I think it's needed because its flexibility-by-design is what makes it able to solve potentially all, or at least almost all, use cases.
For example in #889, as @ethanwharris already said, I implemented a callback system where no arguments are passed to the callback functions. It's much less confusing to have a global trainer object than a list of obscure arguments passed to the callback's methods.
I think it's important to describe use cases (such as what I did already), but what if we miss an important one? Or what if a use case that looks improbable today becomes standard tomorrow?
PL (Pytorch-Lightning) is already a callback system by default and I think that is why it's so great. Instead of having a single fit() function to which you can only pass a list of callbacks to modify its behavior, here you can have two very well separated callback systems:
- the LightningModule, where you define your training logic: the science that should be reproducible and serializable;
- the Trainer callbacks, for everything that is not part of the science and not needed to reproduce the LightningModule.

You are correct that all of this can be done in the current LightningModule. This is what I am already doing and it works, but it makes the code less readable and doesn't really separate the science from the boring things. Moreover, I would like to easily reuse callbacks between different LightningModules.
About the comparison with other libraries, see Keras for example. The main reason why people were using Keras's callbacks to modify the training loop and do crazy stuff was the fit() function. This can't happen with PL and the flexible LightningModule training loop anymore.
Or actually it can: people can use the LightningModule in a way it's not intended for, they could fork and hack the Trainer class, or they could use a callback system to modify the training loop itself. But this is not something you can really prevent. What you can do, and I think what you are trying to do right now, is to discourage such behaviors by default. But it's already done IMO, because of the flexibility of the LightningModule training loop. People won't have to create crazy callbacks anymore.
Please don't refuse a callback system because people haven't used it correctly in the past, or, to be more specific, because the library they were using was kind of forcing them to hack the training loop via the callback system.
You won't prevent people from using PL the wrong way; the best you can do is to encourage best practices, and with PL you're already doing it by providing a callback-style training loop and good default options.
API design is tough indeed. But I think it's good to have an in-depth discussion about it as we do here.
Hey guys, I'm not (yet) a member of the lightning team, but I am a contributor of ignite and answering here on invitation of @williamFalcon .
I strongly agree with @hadim, and no matter which system you opt for, you should definitely choose one IMO. This way you can increase your flexibility for the future a lot. Personally, I disagree with the argument that this decreases readability. Some parts that currently have to be done in the LightningModule don't actually belong there, if you really think that only the research stuff should be in there. But since there is currently no callback system for the trainer, users don't have another choice.
And if you want to include a callback system, I recommend one that somehow includes what @ethanwharris called a mega_object, since without it the flexibility will be limited and not as attractive for some users as it would be with that object. Along with proper tutorials and examples, the learning curve should be okay for the users actually needing the callback.
Of course I really like the ignite like system and would also recommend that one, but I might be a bit biased, since I'm also contributing there.
I like @williamFalcon's proposal to focus the discussion on the problems at hand. To bring back @hadim's example, it seems like the source of dissatisfaction is (1) lack of separation of concerns and (2) inability to reuse across models. One thing that is interesting to note is that the desired functionality is possible today through model hooks - many people have mentioned Lightning is flexible in this manner.
Perhaps the problem could be resolved through more clear documentation and recommendation on design practices? For example, Lightning already leverages Mixins to compose a series of separate concerns. Can we suggest the same thing here?
Define your reusable logic as part of a model hook mix-in.
```python
class CustomHooksMixin(pl.ModelHooks):
    def on_train_start(self):
        self.logger.log_hyperparams(self.hparams)  # by default, use root logger
        self.logger['FileLogger'].save(x)  # get logger by name

    def on_epoch_end(self):
        self.logger['MatplotlibLogger'].save(x)
        self.logger['CSVLogger'].log_metrics(metrics)
```
And keep the LightningModule focused on model and training definition.
```python
class ExampleModel(CustomHooksMixin, pl.LightningModule):
    def __init__(self, hparams):
        self.hparams = hparams

    def forward(self, data):
        pass

    def training_step(self, batch, batch_idx):
        pass

    def validation_step(self, batch, batch_idx):
        pass

    def validation_end(self, outputs):
        pass

    def configure_optimizers(self):
        pass
```
Thanks for the proposal; it looks interesting but raises multiple issues:
Using mixins in Python is sometimes mandatory, as with pl.Trainer given the code complexity, but I'd like to avoid using them when they're not needed, because you have to use variables and methods that are not instantiated in a given mixin, which can create confusion and could potentially lead to unwanted behavior without rigorous testing, as is done with pl.Trainer. I am not sure Python is the best language for the mixin pattern, but as I said, in certain situations such as pl.Trainer it's probably the best option.
Since the mixins are part of your pl.LightningModule, pl.Trainer will consider everything happening in your mixins to be part of your training loop. For example, in ddp mode you might not want your callbacks to be pickled and sent to multiple nodes, but would rather keep them in the main process of the trainer.
What about callback configuration? Using the mixin pattern, you would have to pass the arguments of your callbacks and of your pl.LightningModule at the same time. I don't think this pattern is desirable. Same for reproducibility: IMO callbacks should not be needed for reproducibility, and so should not be needed when re-instantiating a pl.LightningModule.
Overall, I am not sure the mixin pattern can replace a callback system. The mixin pattern in pl.LightningModule looks nice when you have a complex training loop and/or you want to reuse blocks of logic between models. In that situation, it's probably a good solution.
Because you have to use variables and methods that are not instantiated in a given mixin and can create some confusion and could potentially lead to unwanted behavior
I agree this is awkward.
Overall I am not sure the mixin pattern can replace a callback system.
I should clarify, I'm not proposing that it should :) My goal was to explore an alternative design pattern (outside of callbacks) that addresses the use case that you presented for custom reporting (ie. "reuse blocks of logic in-between models").
Same for reproducibility, since IMO callbacks should not be needed for reproducibility and so needed when re-instantiating a pl.LightningModule.
This could be dangerous if a callback does alter the training procedure as making it optional would hinder reproducibility.
i like the brainstorm for different approaches. Can we take a second to list out 3 different use cases we want to solve?
Then any brainstormed solution can be sanity-checked to see if it solves the 3 cases. I'm sure we'll have many solutions that do, but at that point we can focus on making those solutions more usable.
I'm a user of PL and I felt the need for callbacks for my use cases. Since it seems you need some use cases, mine is: during training, I save intermediate plots to better understand how the training is going, I compute metrics, I save embeddings, etc.
Currently, I edited my LightningModule, but I don't much like that way of doing it because it adds a lot of logic that is not related to the model (saving plots and defining a forward step are quite different things), so my code is harder and longer to read; I'd prefer to have the logic separated.
I also sometimes change my mind and would like to remove one of those added functionalities. To do that I need to remove the code, or, as a slightly better way I found, use an "instructions" class which stores how often I want to do these additional things (in terms of epochs, for example), and this instructions object is used to decide when to do things or not. It adds more code though.
I feel like callbacks could help me better separate logic, and also I could easily remove a callback instead of removing code or using an object to deactivate it.
thanks for the use case! I'm wondering why you can't do the logging and embedding plotting from on_epoch_end?
feel like I have accidentally opened a Pandora's box; at the beginning I just wanted to merge the two parameters in the Trainer init... :robot:
thanks for the use case! I'm wondering why you can't do the logging and embedding plotting from on_epoch_end?
I definitely can. But I still have the problem that if I want to disable this code, I have to delete the related code in on_epoch_end in the LightningModule instead of "just" removing the callback from the callbacks list in the trainer.
This kind of extended functionality, which is what callbacks are, should not appear in the LightningModule (in my opinion). It also helps for sharing the model: I have some colleagues who don't care about my callbacks but are interested in the model. They could remove the on_epoch_end code, but activating/deactivating something by adding/removing code is not a good practice, I think.
For example, suppose the checkpointing logic were not a parameter in the Trainer but some code in on_epoch_end. It would be a pain to remove or add this code whenever I want to activate/deactivate checkpointing. Instead, it's in the Trainer call, so it doesn't disturb the readability of the LightningModule.
got it. i see what you mean now. main takeaways from your example:
I guess the key is point 1. It sounds like ideally we want to make sure that your model can still run with or without these callbacks.
So we want to factor DL code into three components?
A. hardware/engineering code (Trainer)
B. core research logic (LightningModule)
C. non-essential callbacks
This means your model should work with or without C?
My use cases look exactly the same as @Rodolphe2005's, and @williamFalcon you just summarized my main point.
C is non-essential.
Ideally, you would prevent callbacks from modifying essential things in A and B, but I am not sure this is easily feasible.
awesome. So, I think flavors of this have been proposed in this thread by (@jeremyjordan, @ethanwharris, @hadim). Here's another proposal.
```python
class MyCallback(pl.Callback):
    def __init__(self, hook_name):
        self.hook_name = hook_name

    def __call__(self, trainer, pl_module):
        # my custom code I want to execute
        ...

callback = MyCallback('on_epoch_end')
Trainer(callbacks=[callback])
```
Something like this?
What if a callback needs to do things on multiple events, such as on_epoch_end and on_test_end for example?
Here is a more concrete example:
```python
import datetime
import pathlib

import matplotlib.pyplot as plt
import pandas as pd

from pytorch_lightning.callbacks import Callback


class HistoryLogger(Callback):
    def __init__(self, dir_path, plot_history_interval=None):
        super().__init__()
        self.dir_path = pathlib.Path(dir_path)
        self.history_path = self.dir_path / "history.csv"
        self.test_path = self.dir_path / "test.csv"
        self.plot_history_interval = plot_history_interval
        self.history_plot_path = self.dir_path / "history.png"

    def on_epoch_end(self):
        logs = self._trainer.model.logs
        if self.history_path.is_file():
            history = pd.read_csv(self.history_path)
        else:
            history = pd.DataFrame()
        history = history.append(pd.DataFrame([logs]), ignore_index=True)
        history.to_csv(self.history_path, index=None)
        epoch = logs["epoch"]
        if self.plot_history_interval and (epoch % self.plot_history_interval) == 0:
            self._plot_history(history, self.history_plot_path)

    def on_test_end(self):
        metrics = {}
        metrics["date"] = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        metrics["lr"] = self._trainer.model.get_lr()
        metrics.update(self._trainer.model.epoch_metrics["test"])
        metrics = pd.DataFrame([metrics])
        metrics.to_csv(self.test_path, index=None)

    def _plot_history(self, history, plot_history_path):
        # Guess metric names
        metric_names = history.columns
        filter_names_fn = lambda x: x.startswith("val_") and x not in [
            "date",
            "epoch",
        ]
        metric_names = list(filter(filter_names_fn, metric_names))
        metric_names = [name.replace("val_", "") for name in metric_names]

        # Plot parameters
        basesize = 6
        ratio = 0.7
        ncols = 3
        n_plots = len(metric_names)
        nrows = n_plots // ncols
        if n_plots % ncols > 0:
            nrows += 1
        figsize = int(basesize * ncols), int(basesize * ratio * nrows)

        fig, axs = plt.subplots(ncols=ncols, nrows=nrows, figsize=figsize, constrained_layout=True)  # type: ignore
        axs = axs.flatten()
        for metric_name, ax in zip(metric_names, axs):
            train_metric_name = f"train_{metric_name}"
            if train_metric_name in history.columns:
                ax.plot(history["epoch"], history[train_metric_name], label="train")
            val_metric_name = f"val_{metric_name}"
            if val_metric_name in history.columns:
                ax.plot(history["epoch"], history[val_metric_name], label="val")
            ax.set_xlabel("Epoch")
            ax.set_ylabel(metric_name)
            title = f"Metric: {metric_name}"
            ax.set_title(title, fontsize=18)
            ax.legend()
        fig.savefig(plot_history_path, dpi=200)
        plt.close(fig)
```
Yeah, good point. What if a callback has all the hooks as an interface, and for each callback you can override the hooks?
```python
class MyCallback(pl.Callback):
    def on_test_end(self, trainer, pl_module):
        # other stuff
        ...

    def on_epoch_end(self, trainer, pl_module):
        # my custom code I want to execute
        ...
```
Good for me.
The question is: do we want to pass arguments to those methods, or init the callback with something like set_trainer() as it is done here?
I don't have strong feelings about this.
Yeah, good point. What if a callback has all the hooks as an interface, and for each callback you can override the hooks?
if you derive it from Callback as it is in #849, it shall have all events...
Haven't figured out if it's possible yet, but what about something like this:
```python
class MyCallback(pl.Callback):
    @on_test_end
    def my_function_name(self, trainer, pl_module):
        # other stuff
        ...

    @on_epoch_end
    def my_other_function_name(self, trainer, pl_module):
        # my custom code I want to execute
        ...

    @on_epoch_end
    def other_thing_to_happen_at_end(self, trainer, pl_module):
        # do something else
        ...
```
This would solve some readability issues since the functions can have informative names. It's also potentially more flexible if we can support multiple functions binding to the same callback hook. Although I'm not sure how possible it is, haha.
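It is possible; here is one sketch of the mechanism (the parameterized on_hook decorator and fire dispatcher are hypothetical, not PL API): the decorator tags methods with a hook name, and a dispatcher calls every tagged method in definition order.

```python
def on_hook(hook_name):
    """Tag a method with the hook name it should run on."""
    def decorate(fn):
        fn._hook_name = hook_name
        return fn
    return decorate


class Callback:
    def fire(self, hook_name, trainer=None, pl_module=None):
        # Call every method of this class tagged with the given hook,
        # in definition order (class __dict__ preserves it in Python 3.7+).
        for attr in type(self).__dict__.values():
            if getattr(attr, "_hook_name", None) == hook_name:
                attr(self, trainer, pl_module)


class MyCallback(Callback):
    def __init__(self):
        self.ran = []

    @on_hook("on_epoch_end")
    def plot_things(self, trainer, pl_module):
        self.ran.append("plot")

    @on_hook("on_epoch_end")
    def save_things(self, trainer, pl_module):
        self.ran.append("save")
```

Note this simple version only scans the class's own methods, not inherited ones; handling inheritance would need a walk over the MRO.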
it's equivalent to what i wrote. the decorators are unnecessary.
```python
def on_epoch_end(self):
    fx1()
    fx2()
```
it's equivalent to what i wrote. the decorators are unnecessary.
```python
def on_epoch_end(self):
    fx1()
    fx2()
```
Good point! I hadn't thought of that :)
Ok, anyone want to take a stab at doing this?
@ethanwharris @hadim @jeremyjordan ?
🚀 Feature
Simplified API with callbacks... as e.g. Keras did: pass just a list of callbacks to be executed, and the Trainer will call them when needed instead of having them specified https://github.com/PyTorchLightning/pytorch-lightning/blob/b1040523b2180300574d961444b00abfa3c84195/pytorch_lightning/trainer/trainer.py#L65-L66
mentioned also in https://github.com/PyTorchLightning/pytorch-lightning/issues/825#issuecomment-588226411