ashleve / lightning-hydra-template

PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡

Unify the redundant running mode #404

Closed: genghisun closed this issue 1 year ago

genghisun commented 2 years ago

AFAIK, in PyTorch Lightning there are 4 running modes: [train/val/test/predict] (or maybe 5, including tune).

But currently in lightning-hydra-template, using these different running modes requires separate entry files [train.py/val.py/test.py(eval.py)/predict.py], separate tasks (pipelines) [train_task.py/eval_task.py/...], and separate config files [train.yaml/eval.yaml/...], which is very redundant. (By default, only training and evaluation are implemented; the other modes need to be implemented by users themselves.)

I wonder whether these running modes could be integrated into a unified run.py/main.py entry file, or even a unified task (pipeline) and config.

Please feel free to share ideas and how this could be implemented. If possible, I would be happy to contribute.

ashleve commented 2 years ago

A while ago the template had only one task with run.py and config.yaml and there were multiple issues raised about how to split it into several tasks: https://github.com/ashleve/lightning-hydra-template/issues/151 https://github.com/ashleve/lightning-hydra-template/issues/200 https://github.com/ashleve/lightning-hydra-template/issues/103

I don't think unifying everything into a single task is the right approach since you often need something more sophisticated. For example, you might want to:

The current setup is just an example showcasing how to manage multiple tasks and configs. You can easily delete the evaluation task if you don't need it (train_task.py already includes optional evaluation).

I don't like dumping every task into a single config - this makes them hard to read and manage, since your configs might need a different structure for each task or use different types of modules.

As to having a single start file for all tasks, e.g. main.py - we could probably do something like this:

```bash
python main.py task=train
```

Then you would need to do extra nesting for each command line override:

```bash
python main.py task=train task.datamodule.batch_size=64 task.model.optimizer.lr=0.01
```

Not very convenient in my opinion - you need to specify task=... and write longer overrides every time you want to launch something.
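The extra-nesting cost can be sketched with a tiny helper (purely illustrative, not template code): once everything lives under a `task` config group, every plain override has to be prefixed with `task.`.

```python
def nest_overrides(overrides, group="task"):
    """Prefix plain Hydra-style overrides so they target keys nested
    under the given config group (illustrative helper, not template code)."""
    return [f"{group}.{o}" for o in overrides]

# The same launch command gets longer once everything lives under `task`:
plain = ["datamodule.batch_size=64", "model.optimizer.lr=0.01"]
print(nest_overrides(plain))
# → ['task.datamodule.batch_size=64', 'task.model.optimizer.lr=0.01']
```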

These would be my arguments for doing it the current way, feel free to share your thoughts.

genghisun commented 2 years ago

Thanks for such a detailed answer!

I strongly agree that we should not unify all the tasks. There should be various tasks with corresponding configs for complicated cases like those you listed.

But I still think there is a lot of redundancy across [entry files, task files, and config files] for routine tasks such as simple training and evaluation. For example, all the different entry files do is call the corresponding task. And all the different tasks (train/eval) do is instantiate the datamodule, model, logger, trainer, etc. from very similar configuration files, and then call the corresponding method of the trainer.
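The shared shape of these tasks can be sketched as one generic function (a hypothetical sketch, assuming a `mode` value is passed in; the template's actual task files build these objects via `hydra.utils.instantiate`, which is omitted here):

```python
def run_trainer_task(mode, trainer, model, datamodule):
    """Generic task: dispatch to the matching Trainer method.
    Covers fit/validate/test/predict with one code path."""
    if mode not in {"fit", "validate", "test", "predict"}:
        raise ValueError(f"unknown mode: {mode}")
    # e.g. mode == "fit" resolves to trainer.fit(model, datamodule=datamodule)
    return getattr(trainer, mode)(model, datamodule=datamodule)
```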

So I suggest unifying the entry files for these regular training and evaluation tasks into something like main.py or trainer.py. As for the extra-nesting problem, I think it can be solved with a subcommand, like this:

```bash
python main.py fit(train) datamodule.batch_size=64 model.optimizer.lr=0.01
python main.py test datamodule.batch_size=64 model.optimizer.lr=0.01
python main.py build_dataset path=xxx
```

Thus fit(train) and test can share the same task and config, without extra longer overrides. As another specific scenario, build_dataset can use different configs and arguments than the regular training and evaluation tasks.

In terms of implementation: when the subcommand is one of [fit(train)/validate/test/predict/...], trainer_task.py is called; otherwise, the specific task is looked up by the name of the subcommand, like build_dataset_task.py.
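One way this dispatch could be sketched (an assumption on my part, since Hydra itself has no subcommand feature: peek at `sys.argv` before Hydra parses it; the module names `trainer_task`/`*_task` are illustrative, not existing template files):

```python
import sys

# Trainer modes share one task and config; anything else maps to its own task.
TRAINER_MODES = {"fit", "validate", "test", "predict"}

def resolve_task(argv):
    """Map a command line like ['main.py', 'fit', 'datamodule.batch_size=64']
    to (task_module, hydra_overrides). Illustrative names, not template code."""
    if len(argv) < 2:
        raise SystemExit("usage: main.py <subcommand> [overrides...]")
    sub, overrides = argv[1], list(argv[2:])
    if sub in TRAINER_MODES:
        # fit/validate/test/predict share trainer_task; pass the mode
        # along as an ordinary Hydra override.
        return "trainer_task", [f"mode={sub}", *overrides]
    # e.g. `main.py build_dataset ...` -> build_dataset_task
    return f"{sub}_task", overrides
```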

ashleve commented 2 years ago

@genghisun Could you elaborate on how such a subcommand could be implemented? I'm not 100% sure, but I don't think Hydra has such a feature, and I don't think it's possible to use normal argparse alongside Hydra. Is there a workaround you have in mind?

I agree the entry file and task file could be merged into one. The main reason they're separate:

Separating Hydra logic (the entry file) from task logic seemed cleaner to me, but that's mostly just my opinion, and I'm open to contributions on this part.