Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Add `enable_device_summary` flag to disable device printout #13378

Open CompRhys opened 2 years ago

CompRhys commented 2 years ago

🚀 Feature

Add an `enable_device_summary` boolean kwarg to `pl.Trainer()` to suppress `_log_device_info()`'s output.

Motivation

When calling `predict` within a surrogate-model loop, the Trainer prints out the device summary each time, breaking apart intended tables and other outputs. Related to https://github.com/Lightning-AI/lightning/issues/13358 on cleaning up / reducing stdout verbosity.

GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
=======================================================
n_gen |  n_eval |  n_nds  |     eps      |  indicator  
=======================================================
    1 |     322 |       3 |            - |            -
GPU available: False, used: False
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
    2 |    1322 |       4 |  0.625000000 |        ideal

Pitch

Add an `enable_device_summary` kwarg to `Trainer` that defaults to `True`.
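
For illustration, a minimal sketch of the proposed usage, mirroring the existing `enable_model_summary` flag (note that `enable_device_summary` does not exist in the Trainer API today; it is the kwarg this issue proposes):

```python
import pytorch_lightning as pl

# Today: the model summary already has an opt-out flag.
trainer = pl.Trainer(enable_model_summary=False)

# Proposed: an analogous flag (default True) that silences _log_device_info(),
# i.e. the "GPU/TPU/IPU/HPU available" lines. This kwarg does not exist yet.
quiet_trainer = pl.Trainer(enable_device_summary=False)
```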

Alternatives

The suggested solution is the simplest; any alternative would add more complexity.

Additional context

None


If you enjoy Lightning, check out our other projects! ⚡

cc @borda @awaelchli @ananthsub @rohitgr7 @justusschock @kaushikb11

ananthsub commented 2 years ago

I don't think another flag should be added. Why not move the print out to the Trainer constructor so it's only printed once?

CompRhys commented 2 years ago

> I don't think another flag should be added. Why not move the print out to the Trainer constructor so it's only printed once?

It already is. In the linked issue (#13358), in order to reduce uncontrollable verbosity, I was advised to create a secondary Trainer. There's no need to persist this trainer in the secondary optimisation loop, so it gets deleted by the GC and reinitialised when needed.

In general, uncontrollable verbosity isn't ideal, and as unhelpful verbosity goes, the number of TPUs, HPUs, and IPUs is less likely to be informative than the model summary, which there is already an option to suppress.
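
To make the failure mode concrete, here is a rough sketch of the loop described above; the function and variable names are illustrative, not taken from the linked issue:

```python
import pytorch_lightning as pl


def surrogate_loop(model, candidate_loader, n_generations):
    """Illustrative surrogate-model loop: a throwaway Trainer per generation."""
    for generation in range(n_generations):
        # The Trainer is not persisted between iterations, so it is garbage-collected
        # and re-created each time. _log_device_info() therefore runs on every pass,
        # and its output interleaves with the optimiser's own progress table.
        trainer = pl.Trainer(logger=False, enable_progress_bar=False)
        predictions = trainer.predict(model, dataloaders=candidate_loader)
        # ... score predictions and propose the next generation of candidates ...
```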

awaelchli commented 2 years ago

I'm curious, is there a desire to have verbosity controlled on a more global level, not just the summary here?

CompRhys commented 2 years ago

> I'm curious, is there a desire to have verbosity controlled on a more global level, not just the summary here?

I think that between the enable flags for the model summary and progress bar (making a new Trainer whenever they need to change) and the possibility of turning things into `PossibleUserWarning`s, you can control pretty much everything apart from this device summary.

A `verbose=int` setup, cf. scikit-learn, could work but would be a much bigger change.
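
For reference, a sketch of the controls that already exist (assuming `PossibleUserWarning` is importable from `pytorch_lightning.utilities.warnings`, as in recent releases); the device printout itself has no equivalent switch:

```python
import warnings

import pytorch_lightning as pl
from pytorch_lightning.utilities.warnings import PossibleUserWarning

# Warnings that Lightning classifies as PossibleUserWarning can be filtered out.
warnings.filterwarnings("ignore", category=PossibleUserWarning)

# The model summary and progress bar each have a dedicated Trainer flag.
trainer = pl.Trainer(enable_model_summary=False, enable_progress_bar=False)

# There is no corresponding flag for the GPU/TPU/IPU/HPU lines, which is
# exactly what this issue proposes to add.
```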

CompRhys commented 2 years ago

Shall I update the associated PR so it can be reviewed?

awaelchli commented 2 years ago

@CompRhys thanks for the PR. Since this is adding a flag to the core API in Trainer, we need to discuss it with the core team @Lightning-AI/core-lightning and get some more opinions.

I think there are also a few options we haven't explored yet.

  1. One could move the device logs to a configurable callback, like the model summary or trainer summary.
  2. Let the messages be more easily filtered through logging.
  3. Introduce a verbose flag to control messaging through the Trainer more generally (e.g. `fast_dev_run` infos).

rohitgr7 commented 2 years ago

> 2. Let the messages be more easily filtered through logging.
> 3. Introduce a verbose flag to control messaging through the Trainer more generally (e.g. `fast_dev_run` infos).

These two seem like the better solutions. I'd prefer 3 if there are more logs we could configure.

justusschock commented 2 years ago

I'd prefer a combination of 1. and 2.

IMO it is not necessary to have that baked into the core Trainer (just as the model summary was not necessary).

And having it more easily filtered would also be great (I tried forwarding the streams elsewhere just to avoid the prints, and that didn't work either).

carmocca commented 2 years ago

Today, it can be silenced by doing this:

```python
import logging

def device_info_filter(record):
    # Drop any record whose message contains the device summary lines
    # ("GPU available: ...", "TPU available: ...", etc.).
    return "PU available: " not in record.getMessage()

# The device summary is emitted through Lightning's rank-zero logger.
logging.getLogger("pytorch_lightning.utilities.rank_zero").addFilter(device_info_filter)
```

I find the callback idea (1) a bit overkill. With (2) we can improve the above, maybe by using a Trainer logger instead of the rank-zero logger. (3) seems like it has a larger scope; it would be interesting to see what your concrete ideas are. But for the device info message in particular, we've always agreed that it should be shown, and not just when `fast_dev_run=True`.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, PyTorch Lightning Team!

carmocca commented 2 years ago

I changed my mind. I think the callback proposal is the simplest and most extensible option. This would also resolve https://github.com/Lightning-AI/lightning/issues/11014. And we could have flags in the callback to disable specific prints.
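
Purely as an illustration of that direction, a rough sketch of what such a callback might look like; the class name, flags, and the choice to emit the lines from `setup` are all hypothetical, not an agreed design:

```python
import torch
from pytorch_lightning.callbacks import Callback
from pytorch_lightning.utilities.rank_zero import rank_zero_info


class DeviceSummary(Callback):
    """Hypothetical callback that owns the device printout, one switch per line."""

    def __init__(self, gpu: bool = True, tpu: bool = True, ipu: bool = True, hpu: bool = True):
        self.gpu, self.tpu, self.ipu, self.hpu = gpu, tpu, ipu, hpu

    def setup(self, trainer, pl_module, stage=None):
        # In the real Trainer this information comes from _log_device_info();
        # here we only illustrate that each line could be toggled independently.
        if self.gpu:
            rank_zero_info(f"GPU available: {torch.cuda.is_available()}")
        # ... analogous lines for TPU / IPU / HPU accelerators ...
```

Users who want no device output at all would simply not add the callback (or construct it with everything disabled), while the Trainer could include it by default to preserve today's behaviour.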

shenoynikhil commented 1 year ago

I think I can take this up!

CompRhys commented 1 year ago

Did anything come from this? My initial PR was never reviewed -- https://github.com/Lightning-AI/lightning/pull/13379

zouharvi commented 1 week ago

~~Seems like it made its way upstream? https://lightning.ai/docs/pytorch/stable/common/trainer.html#enable-model-summary~~

Apologies, I confused it with `enable_device_summary`. It would make sense for it to live in the same place, though.