PermafrostDiscoveryGateway / viz-staging

PDG Visualization staging pipeline
Apache License 2.0
2 stars 4 forks source link

remove hardcoded logging config for pdgstaging #47

Open mbjones opened 8 months ago

mbjones commented 8 months ago

The current logging configuration in logging_config.py hardcodes some logging configuration, which causes difficult errors in some deployment scenarios. For example, because a logging FileHandler is hardcoded to use the file /tmp/log.log, if multiple users try to use the library on the same machine, they get permissions errors because they can't create the needed logging file on import.

In general, packages and library modules should not configure logging, rather deferring to the main module to configure logging in a way that works for its deployment scenario. This is described in the logging tutorial here: https://docs.python.org/2/howto/logging.html#logging-from-multiple-modules

The only thing that a typical package or module should configure for logging is to set a name for its logger, which by convention is set to the package or module __name__, which allows the logs from each module to be distinguished and controlled independently. This is typically done for the package in the __init__.py file with a statement like:

logger = logging.getLogger(__name__)

Module level loggers should use the same code, but are typicially initialized at the top of the module. See the tutorial for details: https://docs.python.org/2/howto/logging.html#advanced-logging-tutorial

In our case, I think a simple logging scenario would be to:

I will file an identical issue for the viz-raster package, which has the same issues.

For an overview of one approach, see for example: https://stackoverflow.com/a/50751987

julietcohen commented 6 months ago

@mbjones Your third suggested step is "Leave the rest of the logging configuration that is currently in logging_config.py up to the calling application (e.g., viz-workflow)".

The way the visualization workflow currently executes (regardless of whether you are using parsl or ray) is by running a workflow script with a command like python parsl_workflow.py. That workflow script pulls in a viz config and a parsl config if you're using parsl. For example, see the parsl workflow script within the kubernetes & parsl branch of viz-workflow here.

Where do you suggest that the logging configuration that is currently in logging_config.py be specified within viz-workflow so that the logging configuration is passed onto the workflow script? In the __init__.py? Then in the parsl or ray workflow script, pdg_workflow would need to be imported as well?

mbjones commented 6 months ago

To the extent that viz-workflow is a library package, it should configure logging the same way as I described for viz-staging and viz-raster. That said, viz-workflow also contains a python script that runs the __main__ code block for the workflow. It is in __main__ that logging should be configured. I think the best way to do that is to externalize the logging configuration to a config file that gets passed in via a commandline parameter to the __main__ method (typically with some overridable defaults). The logging documentation shows how to use fileConfig() to load a logging configuration file as part of your main() startup method. Overview of the config file is here: https://docs.python.org/3/library/logging.config.html#configuration-file-format

One of the issues we should tackle is separating out the viz-workflowlibrary package from the specific invocation script you are using to launch a specific dataset run (unless the script can be generic enough to support all such runs). We'll really need this separation of concerns when we build out a service backed by the package.

julietcohen commented 6 months ago

The modules in pdgstaging and pdgraster still need to define logger in order to execute all the logger.info() lines throughout the code. At the top of the modules, I had included the following code for the established logging approach:

import logging
from . import logging_config
logger = logging_config.logger

So I will have to define the logger differently since logging_config.py is to be removed from pdgstaging and pdgraster

julietcohen commented 6 months ago

Branches for these changes, named for the issue number in each repo: viz-workflow: bug-38-log viz-staging: bug-47-log viz-raster: bug-27-log

mbjones commented 6 months ago

At the top of the modules, I had included the following code for the established logging approach:

I was suggesting that a common approach to this is to move this log initialization to __init__.py, so that is run during package and module load and doesn't have to be run in each file. Here's what I think probably goes there in __init__.py:

import logging
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())

The only downside to that is that all of the logs are addressed by the package name, so if you want to control logs at the module level, then you would want to initialize the logger (in the same way as described) at the top of each module (still using __name__), which will differ for the module compared to the package. That would allow module-specific log configuration, which can be useful.