Open mbjones opened 8 months ago
@mbjones Your third suggested step is "Leave the rest of the logging configuration that is currently in logging_config.py up to the calling application (e.g., viz-workflow)".
The way the visualization workflow currently executes (regardless of whether you are using parsl or ray) is by running a workflow script with a command like python parsl_workflow.py
. That workflow script pulls in a viz config and a parsl config if you're using parsl. For example, see the parsl workflow script within the kubernetes & parsl branch of viz-workflow
here.
Where do you suggest that the logging configuration that is currently in logging_config.py
be specified within viz-workflow
so that the logging configuration is passed onto the workflow script? In the __init__.py
? Then in the parsl or ray workflow script, pdg_workflow
would need to be imported as well?
To the extent that viz-workflow is a library package, it should configure logging the same way as I described for viz-staging and viz-raster. That said, viz-workflow also contains a python script that runs the __main__
code block for the workflow. It is in __main__
that logging should be configured. I think the best way to do that is to externalize the logging configuration to a config file that gets passed in via a commandline parameter to the __main__
method (typically with some overridable defaults). The logging documentation shows how to use fileConfig() to load a logging configuration file as part of your main() startup method. Overview of the config file is here: https://docs.python.org/3/library/logging.config.html#configuration-file-format
One of the issues we should tackle is separating out the viz-workflow
library package from the specific invocation script you are using to launch a specific dataset run (unless the script can be generic enough to support all such runs). We'll really need this separation of concerns when we build out a service backed by the package.
The modules in pdgstaging
and pdgraster
still need to define logger
in order to execute all the logger.info()
lines throughout the code. At the top of the modules, I had included the following code for the established logging approach:
import logging
from . import logging_config
logger = logging_config.logger
So I will have to define the logger differently since logging_config.py
is to be removed from pdgstaging
and pdgraster
Branches for these changes, named for the issue number in each repo: viz-workflow: bug-38-log viz-staging: bug-47-log viz-raster: bug-27-log
At the top of the modules, I had included the following code for the established logging approach:
I was suggesting that a common approach to this is to move this log initialization to __init__.py
, so that is run during package and module load and doesn't have to be run in each file. Here's what I think probably goes there in __init__.py
:
import logging
logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())
The only downside to that is that all of the logs are addressed by the package name, so if you want to control logs at the module level, then you would want to initialize the logger (in the same way as described) at the top of each module (still using __name__
), which will differ for the module compared to the package. That would allow module-specific log configuration, which can be useful.
The current logging configuration in
logging_config.py
hardcodes some logging configuration, which causes difficult errors in some deployment scenarios. For example, because a logging FileHandler is hardcoded to use the file/tmp/log.log
, if multiple users try to use the library on the same machine, they get permissions errors because they can't create the needed logging file on import.In general, packages and library modules should not configure logging, rather deferring to the main module to configure logging in a way that works for its deployment scenario. This is described in the logging tutorial here: https://docs.python.org/2/howto/logging.html#logging-from-multiple-modules
The only thing that a typical package or module should configure for logging is to set a name for its logger, which by convention is set to the package or module
__name__
, which allows the logs from each module to be distinguished and controlled independently. This is typically done for the package in the__init__.py
file with a statement like:Module level loggers should use the same code, but are typicially initialized at the top of the module. See the tutorial for details: https://docs.python.org/2/howto/logging.html#advanced-logging-tutorial
In our case, I think a simple logging scenario would be to:
logger = logging.getLogger(__name__)
to the package__init__.py
I will file an identical issue for the viz-raster package, which has the same issues.
For an overview of one approach, see for example: https://stackoverflow.com/a/50751987