bloomonkey / oai-harvest

Python package for harvesting records from OAI-PMH provider(s).

logfile conflict - no harvest.log #26

Open sdm7g opened 5 years ago

sdm7g commented 5 years ago

There seems to be a conflict between logging config at https://github.com/bloomonkey/oai-harvest/blob/develop/oaiharvest/harvest.py#L500-L505 and https://github.com/bloomonkey/oai-harvest/blob/develop/oaiharvest/registry.py#L340-L346 .

Harvest logs are always written to registry.log; harvest.log is never created. It's not a major issue, but it caused some confusion when I went looking for the logs.

(I am going to run the harvest from a cron job, so I'm going to want to redirect both the logs and the registry into another directory anyway.)

sdm7g commented 5 years ago

From the Python logging basic tutorial:

The call to basicConfig() should come before any calls to debug(), info() etc. As it’s intended as a one-off simple configuration facility, only the first call will actually do anything: subsequent calls are effectively no-ops.

OK: that explains why the last config (harvest) doesn't override the first (registry).
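
A minimal sketch of that behaviour, using only the standard library. The function name is made up for illustration, and the two calls merely stand in for the ones in registry.py and harvest.py; the real modules pass more arguments.

```python
import logging
import os
import tempfile


def demo_basicconfig_first_call_wins():
    """Call basicConfig() twice and report which log file the root logger uses."""
    root = logging.getLogger()
    # Start from a root logger with no handlers, remembering any existing ones.
    saved = root.handlers[:]
    for h in saved:
        root.removeHandler(h)
    with tempfile.TemporaryDirectory() as tmpdir:
        try:
            # First call (stands in for registry.py): installs a FileHandler.
            logging.basicConfig(filename=os.path.join(tmpdir, "registry.log"))
            # Second call (stands in for harvest.py): the root logger already
            # has a handler, so this call does nothing.
            logging.basicConfig(filename=os.path.join(tmpdir, "harvest.log"))
            names = [os.path.basename(h.baseFilename)
                     for h in root.handlers
                     if isinstance(h, logging.FileHandler)]
            harvest_exists = os.path.exists(os.path.join(tmpdir, "harvest.log"))
        finally:
            # Close our handlers and restore the previous configuration.
            for h in root.handlers[:]:
                root.removeHandler(h)
                h.close()
            for h in saved:
                root.addHandler(h)
    return names, harvest_exists
```

Only the file from the first call ends up attached to the root logger, and the second file is never created. On Python 3.8 and later, basicConfig() also accepts force=True, which removes the existing root handlers before applying the new configuration, but that would only swap which of the two files wins.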

sdm7g commented 5 years ago

I'm not sure what was intended with that code. I assume that oai-reg actions should go to registry.log and oai-harvest actions to harvest.log. If that is the case, and you're going to use basicConfig, it might make more sense to have basicConfig set up the console StreamHandler on the root logger, with the same settings in both files, and to attach harvest.log and registry.log to child loggers.

I've tried the following, along with similar changes to registry.py, and I think it works: registry- or harvester-specific messages go to the file of the appropriate named logger, while everything also goes to stderr via the root logger. (But perhaps some of that setup should be done in each main() function.)

import logging
import os

# Root logger: console (stderr) output only, same settings as in registry.py.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(levelname)-8s %(message)s',
    # No filename= here: the log file is attached to the module logger below.
    )

# Module logger: also write timestamped entries to harvest.log.
# appdir is assumed to be defined earlier in harvest.py.
ch = logging.FileHandler(os.path.join(appdir, 'harvest.log'))
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter(
    '%(asctime)s %(name)-16s %(levelname)-8s %(message)s',
    '[%Y-%m-%d %H:%M:%S]')
ch.setFormatter(formatter)
logging.getLogger(__name__).addHandler(ch)

I'm trying to get this working properly because:

  1. I would like to redirect the log files elsewhere, and the existing code is confusing because it doesn't do what it appears to do.
  2. I'm trying to extend the pyoai client to use the recover=True parser option, and I want parser errors logged to both the console and a file; that was also confusing with the existing code.
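
For the first point, one possible shape is a small helper that attaches the file handler to a named logger and takes the log directory as a parameter. This is only a sketch: add_file_log is a hypothetical name, not something that exists in oai-harvest.

```python
import logging
import os


def add_file_log(logger_name, log_dir, filename):
    """Attach a timestamped FileHandler under log_dir to the named logger."""
    # A cron job may point at a directory that does not exist yet.
    os.makedirs(log_dir, exist_ok=True)
    handler = logging.FileHandler(os.path.join(log_dir, filename))
    handler.setLevel(logging.DEBUG)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s %(name)-16s %(levelname)-8s %(message)s',
        '[%Y-%m-%d %H:%M:%S]'))
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger
```

harvest.py could then call something like add_file_log(__name__, log_dir, 'harvest.log') from main(), with log_dir defaulting to appdir and overridable from the command line; registry.py would do the same with 'registry.log'. How the directory gets passed in is left open here.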

Currently, I'm using the following in harvest.py, but I should probably create a child logger as oaiharvest.harvest.XMLParser to tag those log entries differently.

from lxml import etree
etree.use_global_python_log(etree.PyErrorLog(logger_name=__name__))
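
The child-logger idea can be sketched with the standard library alone. ListHandler and demo_child_logger are illustrative names, and the in-memory handler stands in for the harvest.log file handler; the point is only that records logged on oaiharvest.harvest.XMLParser propagate to handlers attached to oaiharvest.harvest while keeping their own name.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted records in memory so they can be inspected."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def demo_child_logger(parent_name="oaiharvest.harvest"):
    parent = logging.getLogger(parent_name)
    parent.setLevel(logging.DEBUG)
    handler = ListHandler()  # stands in for the harvest.log FileHandler
    handler.setFormatter(logging.Formatter('%(name)s %(levelname)s %(message)s'))
    parent.addHandler(handler)
    try:
        # The child has no handler of its own; its records propagate upward.
        child = logging.getLogger(parent_name + ".XMLParser")
        child.warning("recovered from malformed record")
        parent.info("harvest finished")
    finally:
        parent.removeHandler(handler)
    return handler.lines
```

Both records reach the parent's handler, and the %(name)s field tells them apart. With lxml, the same child name would presumably be passed as logger_name to etree.PyErrorLog in place of __name__.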

I would be happy to submit a pull request if you think I'm on the right track with that solution.