allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0
314 stars 42 forks source link

ir_datasets uses the root logger #124

Closed cash closed 2 years ago

cash commented 2 years ago

Describe the bug The logging code in log.py is written to use a name, but all uses of it pass None so that means ir_datasets modifies the root logger. This is fine if using ir_datasets as a command line script, but as a library, it is annoying. The end result is that all logged messages after a data set is downloaded are duplicated to the console if there is already a handler registered.

Possible fix Changing the default logger name fixes the issue.

Change:

class Logger:
    def __init__(self, name):
        self.name = name
        self._logger = None

to

class Logger:
    def __init__(self, name):
        self.name = name if name else "ir_datasets"
        self._logger = None
seanmacavaney commented 2 years ago

Thanks for reporting! I believe the .easy() function was meant to set the name automatically (properly prefixed with the package), akin to the function it was borrowed from in onir, like so:

def easy(name=None):
    """
    Returns a logger with the caller's __name__
    """
    if name:
        return Logger(name)
    import inspect
    try:
        frame = inspect.stack()[1] # caller
        module = inspect.getmodule(frame[0])
        return Logger(module.__name__)
    except IndexError:
        return Logger('ir_datasets')

I don't remember how/why this was lost, but would this also address the bug? (Maybe in conjunction with the change above so one couldn't accidentally create a root Logger?)

I'd want to do a little benchmarking to make sure the above code does not have an adverse effect on the time it takes to load the module though (particularly important when it's used as a command line tool).

cash commented 2 years ago

As long as the easy() convenience function is always used when constructing a Logger, your above change will fix this issue. I did a quick search and it looks like a Logger object is never constructed directly in ir_datasets.

seanmacavaney commented 2 years ago

I got a little nervous about the performance implications of inspection, so I just used your suggested change instead. Thanks again for reporting!