[WIP] Flexible and convenient logging

c42f commented 7 years ago

As discussed in https://github.com/JuliaLang/julia/issues/13515, there's still some desire to iterate on julia's logging interface to make it capable of supporting production needs.

So far, I've tried to write down some desired features, without focusing on a particular implementation strategy, though certainly informed by a prototype which covers most of what is mentioned here.

This is still very much a work in progress, but I'm posting it here for early feedback.

StefanKarpinski commented 7 years ago

Excellent start – can't wait to see more!

StefanKarpinski commented 7 years ago

Should we merge this?

rofinn commented 7 years ago

Sorry, I forgot to comment on this. FWIW, I think starting with a really simple AbstractLogger interface and leaving the more opinionated design choices (like log record types, filters, etc) up to libraries to handle would be a prudent choice.

StefanKarpinski commented 7 years ago

I'd like to have a minimal set of standard log record types that should work everywhere at least, but they could certainly be subtyped/extended by more opinionated packages.

c42f commented 7 years ago

To give you an update on where my thinking is going with this one -

What I want out of this Julep is to standardize an easy to use frontend interface which will be used by package authors. To me this means specifying a minimum of syntax and documenting some clear conventions for structured log records. After a lot of prototyping, I've now got a rather concrete proposal to cover these design requirements. I'm happy to update the current PR with these and we can get into discussing the detail.

I don't intend to design a full logging backend library with complete functionality - this can be further explored in logging packages outside of Base. However, I want the interface to be sufficiently general to support such packages and I've been trying to anticipate a bunch of the requirements they'll have by examining other logging libraries and checking that the same basic functionality is supported in the prototype.

c42f commented 7 years ago

As to whether this should be merged, I suppose I somewhat prefer to continue in this PR until I've got the concrete design written down.

c42f commented 7 years ago

@rofinn agreed that we should design an abstract logger interface here (though I'm not yet sure we actually need an AbstractLogger type).

@StefanKarpinski I was coming to the conclusion that a log record is:

A log level, taken from the name of the logging macro
A message (of any type?)
An optional set of key value pairs
Metadata extracted automatically from the call site (line number, file name, module, unique message id, etc)

This allows for a nice minimum of syntax at the use site

@info "A message"

@warn "Something bad happened" my_var=100

@info "Algorithm A" progress=i/100

@info "You only see this twice" max_log=2

I was hoping to document some particularly useful keys as having a specific meaning, for instance progress reporting via the progress keyword.

ChrisRackauckas commented 7 years ago

+💯 for a standardized progress setup. It would be really nice if this was standardized, because as it stands right now you have to implement multiple packages to use progress tracking differently (ProgressMeter.jl, progress in Juno, etc.), so having a single one that they are supposed to extend off of would be nice. But that's sufficiently complicated to be its own discussion, see

https://github.com/JunoLab/Juno.jl/issues/29

and you'd want to contact @pfitzseb and see what would be required for it to at least handle what Juno needs, otherwise I think it would be ignored.

MikeInnes commented 7 years ago

Would be great to make this pluggable like the display system is. It'd be a huge boon to have things like a UI that collates deprecation warnings and such, especially if it can be extended to user logging as well.

c42f commented 7 years ago

@ChrisRackauckas @MikeInnes exactly what I was thinking! I want programs like Juno to be able to suck in logs and display them however seems most appropriate. When a progress keyword is included, this would naturally be a progress bar. My general plan is that programs like Juno should install a global logging backend to intercept all logs and do what they see fit with them.

I've already played with making ProgressMeter.@showprogress use MicroLogging as a backend, and it seems to work quite nicely.

ChrisRackauckas commented 7 years ago

I've already played with making ProgressMeter.@progress use MicroLogging as a backend, and it seems to work quite nicely.

But does it do the layered loops and layered progress monitoring? That's what I am asking. It's nice to have a basic system which everything extends off of, but if it doesn't actually capture the functionality that already exists and is useful, it would be tough for things to change and use it.

c42f commented 7 years ago

But does it do the layered loops and layered progress monitoring?

Unlikely, as I didn't know this existed - but that makes this the best kind of question :-) I assume you refer to https://github.com/JunoLab/Juno.jl/pull/19, I'll have a look.

MikeInnes commented 7 years ago

It handles concurrency, nested loops etc, so it's pretty sweet. We'd like to extend it so that you can cancel tasks as well.

It does seem like this could be unified with logging infrastructure somehow. You could do something like l = Logger() per task, sending and possibly receiving arbitrary messages. Progress is then just a subset of that, and on the receiving end you can do things like differentiate between parallel tasks.

c42f commented 7 years ago

The Juno composable progress stuff looks great. I think MicroLogging would already support being a transport layer for this (it's got arbitrary log message types and arbitrary key value pairs), but maybe not in a really natural way. The main thing seems to be making sure there's a way to communicate that the progress step is finished (the done() function) so the progress step can be removed from the stack.

I've put a bit of thought into how the logging stuff would work with tasks. As it exists so far, I've decided that dynamic scoping via the task local storage is the best way to look up the current logger (falling back to a global logger). This is quite a departure from logging libraries in most languages which usually associate loggers with modules or classes. However, dynamically scoped logging appears to have some interesting advantages, especially for multithreaded code and capturing logs while unit testing. Example:

using MicroLogging

@info "This goes to the global logger"

with_logger(MyLogger()) do
    @info "This goes to the MyLogger instance which is looked up via task local storage"
    foo() # Any log messages in here also go to the MyLogger instance
end

MikeInnes commented 7 years ago

Yes, you'd have to treat the logger as something like a channel and close it when done.

I don't think the object-y approach is necessarily incompatible with the dynamically scoped version (which does look useful). If the lowest level API looks like:

l = Logger()
@info l "foo"
close(l)

You could easily build with_logger on top of that, as well as having @info fall back to TLS with one argument etc. That gives you a ton of flexibility. In Juno we could build @progress for on top of that api as well.

c42f commented 7 years ago

The object-y version was exactly what I had early on, but I removed it as an "unnecessary" feature when I discovered the TLS approach :-) The reasoning being that package authors generally won't want and shouldn't be required to thread the logger context through their function calls.

I could add it back in again without any difficulty. Do you have a compelling use case in mind?

c42f commented 7 years ago

As it exists so far, I've decided that dynamic scoping via the task local storage is the best way to look up the current logger. This is quite a departure from logging libraries in most languages which usually associate loggers with modules or classes

To expand - the TLS is used for finding the initial entry point for log dispatch. However, the module is captured as part of the log record metadata so further dispatch could redirect to a more traditional module-based logger.

rofinn commented 7 years ago

Also, how would hierarchical logging fit into this API? This seems like something folks would want and perhaps it can mostly be left up to libraries like MiniLogging.jl and Memento.jl, but it might useful to consider this functionality when designing the API.

c42f commented 7 years ago

Yes, you'd have to treat the logger as something like a channel and close it when done.

@MikeInnes Actually, I'm not quite sure I understood your use case very well. Is it that you want to turn

@progress "name" for x in range
   body
end

into something like

p = ProgressLogger()
try
    for (i,x) in enumerate(range)
        body
        @info p "name" progress=i/length(range)
    end
finally
    done(p)
end

? Actually having written this, it doesn't make a lot of sense to me (generally packages shouldn't be messing with loggers unless they're in the outermost scope, otherwise they'll prevent the logger from being injected by the user) so I think I'm still confused. Here's what I'm thinking might make sense:

try
    id = $(gensym())
    for (i,x) in enumerate(range)
        body
        @info "name" progress=i/length(range) id=id
    end
finally
    @info "name" progress=1.0 id=id  # Or possibly progress_done=true ?
end

with the assumption that achieving a progress of 1.0 terminates the step. Possibly you'd want the gensym'd message id to be overridden if you didn't want to key the progress step off "name". (All log messages generated by the lower level MicroLogging.@logmsg have a unique message id as a key for things like log limiting.)

c42f commented 7 years ago

how would hierarchical logging fit into this API

@rofinn Good question! Here's where I need your advice on how you're using this in production, because while I see a lot of other libraries implement it, I admit I don't really get why this is so useful. I'm worried about that, and I'd like someone to explain it to me :-) Especially with a handful of use cases so I can understand whether it's a necessity, or whether there are other equally good ways of achieving the same end goal.

c42f commented 7 years ago

By the way, I did start out with something resembling hierarchical logging (following python, which I understand follows log4j1). But things diverged from there into the TLS based system as I tried to understand how you can solve the problem of controlling logging in the multithreaded case, while also avoiding the burden of package authors keeping track of the log context. The authors of log4j2 just use loggers in a global hierarchy, and handle threading issues by inserting a piece of TLS (the "fish tagging" approach https://logging.apache.org/log4j/2.x/manual/thread-context.html). I went a step further, and put the logger itself in the TLS...

Anyway, which features of the hierarchy do you use, and how do you use them?

rofinn commented 7 years ago

Typically, I use hierarchical logging to add extra logging behaviour for specific modules. For example, if I'm interacting with 3rd party services like aws I find it convenient to add an extra handler for writing to a separate file or pushing to a log server for later audit. The nice part of it being hierarchical is that I can have a bunch of fine grained loggers floating around, but I only need to configure the child loggers if I want additional behaviour (on top of what I've configured in the root logger for my application). I've also occasionally set the child logger to not propagate messages if I want to tightly control the behaviour of log messages within a given module (this isn't very often though).

c42f commented 7 years ago

Thanks @rofinn, the hierarchical thing definitely makes sense for controlling filtering, formatting and dispatch for groups of log messages. I had it in my head that there's more to it than that, but perhaps not.

My current plan was that a logging backend could deal with each record by looking up whatever filter/format/dispatch rules using the log metadata. A hierarchy is one convenient way to do this, but I'm not sure we should bake it into the system - sometimes I think it'd be useful to have rules which step outside the hierarchy. As I've implemented it so far, the relevant metadata for "rule matching" are

log level
Module
filename / group
line number / id (roughly increasing order of fine-grainedness)

Even a minimal backend in Base will need some kind of configuration system (eg, for depwarn control independent of other warnings).

The idea of "log group" is a speculative user-defined grouping based on a conventional group keyword. It might look something like

function foo()
    @warn "You shouldn't use this, use bar() instead" group=:deprecated
end

rofinn commented 7 years ago

Hmmm, okay. Sorry, I'm still confused how a backend that provides hierarchical logging would plugin to this API though.

Let's say I configure my application with:

# I'm gonna use Memento syntax just cause that's what I know pretty well
using Memento

# Configure info logging for the entire app with info level logging and only a stdout handler
Memento.config("info")

# Get an "aws" logger and specifically setup a handler using syslog for it.
aws_logger = get_logger("AWS")
add_handler(aws_logger, DefaultHandler(Syslog(:local0, "julia"), DefaultFormatter("{level}: {msg}"))

And then later on in the AWS module I have:

logger = get_logger("AWS")
info(logger, "Brought up a new instance")

I'm guessing the former would be specific to the backend and the latter would equate to @info "Brought up a new instance" group=:AWS, but I'm unclear how this result in executing the handler for syslog only when group=:AWS. Would you have a global list of filters to iterate over or would it call Backend.log(record) and leave it up to the backend to call info(get_logger(record.group), "Brought up a new instance")?

c42f commented 7 years ago

Nice, let's try to figure out how that concrete example would work (figuring out how this fits in with hierarchical backends was one of the next things on my list). I'll try to create a shim to make Memento a backend for MicroLogging, and we can see how it looks.

By the way I'm really happy to have all these good questions about how the design will work for various use cases - keep them coming!

c42f commented 7 years ago

@rofinn ok, as a very rough WIP, how about this: https://github.com/c42f/MicroLogging.jl/blob/master/examples/memento.jl

rofinn commented 7 years ago

@c42f Awesome, thanks for taking the time to work though that example. Yeah, it might be nice to have a distinct type hierarchy and API for Backends and Loggers to avoid confusion, but otherwise I really like that the example doesn't require a lot of changes to the application code.

pfitzseb commented 7 years ago

@c42f Regarding the progress stuff: As far as I can tell, having a unique ID for each message is fine -- there's no need for an explicit ProgressLogger() like we use in Juno. All in all, this looks pretty good to me -- having a logging/messaging frontend in Base that supports custom backends is awesome, especially if it exposes metadata and stuff like that! +💯

c42f commented 7 years ago

it might be nice to have a distinct type hierarchy and API for Backends and Loggers to avoid confusion

@rofinn Yes, we have a bit of a terminology clash here. Personally I think the term "logger" is appropriate for the first piece of user-configurable code which has the opportunity to process a log record. This is the same way that Memento uses the term, but in the design above Memento.Logger is now plugged in at a slightly later stage. What we're doing here is clarifying and disentangling the two roles that Memento.Logger currently has:

A front end interface to directly receive user messages
A hierarchal filtering and dispatch strategy for log records

As for the second point here, I think an explicit hierarchy of objects is just one way to do this. I'm thinking of the set of log records as a database table (columns being required/optional metadata - module, file name, etc - and each row being a record). A logger hierarchy can then be seen as a particular way of arranging and optimizing queries against this table, but it's not hard to imagine more general queries which don't fit into this scheme.

@pfitzseb Excellent :-) If you do have any time to experiment with the prototype implementation - especially in the context of Juno - I'd greatly appreciate feedback on things which are and aren't working for you.

cdluminate commented 7 years ago

Can't wait to use it :-)

StefanKarpinski commented 7 years ago

Is this ready enough to be merged? It can remain WIP and get more updates and feedback, of course.

c42f commented 6 years ago

@StefanKarpinski excellent question, sorry for the long delay. I pushed a refactor of my thoughts, and I'm happy for this to be merged now.

Despite appearances, the project isn't abandoned - I've definitely been thinking about this, writing code, and trying to use it, both in the prototype, and now tentatively in Base over at https://github.com/JuliaLang/julia/pull/23712 .

JuliaLang / Juleps

[WIP] Flexible and convenient logging #30