crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0
19.5k stars 1.62k forks source link

Proposal: improve stdlib logger #5874

Closed ezrast closed 4 years ago

ezrast commented 6 years ago

This is piggybacking off of some chat discussion by @RX14 and @straight-shoota earlier today.

Currently there is no great way for library authors to log without friction. A library can't just write to STDERR arbitrarily because that's a potentially unwanted side effect. It instead has to take an optional Logger? that is nil by default, which requires boilerplate and is unlikely to actually be used.

If we instead establish a class-based logging hierarchy into the stdlib, library code can log unconditionally, with downstream users selectively enabling logs from the libraries they care about. I propose improving the situation by making the following changes:

Structure loggers hierarchically, with granular level controls

A Logger should be able to take a parent, to whom the actual log writing is delegated. By default, a child Logger's level should be nil, which means it inherits the level of its parent.

main_logger = Logger.new(STDOUT)
child = Logger.new(main_logger)
noisy_child = Logger.new(main_logger)

main_logger.level = Logger::WARN
noisy_child.level = Logger::DEBUG

child.info{ "This logger uses its parent's log level and this message does not get printed" }
noisy_child.info{ "This message will get printed because noisy_child has its own log level set" }

Make logging class-aware by default

It should be trivial to define a logger that is associated with a particular class or module. That logger will inherit from the logger of its parent in the module namespace hierarchy. Parents can be created automagically.

module MyProgram
  class App
    include Loggable # creates a Logger and adds helper methods e.g. `warn`, `info`
    def initialize
      info{ "creating an app" }
    end
  end
end

# Sets the effective log level for every Loggable under the MyProgram namespace.
# Note that the Logger for MyProgram got created behind the scenes at the same time as the Logger for App
Loggable.get_logger(MyProgram).level = Logger::DEBUG

# Logs the following (note the class name is included): 
# I, [2018-03-26 14:43:17 -07:00 #25575] INFO -- MyProgram::App: creating an app
app = MyProgram::App.new

Loggable will have a root level logger that by default does not log anything, so library code will stay quiet. To up the verbosity, just do:

Loggable.get_root_logger.level = Logger::DEBUG  # Enables logging for every class and module in your program, including libraries.

Unrelated to the above points, Loggers currently only operate on IO objects. Modern applications frequently eschew traditional log files in favor of structured logging, so this behavior should be more customizeable. Thus, I also propose to

Abstract Logger away from IO

Instead of storing an IO directly, Loggers should store a LogWriter, which is a module with abstract def write on it. The standard library should include an IOLogWriter class that takes on the formatting behavior of the current Logger, but is interchangeable with custom-defined classes like so:

class GelfLogWriter
  include LogWriter

  def initialize(@transport, @host, @port: 12201)
  end

  def write(severity, datetime, progname, message)
    # build the document and send it over the network here...
  end
end

Loggable.get_root_logger.handler = (GelfLogWriter.new :udp, "graylog.example.com")

I'll be happy to work on implementing these changes if the core team thinks they'd be useful.

straight-shoota commented 6 years ago

The biggest selling point has not even been mentioned: class-aware loggers should also be hierarchical, meaning you can configure logger settings for MyProgram which also apply to enclosed namespaces like MyProgram::App unless the setting is overwritten somewhere down the hierarchy. This allows for very fine grained control of the log level for individual components.

It would be great to have this integrated in the stdlib so every shard can and should use it as a single logging solution which integrates with every Crystal library.

Just a note: I don't think a Loggable module makes much sense. Log methods should not pollute the instance namespace but use the logger as explicit recipient. Then it comes down to essentially LOGGER = Logger.new(self.class) or something similar in the class scope, which really doesn't need to be extracted in a module.

ysbaddaden commented 6 years ago
  1. How does encapsulating loggers solve anything?

For example, this requires a child logger to know the parent app or library, but that's impossible unless the parent configures the child... why bother with encapsulation, then, instead of directly passing a logger instance?

  1. IMHO a better approach would be to introduce a global Logger actor. It would unning on its own fiber, and be configured once for the whole application (level, filters, output adapter such as IO, syslog, ...). Each library could then send it messages to log, with a library identifier to filter.

    That doesn't solve the hierarchical issue, but it does solve the proliferation of having many logger instances.

RX14 commented 6 years ago

I think a better solution is to have many loggers, but centralised configuration. Each logger will know what namespace it's at (::HTTP::Client for example), then when you change configuration, the central logging configurator will apply rules based on starts_with?, ordered by priority according to size. For example ::.log_level will configure every logger (every logger's namespace starts with ::) and ::HTTP.log_level will configure just loggers in the stdlib's HTTP library. The ::HTTP configuration will always override the :: configuration because it's a longer string.

ezrast commented 6 years ago

For example, this requires a child logger to know the parent app or library, but that's impossible unless the parent configures the child...

If you index loggers based on class name (as given by self.class.name), you can find a parent by splitting on "::" and dropping the last element.

My reasoning for using Loggable (which needs a better name but I'll go with it for the moment) was that the system really benefits from widespread adoption, and including a module is the most low-friction, crystal-idiomatic way to indicate that a class should do a thing. LOGGER = Logger.new(self.class) is less atomic than include Loggable. Without knowing anything else about the system, the latter says to me "this is the correct way to do loggy things" more than the former.

@RX14, I don't understand what that offers over what I or @ysbaddaden have suggested. Is the configurator setting log levels on each logger one by one? If I do ::Foo::Bar::LOGGER.level = Logger::INFO and then LoggerConfigurator.set_level(::Foo, Logger::WARN), what happens?

straight-shoota commented 6 years ago

@ysbaddaden I agree logging should be global. Local Logger instances would just be thin wrappers and delegate to the main logger, decorated with the current type/logging scope.

The design described by @RX14 would probably work out pretty similar and it seems to me that it would merely be an implementation detail whether distributing log messages to subscribed handlers happens local with global config or global.

@ezrast Well, it could also be a macro (for example def_logger or whatever) instead of include Loggable. My point is that there is no need for having a module for nothing more than defining a single class constant. And IMO it shouldn't need anymore else. That's enough for Java's excellent logging system, why would we need anything more?

ysbaddaden commented 6 years ago

I think a global / centralized Logger in Crystal makes little sense. Unlike Elixir, Crystal doesn't provide solutions to structure services in an application.

Having scattered logger instances is a pain, and the solutions proposed here are really nice, but they assume a particular way to design and model an application (namespaces, hierarchical class names, central configuration). Again: Crystal's stdlib doesn't provide nor encourages anything more than spawning fibers and communicating through channels. I'm afraid assuming more in Logger will make it a pain to use in different contexts. Assuming namespaces / class names are hierarchical and meaningful out of any context is very weird.

I believe it's a framework job to come with solutions for this. See for example Earl::Logger which assumes a lot about the program, but Earl's job is precisely to structure apps.

That being said Logger must evolve. For example we could support and thus allow the configuration of:

j8r commented 6 years ago

@ysbaddaden You're quite right if we build a big web application, but for smaller ones like a system tool/API server aLogger will be very welcome. A framework for this will be overkill in this case, maybe a shard but using directly the stdlib's logger would be nice!

straight-shoota commented 6 years ago

It's true that Crystal doesn't rely on strictly hierarchical namespaces for library components. But I don't think that necessary limits the usage of a hierarchical logging system. The logger namespaces don't even have to correlate to Crystal namespaces. It doesn't have to assume anything. It just provides a simple interface that is capable of logging hierarchically. If and how this feature is adapted depends on how libraries and apps use it. For shards it is common and encouraged to put library code in a namespace. That would be at least some form of namespacing used in Crystal.

ysbaddaden commented 6 years ago

@j8r Application size is irrelevant. Logger should be generic enough to be useful in any environment.

@straight-shoota I'm reading on java.util.logging and I'm revising my judgement. There is a single Logger from which you can request a logger instance, reating a hierarchy; namespaces are any dot separated string; you can configure loggers from anywhere; it doesn't assume much about application design. That looks good.

bcardiff commented 6 years ago

WDYT about defining just the interface of the logger and leaving configuration / instantiation to the framework?

That way the shards and components could agree on how to send info to the logger.

It would be comfortable to be able to pass around the instance of a logger from a parent to a child so for example an http server can pass the loggers to the handlers.

It would be nice to support some kind of namespace/activity logging configuration. And allow them to be specified while calling the logger so the instance can be shared between parent and child.

So, from the current log(severity, message, progname) we could introduce a context that I imagine it will match usually the namespace/class of the caller.

For tracking the activity I am not sure if should lay in the stdlib. Let me show an example to ilustrate more: let's assume there is an app with and api to register a user. The user registration involves some db statement and some call to external services. If we want to control the user registration is not enough to state the the db namespace or the http api namespace should be logged. So the code base needs to say that the activity been done is user registration. How this information is passed around is either explicit in every call (IMO ugly) or the responsibility is of the framework + logging system used. If we took this later approach, then there is nothing to do regarding the tracking of activity, just to not block the possibility.

So my proposal for the stdlib is to make a module that will define the interface of logging with an additional context. Have a basic STDOUT implementation that only obey severity, and allow logging framework shards to play nice with that idiom (implementing filters by context).

ezrast commented 6 years ago

@bcardiff, Have you looked at https://github.com/crystal-lang/crystal/pull/5921? Over there the interaction with IO is split out into an IOAdapter class that can easily be swapped out to provide arbitrary logging behaviors, and the Handler class can be improved to allow filtering above and beyond the standard log level threshold. I think it basically lines up with what you're saying, except that Handler provides a default way of filtering selectively based on context. Are you saying that you'd rather have this functionality omitted from the stdlib, or extracted into a concrete implementation of a more basic/abstract handler class?

bcardiff commented 6 years ago

@ezrast there I checked the PR. I do meant to leave the stdlib as plain as possible without assumptions of the lifecycle / locking of loggers. I would vouch for

More or less this is cover in https://gist.github.com/bcardiff/c5126a9c19fe36987819bc6e86ea7d3f so we can check ideas in actual code.

I would leave more advanced configuration options and loggers architecture out of the stdlib but compliant with the interface exposed in the gist.

RX14 commented 6 years ago

@bcardiff regarding your gist, how would you centrally configure all the outputs for every logger? I also don't like the design because it uses macros and introduces methods into the classes which want to use loggers.

My ideal logging interface would be:

class Foo::Bar
  @log = Logger.get(self.class) # (same as Logger.get("::Foo::Bar") - the :: could be replaced by . for easy parsing)

  def do_something
    @log.trace "doing something"
  end
end

I'm think that loggers should have "one way to do things". Passing around a logger instance, each instance having a namespace which by default inherits it's formatter and loglevel from it's parent, sounds flexible enough to me. And if java's had good success with the same model, why not?

Configuring loggers then becomes easy like:

Logger.get("::").level = Logger::INFO
Logger.get("::").handler = Logger::IOHandler.new(STDERR) do |severity, datetime, progname, message, io|
    label = severity.unknown? ? "ANY" : severity.to_s
    io << label[0] << ", [" << datetime << " #" << Process.pid << "] "
    io << label.rjust(5) << " -- " << progname << ": " << message
  end
Logger.get("::HTTP::Server").level = Logger::DEBUG

The first two lines would be the default of course.

bcardiff commented 6 years ago

I won't expect by default that the loggers have a logger factory pero consumer class. But rather pass the instance from parent to child component. Whether that instance is the logger itself or if it instantiate objects in the middle it's up to the logger implementation.

But if the Logger.get(self.class) or other global registry is used, then the user won't be able to override the logger per component instance.

In the gist the first parameter of Logger.using "app.foo" would have a default value mangling the namespace of @type, definitely. So if no the component has dependencies onlye a Logger.using would be needed.

I like how calling debug info as protected method keeps the code clean, but is a personal opinion. I could go with log.debug, but in order to do that, in my proposed design where logger instances is shared, the instance returned by log (in log.debug) should be a struct that has the context information for component where it is been called. I would try to make a gist with that option to illustrate if polluting the object protected methods is a strong no.

There are many opinions on how is better to configure the logging. I think we can leave that for the framework / app, and decouple.

RX14 commented 6 years ago

the user won't be able to override the logger per component instance

what do you mean?

In Java's scheme you never need to "override a logger", since the Logger class becomes a shell, delegating to the handler, which contains all the logic.

the instance returned by log (in log.debug) should be a struct that has the context information for component where it is been called

It seems we've arrived at essentially the same design - except that your context struct is renamed to Logger, is mutable, and that the instances come from a registry. The instances coming from a registry allow for more flexibility than reaching straight for macros.

The problem I have is with your design is that "passing the logger from parent to child" is boilerplate, and doesn't make sense for logging from class methods.

As a final note, my example above is actually incorrect, @log should be @@log.

bcardiff commented 6 years ago

Let's say you create two db connection, that will create statements. If you don't allow logger per instances and passing loggers from parent you child, how are you going to allow different logging configuration between the two db connection?

Maybe I am missing something that allow that to work: like registering in a global logger configuration per instances...

It's true that I was not considering class level logging. But will do.

I play a bit on how to get rid of the method pollution from the previous proposal, add default naming context and bind that context when invoking the logger. The code might seem weird but it achieves that the memory footprint is just a reference to a Logger (8 bytes).

RX14 commented 6 years ago

@bcardiff there's absolutely nothing stopping you creating arbitrary named loggers whenever you want and passing them wherever you want. In the above example I was just expressing the common idiom. Having a registry with custom string namespaces allows you to implement whatever idiom you want in your code - even the common idiom of 1 logger instance per class becomes a single line. Your proposal requires a macro to implement that.

bcardiff commented 6 years ago

The difference without your suggested approach where named loggers are created and passed around is that, once passed, the context will be the one used at the creation of the logger and not where the logger is been called.

Maybe it's enough, but I was aiming for using a single logger instance in different code context (with potentially different log levels)

I will give it another spin to see what happen...

ezrast commented 6 years ago

@bcardiff, I dislike the coupling of modules that comes along with your forward mechanism; adding or removing a child module shouldn't force you to update the implementation of the parent.

Additionally, I'm under the impression that part of the point of centralized configuration is that it makes it easy to deal with logs coming from code you don't control. Under your paradigm, if I require shards that want to log things, then to change logging behavior I have to invoke logger= on the top-level module of every single shard (either manually or through a forwarder of my own), and hope that the shard author set up all their forwarding correctly. It's not a ton of work but it's not the most streamlined.

With centralized configuration I just set behavior once at the root level, and every logger does what I want by default unless explicitly overridden. I can still have different components log at different levels or to different places; the only difference is whether the configuration is all stored in a single structure or scattered throughout all of the program's logger instances.

RX14 commented 6 years ago

the context will be the one used at the creation of the logger and not where the logger is been called.

But that's usually what you want. It offers the flexibility of both. If you want to specify the logger namespace at the callsite just call .get(...).log, if you want it to be per-class, put in in a class var, if you want it to be per-instance, put it in an ivar, if you want the whole logger to be passed into the class just do it.

I'd love to hear the usecase you have in mind (in code) which is difficult with this approach.

bcardiff commented 6 years ago

My previous proposed design is more about how I see the architecture of an app, but I don't want to impose the design, but rather find a balance between flexibility and comfort.

Given the last feedback and the idea to keep things as decouple and flexible as possible I came to https://gist.github.com/bcardiff/55ebbf2e20b1670681e9352fbaa51be0 where Loggers are consumed by the user and have a context determined on instantiation and LoggersHandlers that actually emit the information to a stream configured by set_handler_factory.

The gist is more similar to #5921 and support the usecase @RX14 vouch for I think. There some differences thought: the logger itself has no configuration of level, that is up to the handler factory.

The framework/app is required to configure a handler_factory to create/return, for a given context, a handler that will receive log(severity, message, context) message. The logger, used by the user has a reference to the handler and the context at creation. So invocation to #log(severity, message) and #{{severity}}(message) will be able to be forwarded to the handler with the context.

A issue with this last gist is that class variables in the stdlib would be initialized before the framework defines the handler factory. This issue is also present in #5921 I think.

To solve that I see two options, the first, which I don't like is to assume the configuration of the handler factory is done via monkey patching (really šŸ‘Ž, I don't want to relay on monkey patching for this). The second is defer the invocation of the factory until the first log event. This is done in https://gist.github.com/bcardiff/ea21543eff6437c508abb48c060116ef .

I didn't spend time deciding which should be a reasonable default value and logger for the stdlib. Once we agree with the common interface we can move forward into that and implement nice logger tweaking configurations outside the stdlib.

RX14 commented 6 years ago

@bcardiff I'm fine with that interface, it looks super flexible and has basically the same interface to the logger - although it seems like it's harder to configure on the app's side than just assuming inheritance. I'd personally stick to making assumptions here, but i'll wait for others to weigh in.

ezrast commented 6 years ago

Even with lazily instantiated handlers, there's no way to gracefully change log levels after program initialization. If you want to enter "debug mode" after you've been running for a while you have to force all your modules to update their loggers somehow.

5921 doesn't have quite the same issue in practice. It's true that you can't make libraries use any handler other than the default, but using the default handler is what you're expected to do 99% of the time anyway, and you can change its levels at any time.

That does raise the question of why I even bothered letting other Handlers be instantiable, so I probably have some simplification to do there.

bcardiff commented 6 years ago

The limitation which affect also #5912 that I was referring to is that any logger used in the std and initialized as a class variable wont be able to use an overridden configuration invoked after the prelude is required.

Whether or not a log level change is supported in runtime or only on boot, in last proposal it will be up to the implementation since there is actual no implicit behavior in the std lib part.

straight-shoota commented 6 years ago

@bcardiff In the end, your proposal uses a centralized instance, the logger factory. In #5921 this is called default_handler and it's a handler instance, instead of a proc.

This setup can avoid the initialization issue when Logger#log delegates to Logger.default_handler. You don't get individual handlers for each logging source by default, so this really calls for a central registry based on namespaces to configure log levels etc. This registry might or might not be included in the stdlib implementation.

RX14 commented 6 years ago

@bcardiff my solution doesn't have that problem since .get is memoised. There's still the problem that you can log before the logger is configured, but I don't think that's solvable. At least you can configure all loggers regardless of when they're initialized, since you can always access any logger instance through a centralised interface.

ezrast commented 6 years ago

How about this? https://gist.github.com/ezrast/98bb9ed3341c0b22c660549eae049b5b

It's similar in approach to #5921, but I think simpler and more streamlined. Adapters aren't separate from Handlers anymore, SeverityFilter does what Handler used to do, and if you want a completely generic interface just pretend that SeverityFilter and IOHandler don't exist and assign a custom Handler instance using Logger.base_handler=.

straight-shoota commented 6 years ago

@ezrast While I support the general idea, this example has too much clutter. SeverityFilter is IMHO completely useless. A centralized logging system based on hierarchical namespaces needs a capable registry that allows more detailed configuration per namespace than just log level filtering. Instead of having SeverityFilter in stdlib, we should just go the full way.

It's debatable if such a registry should be included in the stdlib and used by default. If no, there shouldn't be a half-baked SeverityFilter either, but just a plain handler.

ezrast commented 6 years ago

Hmm, so we need to decide 1) what is the minimum viable feature set necessary for a logging system, and 2) do those features belong in the stdlib. I'd like to hear more about what you have in mind for (1), but my initial stance on (2) is that this is a nice battery to have included, and I don't like defining an interface without a reasonable reference implementation.

bararchy commented 6 years ago

@ezrast @straight-shoota TBH What I really want is to automagiclly get the ability for logger to be

  1. set via MyClass.log (output, severity)
  2. output: MyClass::MySubclass message
ezrast commented 6 years ago

Yeah, sorry for wandering off, I just wasn't sure where to go next since a lot of the feedback I was getting was calling for removing things that I see as table stakes.

Instantiating a new logger should be one line of code. Changing the log level should be one line of code. These are fundamental operations and most of the time are the only things users actually want to do. If my first experience with a library as a user is that I have to instantiate a factory and pass it a callback just to flip a switch from "medium" to "medium-high", I'm going to back away slowly and use puts instead.

Individual components need their own log levels. This is the only low-friction way for libraries to be able to log things without getting in the way of users. Without this built into the logging system, any library that wants to log things has to have its own interface for controlling verbosity - they can't just pollute STDOUT/STDERR without giving users some way of turning it off. There's no reason to make every shard author reinvent this wheel when we could just put it in the standard library once and be done with it.

I don't think an implementation that doesn't have these things is worth considering. This means that some logging configuration will be "global". This is 100% okay. STDOUT is global already; the abstraction we put in front of it can be global too. There's no business logic lurking in there to ruin all your encapsulation when you're not looking.

I don't want to bikeshed this too much so I'm going to hack up another PR (might be a few days before I have time). I'll keep the filtering semantics of the default handler as isolated as I can from the interface so y'all can add to or remove from them as you please. If there are any other features or use cases that haven't been addressed by the proposals so far, please mention them.

Blacksmoke16 commented 5 years ago

https://forum.crystal-lang.org/t/introducing-crylog/797

I ended up making a logging shard before noticing this issue. It's based on the popular PHP logging framework Monolog, and seems sorta similar to https://github.com/crystal-lang/crystal/issues/5874#issuecomment-381690608.

While it might be a bit much for the standard library, my pipe dream for this would the stdlib define the core concepts and implement the basics. Then other shards/people can implement extra third party handlers. This way, eventually every project would be using a common framework and the handlers/formatters/processors would be plug and play as they are developed.

Just my 2 cents.

paulcsmith commented 5 years ago

I may have missed some points, but I think the default Logger is quite flexible and can do a lot of this stuff with class_properties. This is flexible, easy to understand (no new methods or DSLs), and works right now. It is also type-safe. No symbol or string lookup of Loggers. You're calling methods all the way down

Tons of flexibility with class properties and regular methods

class MyProgram
  class_property logger : Logger = Logger.new(nil)
  class_property log_level : Logger::Severity = Logger::Severity::INFO
end

class MyProgram::SomeClass
  def do_something
     MyProgram.logger.log("some_message", level: MyProgram.log_level)
  end
end

# or if you want to allow a particular class to have a custom logger
class MyProgram::OtherClass
  class_property logger : Logger = MyProgram.logger

  def do_something
     self.class.logger("some_message")
  end
end

# Customize like so
MyProgram::OtherClass.logger = Logger.new(STDOUT)

You can also configure log levels with class properties. This is what we do on Avram (but we use Habitat for settings instead of class_properties, but the same ideas apply). It works quite well. You can configure different log levels, and loggers as desired. https://github.com/luckyframework/avram/blob/c4f0358c14efb7f7ee178180db24040298613481/src/avram.cr#L21-L23

Using IO::MultiWriter

You can also use IO::MultiWriter to write to multiple IO objects. https://crystal-lang.org/api/0.30.1/IO/MultiWriter.html so you could write to STDOUT and a tmpfil if you want. It is pretty awesome.

One thing I'd change. Use data not just string

A lot of log data can/should be teated as data and not just a raw string, and should also be formatted differently depending on the context. This is why I wrote Dexter: https://github.com/luckyframework/dexter

It is nearly identical to the built-in logger with one difference: it accepts a NamedTuple so that it can be formatted in just about any way you want. For example in development we have this PrettyLogFormatter that prints something like this:

luckyframework_2019-Feb-28

And in production, using the exact same log data, print JSON!

I highly suggest taking the approach of Dexter and using data and not just a string, e.g. logger({request_id: "123", user_id: "123}) so that it can be formatted depending on environment, and so that libraries can write data and the end-user can decide how it should be formatted. I would be happy to help with this. It should be pretty easy to do as Dexter is quite small and is nearly 100% logger compatible already. The source is tiny and most of it is reimplementing methods in the Logger with just tiny changes to support key/value data instead of strings

straight-shoota commented 5 years ago

@paulcsmith This plain approach definitely has a nice ring. But it misses an IMO essential feature and that is implicit logger aggregation. This is especially important when multiple components work together. You could surely rig class-var logger instances together, but that's going to add a lot of complexity to the individual application. And most importantly, it doesn't offer much in a way of runtime configuration. That's what I'd expect from a more sophisticated logging system.

Blacksmoke16 commented 5 years ago

(My thoughts):

I think the main problem is ATM you have to do mostly everything yourself. There isn't a clean/good way to custom how things work in diff envs without doing it yourself like you did.

Which, by the time you got all that setup, you basically implemented your own logging framework; which happens to probably not be compatible with any other person's implementation. Again, having something in the stdlib that external shards can easily build upon would be the :100: solution, i.e. a framework. The main question is what that should look like.

I agree on the including data idea, however I would make it optional, i.e. def log(message : String, context : Hash). Then you are able to define a message like user logged in, then provide context that relates to that event, like their name/id. How do you handle runtime generated logs if it accepts a named tuple? I'm assuming there is also a Hash overload or something?

From what I can gather there are now three logging implementations.

  1. https://github.com/luckyframework/dexter
  2. https://github.com/blacksmoke16/crylog
  3. https://github.com/hydecr/strange

I tend to like the approach of mine and @watzon more.

Having the concepts of multiple named loggers that can individually be configured for their purpose is a key point. In the end it comes down to how we want to implement this.

Have the stdlib be more of the framework and common features, and allow others to make PRs/shards to add fancier handlers/formatters/processors OR take a simpler approach and just add more features to the current logger system.

paulcsmith commented 5 years ago

@straight-shoota Could you expound upon implicit logger aggregation? I'm not sure I understand that.

For example in Lucky we do this

# config/logger.cr
logger = Dexter::Logger.new(STDOUT)

Avram.logger = logger
Lucky.logger = logger

Is that what you mean? Maybe I am misunderstanding.

And most importantly, it doesn't offer much in a way of runtime configuration. That's what I'd expect from a more sophisticated logging system.

What do you mean by this? Could you provide an example. I'm not sure I understand.

paulcsmith commented 5 years ago

I think the main problem is ATM you have to do mostly everything yourself. There isn't a clean/good way to custom how things work in diff envs without doing it yourself like you did.

Learning a new logging DSL is also work, and often means you are locked into a certain of handling logging. I feel I must be missing use cases because class_property is pretty easy to use and I wouldn't say is any more work than calling Crylog.configure do |registry| and retrieving it from the registry. But maybe I am missing something. I think I don't understand the use cases that provides.

The other downside of such an approach is that you lose type-safety. Using strings will mean you can get runtime errors when trying to access loggers, which IMO is against the Crystal ethos of catching bugs at compile time.

I think that if there was some kind of "framework" for logging that it may have some kind of macros, but under-the-hood it would use class_property for type-safety. For example:

class MyProgram
  # defines a `logger` class_property with a default Logger.new(nil) (or something)
  # defines a `log_level` class_property
  Logger.loggable default_log_level: :warn, default_logger: Logger.new(nil)
end

MyProgram.logger  = my_wonderful_logger

This way it's all type-safefor both setting and getting loggers.This is of course not that fleshed out, but I hope it shows the rough idea

Runtime logging of dynamic data

I agree on the including data idea, however I would make it optional, i.e. def log(message : String, context : Hash). Then you are able to define a message like user logged in, then provide context that relates to that event, like their name/id. How do you handle runtime generated logs if it accepts a named tuple? I'm assuming there is also a Hash overload or something?

In general I think enforcing that is it key/value makes it much easier to use in prod. So Dexter allows passing a string but converts it to {message: the_string}. In your example above I'd do something like: event: "user_login", user_id: 123" That way it is fully machine readable.

You could definitely do a Hash of String => String or something, but so far I (and others using Dexter) have not needed that. Usually we know the keys ahead of time but if we don't, then if we need to log a hash or something then I think the best bet is to to_json it and log it as a value, e.g. {event: "github_webhook", payload: payload.to_json}

straight-shoota commented 5 years ago

It's basically explained in the OP under Structure loggers hierarchically, with granular level controls. Your example would be a kind of aggregation, every logger context feeds into a single logger instance. But with different components having different logger needs, the goal is to have separate rules for individual logger contexts and be able to feed them upwards in the hierarchy.

As an example for runtime configuration, this totally made up config format specified (at runtime) that all log messages above warn should be printed to stdout and only from the namespaces Foo and Bar::Foo all log messages above debug.

logging:
  basic:
    handler: stdout
    level: warn
  debug_foo:
    handler: stdout
    filter:
    - Foo
    - Bar::Foo
    level: debug
jhass commented 5 years ago

I feel like this should be a shard, proliferate as one at least. I like having a KISS logger in stdlib, rather something complex I don't need 90% of the time.

paulcsmith commented 5 years ago

@jhass I'm kind of with you. I would not want most of this (but I may be convinced otherwise!), but I would love if the regular logger accepted data as I think that is a much better way to go. And can still work with strings as I mentioned above :D

@straight-shoota Oh I see. I was trying to avoid runtime config like that since it is so easy to have typos and bugs when it is using regular methods. But I can also see the use-case for something like that! Even if it could be made type-safe, I don't think it is necessarily true that things use namespaces in a way that make them "automatically" configurable like that. I feel that the lib authors would need to provide hooks for specifying what log level is used for certain messages, especially as sometimes different methods in the same class log totally different messages. For example, in Avram we have a log level for logging queries as well as logging invalid saves. Those both behave differently and have different log levels, but configuring by class wouldn't make sense. You kind of have to expose that option as a library author.

One thing I think might make things easier is to have a top-level logger on Crystal that shards can use by default, and can then allow customizing things as necessary. For example:

Crystal.logger # By default is Logger.new(nil)

# But we'll set it to log to STDOUT
Crystal.logger = Logger.new(STDOUT, level: Logger::DEBUG)

if dev?
  Crystal.logger.level = Logger::DEBUG
end

# Libs can then rely on Crystal.logger without requiring people to pass the logger in!
class MyProgram
  def do_something
    Crystal.logger.info("Did something wonderful")
  end
end

# This class does something special. We may want to make it so people can configure it differently. 
class Something Special
  class_property logger : Logger = Crystal.logger
  class_property log_level : Logger::Severity::INFO

  def something_dangerous
    self.class.logger.log(self.class.log_level, "Something crazy happened")
  end
end

# I don't care about this message so I'll set it to info so I don't see it in prod, but want to know about it in dev
SomethingSpecial.log_level = Logger::Severity::DEBUG

What if we really don't want to use class_property?

Maybe we have a very simple module that makes it easy to edit how a class logs (similar to what OP posted)

class Something Special
  # Would create
  # class_property logger : Logger = Crystal.logger
  # class_property log_level : Logger::Severity::INFO
  # Also creates `logger` and `log_level` instance methods  for easier access
  Logger.make_loggable default_log_level: Logger::Secerity::INFO

  def something_dangerous
    logger.log(log_level, "Something crazy happened")
  end
end

I kind of like this approach but would be curious to hear thoughts on it

paulcsmith commented 5 years ago

But also I wonder if there are too many little things here so the discussion is going a bit all over. Maybe we can split this into a few discussions

I think this may be beneficial to split as these can all be handle separately (for the most part)

For example I think a Crystal.logger that shards can use by default would be awesome. That way we don't need to pass that in to every single shard. That's also something that could be added before all that other stuff is decided, if we can agree upon it

straight-shoota commented 5 years ago

@jhass Totally! It has already been established in the course of this discussion, that the implementation should not be in stdlib. But the idea is to define an interface that every shard can use. Stdlib provides the framework and a basic logger implementation which should suffice for most use cases. But when needed, it can be backed by a more complex logging system that hooks into the framework.

So for stdlib it means little more than the current implementation, just a little bit API refactoring to enable it for more advanced features. The key property is, that the framework is defined in stdlib and for a shard using it, it should be completely transparent.

@paulcsmith

For example I think a Crystal.logger that shards can use by default would be awesome. That way we don't need to pass that in to every single shard. That's also something that could be added before all that other stuff is decided, if we can agree upon it

That's generally the idea behind this issue. But: A centralized logger instance needs to have some kind of context-awareness. Consider two components/classes/shards happen to send log messages with the same content. When you use it like Crystal.logger.info("Did something wonderful"), there is no way to tell where the log message originated from. This way it's more clearer : Crystal.logger.info("Did something wonderful", namespace: "MyProgram") But calling it like this would add more verbosity.

So this discussion has been about ideas how to improve this. For example, calling it as Crystal.logger("MyProgram").info("Did something wonderful"). And the call to Crystal.logger can be cached in local scope to just call it logger.info("Did something wonderful") and it will assign the appropriate context information.

Blacksmoke16 commented 5 years ago

Some more of my thoughts; with some examples from Crylog on how I see this best working, also see the core concepts.

Learning a new logging DSL is also work, and often means you are locked into a certain of handling logging

This is true for anything. The goal would be to have it well documented and easy enough where it's not really an issue. Assuming whichever implementation framework shard is "chosen", other libs/frameworks would be updated to use that. Having a singular flexible logging framework that can be used everywhere would be the best case scenario. Mainly since then other handlers/processors/formatters could be implemented, either in the actual lib, or as external shards.

But maybe I am missing something. I think I don't understand the use cases that provides.

It mainly allows for all the logger information to be defined in a single place. You can also control what processors/handlers/formatters each named logger has conditionally.

One feature that is kinda nice is how each logger is named and independent. You have total control over how each aspect of each logger works.; thus you can do stuff like [2019-05-19T00:45:05.558218000Z] worker.DEBUG: Hello from the worker!. In this case the async worker would have a diff logger with diff requirements, which automatically includes where that message came from.

I feel that the lib authors would need to provide hooks for specifying what log level is used for certain messages

what log level is used for certain messages, especially as sometimes different methods in the same class log totally different messages

That would be one of the features of the framework. Each "handler" would be responsible for controlling if it should log a given message. I can give some examples from Crylog. See handles? and bubble.

  • How to handle formatting?

Provide a sane default but allow each handler to define a custom formatter.

  • How to handle log messages (should it be key/value data?)

It should be up to the user. Have them able to log only key/value pairs, string and key/value pairs, or only a string. It should also be the responsibility for the handler/formatter in order to transform the message object into the format the handler expects. I.e. you could have an Elasticsearch (ES) handler with an ES formatter to transform the message into the format ES requires.

  • How do we configure log_level/logger (registry? class_property? YAML?)

I would love to have the option of a YAML based configuration. However I haven't thought a lot on how that would work.

paulcsmith commented 5 years ago

It mainly allows for all the logger information to be defined in a single place. You can also control what processors/handlers/formatters each named logger has conditionally.

I think where I'm confused is that you don't need a framework to have named loggers or conditional logging. You can use an instance variable and create a logger, then assign it

file_logger = Logger.new(io: File.new("tmp/test.log", mode: "w"),)
standard_logger = Logger.new(io: STDOUT)

This is also type-safe (no accessing registries with strings). No new DSL, and infinitely customizable because you can use conditionals or whatever else.

A real life example is here: https://github.com/luckyframework/lucky_cli/blob/master/src/web_app_skeleton/config/logger.cr

Lots of conditionals, different loggers, super customizable, type-safe, etc and no new code. I wonder if part of this is we just need guides and examples on how to do it?

Creating multiple named loggers can be done right now as shown in the example so I must be missing something else.

That would be one of the features of the framework. Each "handler" would be responsible for controlling if it should log a given message. I can give some examples from Crylog. See handles? and bubble.

Could you explain this more? How is this different than this:

quiet_logger = Logger.new(io: STDOUT, level: :info)
loud_logger = Logger.new(io: STDOUT, level: :debug)

WonkyLibrary.logger = loud_logger # tell me everything!
TrustworthyLibrary.logger = quiet_logger # just the info and errors please

As a separate issue, configuring the log_level that the library uses could be customized using some kind of lib like yours (or like I suggested, class_property which is already there and documented). Though could be wrapped in a simple macro like:

https://github.com/crystal-lang/crystal/issues/5874#issuecomment-530138093

Overall I think the goal should be as minimal code/DSL necessary to learn and use. If we can use regular methods and local vars, why not!

Blacksmoke16 commented 5 years ago

I think where I'm confused is that you don't need a framework to have named loggers or conditional logging.

No you don't, but it does help in wiring everything together. Especially when having additional features other than a simple logger class.

Could you explain this more? How is this different than this:

Sure. In your example each named logger is mapped 1:1 to a logger instance. In Crylog a named logger has an array of "handlers", which is simply something that sends the logged message to somewhere/something.

When a message is logged the first handler "handles" that message. If bubble is true, the same message flows to the next handler, which can either handle or ignore the message. This allows a single named logger to send each logged message to multiple locations.

For example you could have a handler that logs the same message to STDOUT, a log file, and optionally sends an email if the severity is high enough. IO::MultiWriter could be used for most IO related things, but alas, sending an email, or pushing a message into ES/sentry is not IO.

Since each named logger instance consists of indenepent handlers, they can each be "built" from all your "blocks" (i.e. handlers/processors/formaters) to best handle the use cases that that specific named logger will be used in. For example, doing HTTP requests, or handling async messages. This inherently gives you context on where a message came from.

Similarly, since each Crylog logger is not directly a Logger instance, it can have additional features, such as processors. A processor is a something that can be defined that will add "metadata" to every logged message. Take this snippet from Athena for example.

# user_processor.cr
struct UserProcessor < Crylog::Processors::LogProcessor
  def call(message : Crylog::Message) : Nil
    user_storage = Athena::DI.get_container.get("user_storage").as(UserStorage)

    # Return early if a message was logged in a public endpoint there won't be a user in storage
    return unless user = user_storage.user?

    # Add the current user's id to all log messages
    message.extra["user_id"] = user.id
  end
end

# other_file.cr
Crylog.configure do |registry|
  registry.register "main" do |logger|
    ...
    logger.processors = [UserProcessor.new]
    ...
  end
end

Now imagine that the UserStorage class contains the current "logged in" user in your API/app. This processor would automatically add extra context to every logged message that passes thru the named logger.

Crylog.logger.info "User logged in"

[2019-05-27T22:40:03.445515000Z] main.INFO: User logged in {"user_id":1}

Another scenario would having a processor specific to your http request logger that can add like the body/headers/host of every failed request.

bcardiff commented 4 years ago

Logging vNext

The following is the logging module Log we've been designing recently. The implementation is not done yet, but we've been iterating on different scenarios and requirements to reach the current proposal.

Generating logs

The are a couple of severity levels it can be used to emit logs: debug, verbose,info, warn, error, fatal.

Usually you will type Log.[level] { "a message" }

Log.info { "Hi there!" }

Note: Using the result of the block for the message will allow the runtime to ignore possible conversions, string interpolations to build the message if it is not going be emitted.

The Log is a new top-level constant in the std-lib (Logger will be deprecated and removed).

When generating logs you will use a Log instance. Each Log instance is bound to a source or path. Log.get(source : String) : Log will allow you get a new instance. The top-level Log acts as an instance of an empty string source Log.

When defining a new shard or module you will define a Log constant for that namespace.

module DB
  Log = ::Log.get("db")
end

Thanks to the relative resolution of constants, using Log.info { ... } inside DB types will resolve to the one previously defined and not the top-level one.

If you move code around during development, you will not need to deal with logger/@logger/@@logger substitutions.

If you need nested sources, you can use the parent namespace Log and avoid hard-coding the source name.

class DB::Pool
  Log = DB::Log.get("pool")
end

Or, if you want to hard-code it.

class DB::Pool
  Log = ::Log.get("db.pool")
end

The result of Log.get(source : String) will return the same object instance for the same source. This will allow overriding the severity level of the log during specific moments programmatically.

Log.get("db").level = :verbose # or Log::Severity::Verbose
do_something_with_db
Log.get("db").level = :none # Log::Severity::None disables all levels

# DB::Log can be used instead of Log.get("db")

Note: Setting the level is specific to a source and not it's nested ones. The previous example doest not affect the db.pool source.

Note: Note that setting the level of a Log instance will affect all other Fibers that are concurrently using it.

When building a program it will be possible to ignore completely as part of the compilation the debug calls. So there is no penalty even to check if those logs should be emitted. The debug methods can be replaced by noop during compile-time.

Writing backends

The std-lib will come with a basic backend to write to stdio but you can define your own backend by inheriting Log::Backend.

@[Log::RegisterBackend("custom")]
class MyCustomBackend < Log::Backend
  def initialize(config : Log::Backend::Config)
  end

  def write(e : Log::Entry)
  end

  def close
  end
end

Whether the write operation in sync or async it will be up to the implementation. There might be some base modules to implement easily an async backends (similar to IO::Buffered).

Configuration

There will be a way to configure a series of backends and use them to write the logs of certain sources. When requesting the Log for a source, the runtime will load the configuration and bind the Log instance with the corresponding backends.

The std-lib will read a crystal-log.yml from the current directory. Or setting CRYSTAL_LOG_CONFIG you will change the file location.

A sample of the configuration file with potential options are:

version: 1

backends:
  stdio:
    type: stdio # -> Search for @[Log::RegisterBackend("stdio")]
    stderr_level: warn
    formatter: Log::DefaultFormatter
    format: # pattern/template 
    color: true

  file: # if no type property is given the name will be used as such. So @[Log::RegisterBackend("file")]
    file_pattern: app.%d{yyyy-MM-dd}.log
    location: /var/logs/
    max_file_size: 10mb
    formatter: Log::DefaultFormatter
    format: # pattern/template 

  elastic:
    url: localhost:9200
    index: app-%YYYY-%mm-%dd

bind:
  - sources: * # other options: āŠ” (empty string), db, db.*
    level: WARN
    backends: stdio, db

  - sources:
      - db.query.*
      - http.client.*
    level: INFO
    backends: 
      - stdio
      - db

The default configuration when no config file is found is:

version: 1

bind:
  - sources: # (empty)
    level: INFO
    backends: stdio

You can set CRYSTAL_LOG_LEVEL, CRYSTAL_LOG_SOURCES to change the default configuration.

Implicit Context

The context information will live in the fiber. A new fiber will start with an empty context. Context are immutable but can be extended with convenient methods. The context are immutable so the backend can keep a reference to the context that was used when generating the Log::Entry.

The current context can be read from Log.context.

You can change the current context with clear and set methods.

Log.context.clear # => clears current context
Log.context.set request_id: 34, user_id: "jdoe@example.org" # => extends the current context

When spawning Fibers you can inherit the context explicitly:

c = Log.context # current context
spawn do
  Log.context = c
  # ...
end

The context.using can be used with a block to restore to the previous context.

Log.context[:request_id] # => 23

Log.context.using do 

  Log.context.set request_id: 57689
  Log.context[:request_id] # => 57689

end # previous context will be restored at this point

Log.context[:request_id] # => 23

Under the hood the context.using version uses .context/.context=. The context.using block example is equivalent to:

Log.context[:request_id] # => 23

previous = Log.context
begin

  Log.context.set request_id: 57689
  Log.context[:request_id] # => 57689

ensure
  Log.context = previous
end

Log.context[:request_id] # => 23

Other types

Disclaimer: this is more the structure description rather than the actual definition.

alias Bakend::ConfigValue = Bool | Int32 | Int64 | String | Array(ConfigValue)
alias Bakend::Config = Hash(String, ConfigValue)

alias ContextValue = Bool | Int32 | Int64 | Float32 | Float64 | String | Time | Hash(Symbol, ContextValue)
alias Context = Hash(String, ContextValue)

record Entry, source : String, severity : Severity, timestamp : Time, message : String, context : Context
Blacksmoke16 commented 4 years ago

I think this'll definitely be an improvement. However, I have some questions.

  1. Will interfaces (modules) be defined for the types, like Log::Backend and formatters, etc? This would make it easier to expand/offer custom implementations while still being compatible everywhere.
  2. Has any thought went into including processors as a third concept? It can be helpful, especially when paired with DI to include context about the current user/request etc to every logged entry.
  3. How would Log#get work when working with multiple fibers, such as HTTP::Server. Would each request get its own instance, or would they be shared? It sounds like the context would be fiber specific which is good.
  4. Related to 3., are logger instances instantiated once at program start or?
waj commented 4 years ago

@Blacksmoke16:

  1. I don't see why not. We have in mind some base classes that could be used to create new backends, but a module to start from scratch might be defined as well
  2. What advantage have processors instead of just settings the injected context? Processors would be called everywhere, even for context where is no values to be set.
  3. Log instances will be shared between threads. Context belongs to the current fiber.
  4. There is no implementation yet, but with the suggested idioms to create Log instances, they will be created on program start.
Blacksmoke16 commented 4 years ago

What advantage have processors instead of just settings the injected context? Processors would be called everywhere, even for context where is no values to be set.

Actually let me back up and ask another question. Since the context belongs to the current fiber, that would mean the context is fiber wide and not tied to a specific logged message.

For example, say I did Log.context.set user_id: user.id within my authorization class, then everything logged within the context of that fiber would have a reference to that user_id context correct?

If yes, then yea, a processor isn't really necessary.

waj commented 4 years ago

For example, say I did Log.context.set user_id: user.id within my authorization class, then everything logged within the context of that fiber would have a reference to that user_id context correct?

Exactly! The context is actually immutable and a new instance is created every time you set a different one. This is to allow backends to flush log entries asynchronously without having to copy the context every time. So at the end, the log entries have just a reference to the current context at the moment of calling the log method.