kocolosk opened 6 years ago
Some comments after musing on this topic for a day:
- We exhumed lager from the codebase a few years ago. Why? Was it simply that we thought it was a lot of unnecessary code?
- Erlang 21's logger accepts a report, which is a list of atoms and 2-tuples. A report is not necessarily JSON-compatible because each value is an arbitrary term.
- Lazy evaluation could be as simple as wrapping every log invocation in a fun() -> ... end and only invoking the function once we've checked the log level. That should be efficient and cheap, but probably means reverting to macros instead of direct M:F/A calls ...
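A minimal sketch of that fun-wrapping idea (the macro name and `couch_log:allowed/1` are hypothetical, not existing couch_log API):

```erlang
%% Hypothetical macro: the expensive formatting is wrapped in a fun and
%% only evaluated once the level check has passed.
-define(LAZY_LOG(Level, Fun),
    case couch_log:allowed(Level) of       % assumed level-check function
        true  -> couch_log:log(Level, Fun());
        false -> ok
    end).

%% Usage: io_lib:format is never called when the level is disabled.
%% ?LAZY_LOG(debug, fun() -> io_lib:format("expensive: ~p", [Big]) end)
```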
What if we would change our logging statements to something like:
couch_log:Level(FormatWhichSupportsBindings, MetaData, Bindings).
Metadata would include data about location of the call and some context such as:
Here is one example which illustrates the idea:
couch_log:Level("Something {db_name} something else {view}",
    [{line, ?LINE}, {module, ?MODULE}, {user, UserCtx#user_ctx.name}],
    [
        {db_name, DbName},
        {view, View}
    ]).
We can define macro to include ?LINE and ?MODULE for convenience if we want to.
Since all of the arguments are key-value pairs we can use any serialization format (depending on logging infrastructure used):
There might be a need to define an expected format for every argument. For example, in which format would we send dates? We need to think about this when we design the FormatWhichSupportsBindings language or choose the library.
The metadata inclusion is inspired by the Elixir logger, which has the following fields: https://hexdocs.pm/logger/Logger.html#module-metadata
We can include {commit, ?COUCHDB_GIT_SHA}
as well.
We also need to include a timestamp in the metadata or pass it as a separate argument, in order to have exact timestamps which correspond to the time when the log function is executed rather than when we format the message in the logging backend.
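That call-site timestamp capture could be sketched like this (`couch_log:log/3` here is an assumed entry point, not the current API):

```erlang
%% Capture the timestamp when the log call executes, not when the
%% backend formats the message.
log_with_ts(Level, Msg, Meta0) ->
    Meta = [{timestamp, os:timestamp()} | Meta0],
    couch_log:log(Level, Msg, Meta).  % hypothetical dispatch
```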
@iilyak have you looked at the Erlang 21 logger library? It seems like a decent target.
API for Erlang 21 logger looks good.
?LOG_ERROR(FunOrFormat,Args[,Metadata])
-define(LOG_ERROR(A,B,C), ?DO_LOG(error,[A,B,C])).

-define(DO_LOG(Level,Args),
    case logger:allow(Level,?MODULE) of
        true ->
            apply(logger,macro_log,[?LOCATION,Level|Args]);
        false ->
            ok
    end).
The LOG_ERROR macro injects additional info into the provided Metadata map. The injected info:
-define(LOCATION,#{mfa=>{?MODULE,?FUNCTION_NAME,?FUNCTION_ARITY},
line=>?LINE,
file=>?FILE}).
As you can see, the logging macro uses a case to find out if the event needs to be logged (ets:lookup). Events are dispatched to logger_backend.erl via logger.erl, and handlers format with Formatter:format(Log, FormatterConfig) (see logger_h_common.erl).

logger_backend.erl does the following:

- logger_config:get(Tid, primary) (ets:lookup)
- logger_config:get(Tid, Id, Level) (ets:lookup)
- Module:log(Log1, Config1)
Default handlers are implemented as gen_servers.

The default handlers (logger_disk_log_h and logger_std_h) have overload protection, which is implemented here. If the handler process is getting overloaded, log events are sent synchronously (to the handler process) instead of asynchronously, which slows down the logging tempo of a process doing lots of logging. If the handler is choked, drop mode is set and no events are sent (to the handler process).

logger_h_common provides helper functions for gen_server based handlers to flush all log events from the message queue in case of overload. It also provides functions to detect the overload case.
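The flush-on-overload behavior can be sketched roughly like this (simplified from what logger_h_common actually does; the function names and message shape are illustrative):

```erlang
%% If the handler's message queue has grown past a threshold, drain all
%% pending {log, Event} messages in one go instead of handling them one
%% per gen_server callback.
maybe_flush(Threshold) ->
    {message_queue_len, Len} = erlang:process_info(self(), message_queue_len),
    case Len > Threshold of
        true  -> flush_events(0);
        false -> ok
    end.

flush_events(N) ->
    receive
        {log, _Event} -> flush_events(N + 1)
    after 0 ->
        {flushed, N}   % number of flushed events
    end.
```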
The Logger library has a good API and useful features. The implementation is robust and should provide good performance. The only place which could be optimized further (based on a one-hour review) is the configuration module, since it is implemented using ets, which is the second-fastest approach (the fastest is a generated module).
@kocolosk wrote:
- We exhumed lager from the codebase a few years ago. Why? Was it simply that we thought it was a lot of unnecessary code?
- Erlang 21 adds a new logger inspired by lager:..
iirc we switched from lager because 1) it was too coupled with the couch codebase and we wanted the ability to change backends (apache/couchdb-couch-log#4) 2) we didn't want lager's overload protection, not with how it handled it by just quietly dropping messages 3) and it was cumbersome to deal with altogether, with its use of parse_transform and non-intuitive buffers and handlers
I like the idea of structured logging, but I'm not keen on the change proposed here to couch_log:Level(FormatWhichSupportsBindings, MetaData, Bindings). I don't see benefits in introducing two separate bins to hold attributes and then also enforcing a change on both the formatting template and the arguments' switch from list to proplist.
I don't know how the Elixir logger works, but personally I like how logrus went about the idea of structured logging. They keep all meta or "fields" separate from the message and encourage simplifying the message itself, so it wouldn't need to include meta info. Quoting: "We've found this API forces you to think about logging in a way that produces much more useful logging messages.". Sure, one can still use sprintf if necessary.
There is also the idea of configurable implicit default fields, akin to the mentioned ?LINE and ?MODULE, and the automatic treatment of level, time and msg as just more fields.
This is how I'd go about adding structured logging to couch_log:

- Don't pass around a #log_entry record, but a map or a proplist. Depending on the formatter the output would be the same log message string as we have now, a JSON object, or a message + key=value pairs (this is called TextFormatter in logrus).
- Add a function couch_log:Level(Format, Args, Meta) to allow passing additional fields. To emphasize - the formatter doesn't modify msg - the message is a message, not a template.
- Make the formatter configurable in the [log] section, with a default of what we have now. Or the default can be dictated by a writer module; I'm thinking about the couch_log_writer_syslog case.

A bonus section:

- Convenience macros could go in couch_db.hrl, so they are available almost everywhere:

+-define(l(F, A), couch_log:info("~p:~b - " ++ F, [?MODULE,?LINE] ++ A)).
+-define(r2kv(Rec, Name),

- Use lists:all/2 to run over meta.

@eiri Add function couch_log:Level(Format, Args, Meta) to allow to pass additional fields. To emphasize - formatter doesn't modify msg - the message is a message, not a template.
Can you clarify how you are planning to use Args? Is it the same as we currently do:
couch_log:debug("Something ~p~n", [Term], Meta).
@iilyak yes, exactly. If we think about message as just another attribute in log's meta, then it's not formatter's job to modify attributes, but to format the whole event to a string, JSON, key=vals or whatnot.
For lazy evaluation and to reduce the cost of the io:format call, we can match Args against an empty list and/or go with @kocolosk's idea of wrapping that Fmt + Args into a fun and calling it after the should_log evaluation.
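Both optimizations could be sketched as follows (couch_log_util:should_log/1 follows the snippets in this thread; do_log/3 is a hypothetical sink):

```erlang
%% Skip io_lib:format entirely for an empty argument list, and only
%% format after the level check has passed.
format_msg(Msg, []) ->
    Msg;
format_msg(Fmt, Args) ->
    lists:flatten(io_lib:format(Fmt, Args)).

log(Level, Fmt, Args, Meta) ->
    case couch_log_util:should_log(Level) of
        true  -> do_log(Level, format_msg(Fmt, Args), Meta);  % hypothetical sink
        false -> ok
    end.
```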
I am trying to understand what a JSON formatter would do with Args and how we could access them in Splunk (or any other system)?
You seem to miss my point, so I'll try again.
A log message is just another attribute in the log data, nothing more. It's not some special thing that every formatter has to know how to modify; that modification would happen before passing data to the formatter, around the same step where we'll have time, level and the rest of the auto-attributes injected. The formatter then is only responsible for the representation of the log message.
If you want some attribute concatenated into the message it needs to be present in both Args and Meta, but the point of structured logging is that you don't need to do that to get informative logging.

For example, for text formatting, instead of the log message "Compaction started by pid <0.11.0> for shard \"shards/00-ff/mystic.123456\"" you'll get "Compaction started. pid=<0.11.0> shard=\"shards/00-ff/mystic.123456\"" - it's still readable and actually better greppable, and you don't need to deal with io:format's Fmt Compaction started by ~p for shard ~s. Then as JSON it'll be just {"msg": "Compaction started", "pid": "<0.11.0>", "shard": "shards/00-ff/mystic.123456"}
I hope that makes sense.
Thank you @eiri. In order to make it work we would need to forbid ~ in the Msg, either technically or by enforcing it at the review stage. This means that Format needs to be renamed. Maybe the proposed API would be better described as follows:
couch_log:Level(MessageTag, Args, Meta).
@eiri but the point of structured logging is that you don't need to do that to get informative logging.
Only if we agree to pass a key-value list in Args.
I seem to be very bad at explaining :(
No, we do not forbid or change anything, quite the opposite! couch_log:Level/2 stays exactly as it is; we just have a step at which we create a proplist with (this is pseudo code!) [{level, Level}, {msg: io_lib:format(Fmt, Args)}, {time, os:now()}... ] and then pass this proplist to the configured formatter.

The default formatter then is essentially this step extracted, and it gives back to a writer the same string as we have now to write. A JSON formatter serializes the proplist and gives the writer JSON to write. A key=val formatter creates a string similar to my example above and passes it to the writer. And so on.
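Spelled out as a pseudo-Erlang sketch, that step might look like this (all names are illustrative, not existing couch_log API):

```erlang
%% Build the "auto" proplist, treat the formatted message as just another
%% attribute, and hand the whole event to the configured formatter module.
log(Level, Fmt, Args, Meta) ->
    Entry = Meta ++ [
        {level, Level},
        {msg, lists:flatten(io_lib:format(Fmt, Args))},
        {time, os:timestamp()}
    ],
    Formatter = couch_log_config:get(formatter),  % hypothetical config lookup
    Formatter:format(Entry).                      % string, JSON, key=val, ...
```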
We also add a new function couch_log:Level(MessageTag, Args, Meta) which is different in just one aspect - it has a Meta proplist to be merged into our "auto" proplist.

With this we have a transparent migration: turning structured logging "on" is just a question of configuration, and we can update existing code, moving attributes from the log message into Meta, at our own pace. Whereas with the format-forbidding or semantics-changing approach we are facing a massive update of all log calls across the whole codebase, which doesn't sound very appealing.
I think I understand your proposal. There is one contradiction in your description which prevents me from understanding it 100%. You've said that we would pass [.... {msg: io_lib:format(Fmt, Args)}, ..] to the configured formatter. Then you mention that the default formatter would do the step we currently do, which also includes the call to io_lib:format(Fmt, Args). Also, I think one of the goals of your proposal is to avoid modifying all places where we call the logger. I don't think that is a big deal; we can always write scripts to update all callers in a semi-automatic manner.
Back to my proposal (combined with ideas from the Erlang 21 Logger). I want to cover the upgrade case. We would allow both to coexist for some time.
Currently we have couch_log:Level(Format, Args); we would replace it with ?LOG_ERROR2(Format, Args). We can use search and replace to make this change. Updated log statements would use the ?LOG_ERROR macro instead. The macros would be defined as follows:
-define(LOG_ERROR2(Msg, Args), ?DO_LOG(error, [Msg, couch_log:to_kv(Args), [{format_vsn, 1}]])).
-define(LOG_ERROR(Msg, Args, Meta), ?DO_LOG(error, [Msg, Args, Meta])).

-define(DO_LOG(Level, Args),
    case couch_log_util:should_log(Level) of
        true ->
            apply(couch_log, macro_log, [?LOCATION, Level | Args]);
        false ->
            ok
    end).

-define(LOCATION, [
    {module, ?MODULE},
    {function_name, ?FUNCTION_NAME},
    {function_arity, ?FUNCTION_ARITY},
    {line, ?LINE},
    {file, ?FILE},
    {log_event_pid, self()}
]).
couch_log:to_kv
would receive the list of values and return [{"arg1", value1}, {"arg2", value2}]
.
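A sketch of such a to_kv/1 (hypothetical implementation):

```erlang
%% Number the positional arguments so legacy calls still yield
%% key-value pairs.
to_kv(Args) ->
    Indexed = lists:zip(lists:seq(1, length(Args)), Args),
    [{"arg" ++ integer_to_list(N), V} || {N, V} <- Indexed].
```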
Somewhere in the formatter we can deal with the legacy format (we can detect it by checking format_vsn == 1) and call io_lib:format.
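That legacy branch in a formatter might be sketched as follows (assuming the format_vsn marker and the numbered args produced by to_kv above):

```erlang
%% When format_vsn is 1, the message is still an io_lib format string and
%% the args are numbered key-value pairs, so re-assemble before formatting.
format_msg(Msg, Args, Meta) ->
    case proplists:get_value(format_vsn, Meta) of
        1 ->
            Values = [V || {_K, V} <- Args],
            lists:flatten(io_lib:format(Msg, Values));
        _ ->
            Msg  % new style: the message is opaque, args stay structured
    end.
```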
The updated calls would look like
?LOG_ERROR("Compaction started", [
{compactor_pid, Pid},
{shard, Shard}
], [{user, #user_ctx.name}]).
@eiri I don't see benefits in introducing two separate bins to hold attributes
Two bins are useful to avoid name clashes. Imagine that you want to have an argument named with the same key used in the meta-information.
Log events in JSON for non-updated callers would look like:

{
    "msg": "Compaction started by '<0.11.0>' for shard 'shards/00-ff/mystic.123456'",
    "args": {
        "arg1": "<0.11.0>",
        "arg2": "shards/00-ff/mystic.123456"
    },
    "meta": {
        "module": "couch_bla_bla",
        "function_name": "compact_shard",
        "function_arity": 4,
        "line": 1234,
        "file": "src/couch/src/couch_bla_bla.erl",
        "log_event_pid": "<1.2.3>"
    }
}
@iilyak : There is one contradiction in your description which prevents me from understanding it 100%. ...
There is no contradiction really; it's just that the serialization method of the text formatter happens to be sprintf (or rather io_lib:format, because Erlang). And it is what we are doing right now: we format the message here first and then in a writer we pack the rest of the log entry attributes.
Also I think one of the goals of your proposal is to avoid modifying all places where we call logger.
Well... yes? If we can get structured logging with minimal changes, no migration and without re-educating everyone on how formatting in logs works, then I believe it's a Good Thing.
Two bins are useful to avoid name clashes.
Ah, but you have name clashes only because you are introducing two data bins. 😄
Here is what I don't like about it:

- It makes search queries more complicated. With two data bins you have to guess if you need to search for meta.pid = <0.11.0> or for args.pid = <0.11.0>, because the programmer felt like packing pid into args instead of passing it to meta.
- It duplicates data. The same pid info can end up both in args and in meta. Not to mention that it is also written in the message.
- It encourages continuing to pack meta into the message instead of promoting thinking about logs as structured data.
- It requires inventing a new sprintf format and enforcing everyone to learn it.

I guess the goal of your proposal is to have both structured and "informative" logging at the same time? I agree that it's tempting to make structuring kind of invisible, yet in my opinion the change is rather steep. Also that automatic couch_log:to_kv/1 conversion doesn't look too helpful to me; it's not really a structure if you have a lot of attributes args.arg1, args.arg2 meaning different things.
A side note on those macros and semi-automatic updates - I think we can use parse_transform to propagate auto-meta; iirc lager was doing that under the hood.
@eiri I guess the goal of your proposal is to have both structured and "informative" logging at the same time?
No. The goal is to pass args as-is in JSON (I'll be talking about JSON but it can be any format). In fact, the more I think about it the more I am convinced that the message in a log shouldn't have any embedded data at all. I want to see "Shard compaction: " instead of "Compacting shards/12344/3452342/something by <0.1.2>" in the textual part of the message.
Also that automatic couch_log:to_kv/1 conversion doesn't look too helpful to me, it's not really a structure if you have a lot of attributes args.arg1, args.arg2 meaning different things.
The couch_log:to_kv/1 is only a temporary measure to support old-style (not yet converted) log statements. I can see two ways of supporting the old style:
I think the first approach is better for the following reasons:
- It makes search queries more complicated. With two data bins you have to guess if you need search for meta.pid = <0.11.0> or for args.pid = <0.11.0> because programmer felt like packing pid into args instead of passing it to meta.
I agree it could be a problem
- It duplicates data. The same pid info can end up both in args and in meta. Not to mention that it is also written in the message.
I agree that the same pid can end up in both. I think we should discourage putting any info in the message, as I mentioned earlier.
- It encourages to continue to pack meta into message instead of promoting thinking about logs as structured data.
On the contrary. If we make it clear that we are not doing io_lib formatting at all, it would encourage people to remove any printf-style formats from their messages. As you can see from the example (copied here from my previous comment), there are no formats in the message:
?LOG_ERROR("Compaction started", [
{compactor_pid, Pid},
{shard, Shard}
], [{user, #user_ctx.name}]).
- It require inventing new sprint format and enforcing everyone to learn it.
This has changed; see the example with ?LOG_ERROR("Compaction started", [ above. I forgot to mention that my previous comment is an iteration of my earlier proposal with some elements from yours. As one way to discourage putting info into the message, we could check if the message contains any ~ for new-style invocations and abort the build with an error. Unfortunately it is impossible to implement in Erlang using a macro; our only option here is parse_transforms, which I would prefer to avoid.
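A runtime (rather than compile-time) guard would be trivial, though it only fails when the call is actually executed (hypothetical helper; assumes the message is a flat string):

```erlang
%% Reject new-style messages that still contain io_lib format markers.
assert_no_format_markers(Msg) ->
    case lists:member($~, Msg) of
        true  -> erlang:error({format_markers_forbidden, Msg});
        false -> ok
    end.
```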
A side note on those macros and semi-automatic updates - I think we can use parse_transform to propagate auto-meta, iirc lager was doing that under the hood.
We could, but in this case I would propose bumping the supported Erlang to at least 18.0 so we can use merl. Writing a parse_transform using erl_syntax is ugly and tedious.
Overall, there is not a big difference between our proposals.
couch_log:error("Compaction started for ~p~n", ["shards/...."], [{pid, <1.2.3>}, {user, "foo"}]) ->
{
msg: "Compaction started for shards/....\n",
pid: "<1.2.3>",
user: "foo"
}
?LOG_ERROR("Compaction started", [{shard, Shard}, {compactor_pid, <1.2.3>}], [{user, "foo"}]) ->
{
    msg: "Compaction started",
    args: {
        shard: "shards/...",
        compactor_pid: "<1.2.3>"
    },
    meta: {
        user: "foo"
    }
}
As you can see the differences are:

| | eiri | iilyak |
|---|---|---|
| Presence of format markers | * | |
| Embed data into message | * | |
| Calls to io_lib:format | * | maybe |
| Args available in JSON | | * |
| Metadata available in JSON | * | * |
| Ease of converting old calls | * | * |
| Have to think where to put args | | * |
I agree with you that the biggest problem with my proposal is that the developer has to think where to put args: either into Args or into Meta. I would be OK with merging the two bins if we could find a way to detect key collisions. In this case the only differences between the proposals would be:
No. The goal is to pass args as is in JSON (I'll be talking about JSON but it can be any format). In fact the more I think about it the more I am convinced that the message in log shouldn't have any embedded data at all.
Then you've lost me again. If you are not using args to format the message, then why are you insisting on spreading data across two distinct namespaces? Your first answer was "Two bins are useful to avoid name clashes" and the second: "The goal is to pass args as is in JSON". Those are not "why"; those are consequences of that decision. Both args and meta are data, a list of key-values or JSON as you call it - what benefits are there from arbitrarily splitting it?
As one of ways to discourage putting info into the message we could check if the message contain any ~ for new style invocations and abort the build with error.
There is quite a difference between discouraging and crippling functionality. I don't think breaking the basic expectation of how logging works will win it a lot of fans.
Overall there is not a big difference between our proposals.
Yes, except mine does not require an ideological revolution and a northern-river-reversal effort to implement. I know you said you don't think it's a big deal - well, I do. Massive rewriting of the code base and forcing people to re-learn things happens, for example PSE, but we can avoid it here and get exactly the same result - structured logs.
In this case the only differences between proposals would be:
- key-value list instead of a plain list for Args

Actually the real difference here is that you think Args is a part of the structured data and I see it as a part of the data point "message".
It's about user-friendliness really. The purest form of a structured log is couch_log([{message, "msg"}, {level, "info"}, {key, val}, ...]). Since "message" and "level" are two required attributes, instead of writing:
Level = "info",
Msg = io_lib:format(Fmt, Args),
couch_log([{message, Msg}, {level, Level},..]).
We can write this as couch_log:Level(Fmt, Args, [{key, val}, ...]). This is easier to write and find when scanning code, but it's just syntactic sugar.
Yes, except mine not requiring an ideological revolution and northern river reversal effort to implement. I know you said you don't think it's a big deal - well, I do.
The implementation would be less than 300 lines (most likely less than 50).
Actually the real difference here is that you think Args is a part of structured data and I see it as a part of data-point "message".
It's about user-friendliness really. Purest form of structured log is couch_log([{message, "msg"}, {level, "info"}, {key, val},..}]). Since "message" and "level" two required attributes, instead of writing:
What are the benefits of doing Msg = io_lib:format(Fmt, Args)
?
The implementation would be less than 300 lines (most likely less than 50).
I'm not talking about implementation, I'm talking about scale of change - all the code you are suggesting to modify and all public API you are suggesting to redefine.
What are the benefits of doing Msg = io_lib:format(Fmt, Args)?
Not benefits of doing, but benefits of allowing to do; in my opinion software should be helpful to the programmer, not limiting and prescriptive.
I'm not talking about implementation, I'm talking about scale of change - all the code you are suggesting to modify and all public API you are suggesting to redefine.
I really like the idea of using a macro, because this way we would be able to embed additional metadata about the location easily. However, if it is one of the most important concerns for you, we can always implement the desired functionality via a parse_transform.
What are the benefits of doing Msg = io_lib:format(Fmt, Args)?
Not benefits of doing, but benefits of allowing to do; in my opinion software should be helpful to the programmer, not limiting and prescriptive.
In my opinion, if we keep the current approach (calling io_lib:format(Fmt, Args) behind the scenes) we as developers would continue to embed data into the message (via format markers such as ~p).

We could provide the couch_log:Level(Message, Args, Meta) interface using parse_transform and merl. The io_lib:format call would be done in the formatters (which need it).

couch_log:error("Compaction started for ~p~n", ["shards/...."])
line formatter output
Compaction started for shards/..... [pid=<0.11.0>, line=123, file=couch_db_updater.erl, arg_1="shards/..."]
json formatter output
{
"message": "Compaction started for ~p~n",
"line": 123,
"file": "couch_db_updater.erl",
"pid": "<0.11.0>",
"arg_1": "shards/..."
}
couch_log:error("Compaction started", [{shard, "shards/..."}], [{user, "foo"}]).
line formatter output
Compaction started. [pid=<0.11.0>, line=123, file=couch_db_updater.erl, shard="shards/...", user="foo"]
json formatter output
{
"message": "Compaction started",
"line": 123,
"file": "couch_db_updater.erl",
"pid": "<0.11.0>",
"shard": "shards/...",
"user": "foo"
}
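The line-formatter output above could come from something like this key=value formatter sketch (logrus TextFormatter style; function name and event shape are assumptions):

```erlang
%% Render msg first, then all remaining attributes as key=value pairs.
format_kv(Entry) ->
    Msg = proplists:get_value(msg, Entry, ""),
    Rest = proplists:delete(msg, Entry),
    Pairs = [io_lib:format("~p=~p", [K, V]) || {K, V} <- Rest],
    lists:flatten([Msg, " [", lists:join(", ", Pairs), "]"]).
```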
I think a move to structured logging will be very helpful. As @eiri points out, at least in my experience it makes one think more deeply about useful log ... content, I'll call it 😄

FWIW I have tended to think of the "message" as the human-readable bit that will show up by default in tools like Kibana or Splunk. In this thinking it should therefore be a useful one-liner to scan. So for the compaction example, I would likely choose to output the message "Compaction started for shards/....." rather than just "Compaction started.". I think this is my only main insight that I haven't seen in the awesome discussion on this ticket so far :)
While I think @eiri is right that message is "just another field", most solutions appear to output a message
field by default in the UI and have disclosure arrows to view other things, so I do think that these messages need to be useful for a human reading the logs.
For the other fields, I've tended to think about what would be useful from a searching perspective. So "action: compaction_start" might be a useful one that these examples miss out. Broadly: what should I include that means that a search doesn't have to include free-text searches of the message field. In practice this usually means duplicated information between the message and the fields.
For the API specifics:
I like something like:
couch_log:error(FormatString, FormatArgs, ExtraFields)
which ends up like:
couch_log:error("Compaction started for ~p", [{shard, "shards/..."}], [{user, "foo"}, {shard, "shards/..."}])
Note the duplication of the shard info, but I think this makes reading the logs and searching them simpler. Arguably you could have named bits in the string formatting that could read fields from ExtraFields to avoid having to have the FormatArgs argument. I don't know if Erlang's formatting supports it though.

logger class too, though it's a bit long in the tooth.

Thank you @mikerhodes.
So "action: compaction_start" might be a useful one that these examples miss out.
We definitely missed the action field. How would you add it to the API? Are you proposing to add it into ExtraFields?
Arguably you could have named bit in the string formatting that could read fields from ExtraFields to avoid having to have the FormatArgs argument. I don't know if Erlang's formatting supports it though.
Unfortunately Erlang's io_lib:format doesn't support named arguments or interpolation. However, your comment led me to an idea which I would like to hear some feedback on from a wider audience. The idea is to use maps for fields and to pass the list of fields to use in the formatter, in the order of their appearance in the format string.
couch_log:error("Compaction started for ~p", [shard], #{
user => "foo",
shard => "shards/..."
}).
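A formatter for this shape could be sketched as follows (hypothetical helper: the map holds all fields; the list picks which ones feed the format string):

```erlang
%% The field-order list selects and orders the map values that are
%% interpolated into the message; every field remains available as data.
format_line(Fmt, FieldOrder, Fields) ->
    Args = [maps:get(K, Fields) || K <- FieldOrder],
    lists:flatten(io_lib:format(Fmt, Args)).
```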
I think an action field is a good idea, with one caveat: without a formal naming strategy it's fairly simple to get a lot of inconsistencies in there and have, for example, starting_compaction in one place and compaction_start in another. I can't think of a good way to solve it without defining some naming rules beforehand and then adding a compliance check as part of the review process.

Then again, I might be overthinking this and it doesn't matter as long as the actions are unique and greppable in the code.
On another note, there seems to be a general belief here that FormatArgs always represent data worth adding to the data portion of logs. I personally don't think that's the case: with a message log("Starting compaction for shard ~p on update sequence ~p", [<<"shard">>, 12]) it would be helpful to have the ability to query by the shard field, much less so by the update_sequence field. Let's not forget that data points are not free; each represents an index on the receiving side, meaning storage size and log processing rate.
If we insist on treating FormatArgs and ExtraFields the same, then we are probably better off giving up on io_lib:format for message formatting and switching to proper templating like mustache or DTL; if we can't keep things simple, we may as well go the whole way.
@iilyak said:
So "action: compaction_start" might be a useful one that these examples miss out.
We definitely missed action field. How would you add it to API? Are you proposing to add into ExtraFields?
Yes, I'd have it in extra fields. Not all log messages need it. Make it convention rather than enforced.
@eiri said:
I think an action field is a good idea with one caveat that without formal naming strategy it's fairly simple to get a lot of inconsistencies in there and have for example starting_compaction in one place and compaction_start in another.
I can totally see this. I'd prefer to start by seeing if it's a problem in practice. Hopefully we are not starting the same process in many places 😄
I personally don't think that's the case, with a message log("Starting compaction for shard ~p on update sequence ~p", [<<"shard">>, 12]) it would be helpful to have an ability to query by shard field, much less so for update_sequence field.
That's a really good point. Looking at other things I've used in the past, they do split up the idea of the format args for the message from the extra structured fields args.
I think @iilyak's proposal takes that into account:
couch_log:error("Compaction started for ~p", [shard], #{
user => "foo",
shard => "shards/..."
}).
I, at least, found this a very readable code stanza.
I’d like to change the way we generate and format log messages inside CouchDB. Currently we use “printf” style invocations of the form couch_log:Level(Format, Args) where Level is one of the standard syslog levels. In production I find that this approach makes it rather difficult to quickly find log messages that match some criterion. We have multi-line messages, individual values in messages aren’t always labeled, etc. I’ll sometimes end up grep'ing for a bit of text that I know to be unique for a particular message type, but … really? We can do better.

The idea behind structured logging is that the application sends along a set of key-value pairs as an entry to the logging framework. The logger can then invoke one or more handlers to generate different message formats. A dev handler might write out the key-value pairs in a readable key=value format, while in production you might turn on a handler that produces newline-delimited JSON messages which can be easily consumed by any decent logging solution.
I plan to take a pass at converting our existing logging invocations to a more structured format, then see about making the couch_log framework less printf-focused in its handler API interface. Comments and ideas welcome.