dCache / dcache

dCache - a system for storing and retrieving huge amounts of data, distributed among a large number of heterogenous server nodes, under a single virtual filesystem tree with a variety of standard access methods
https://dcache.org
289 stars 136 forks source link

Journalctl and Log-Messages stretching over multiple lines #5734

Open christianvoss opened 3 years ago

christianvoss commented 3 years ago

Dear dCache developers,

since the switch to systemd we employ journalbeat to collect the logging entries and send them via Kafka to our monitoring and analysing infrastructure.

We observe several log-entries, e.g. denied logins in gPlazma, that stretch over multiple lines in the old log files and now in journalctl. This makes analysing the entries inconvenient since about 10 events cover one failed login. Is it possible that all log-entries are contained in one single line? Maybe through an option to make logs more machine-readable?

Thanks a lot, Christian for the DESY dCache Team

paulmillar commented 3 years ago

Hi Christian,

I agree the current setup is sub-optimal.

The currently multi-line output from login failures is quite deliberate. It is very hard to describe why a login failed as many modules may be configured OPTIONAL or SUFFICIENT. Therefore, the overall failure to login can be a combination of many OPTIONAL/SUFFICIENT modules failing and gPlazma cannot know which of these should have succeeded (for this particular login attempt).

In addition, modules that succeed (e.g., an AUTH module that added a least one principle) might still be causing a problem by yielding the "wrong" principals.

Therefore, the only way to guarantee enough information to diagnose a login failure is to describe everything that gPlazma did (or didn't) do. Beyond a trivial (and unrealistic) gPlazma configuration, I don't believe it is possible to log everything that happened on a single line in natural English prose without that being impossible for a human to read.

That said, it would be technically possible to log a (single-line) JSON object that contains the same information, and develop admin tools to deserialise that object.

However, I'm not convinced this is the correct approach to this problem.

Although gPlazma failed login reports span multiple lines, they are (quite deliberately) issued as a single log entry. The logging system should (in principle) understand that this is not several independent log entries, but one log entry that spans multiple lines.

So, I would turn this around and ask whether you can configure journalbeat to handle multi-line log entries?

On a related note: journald can handle multi-line log entries, but it strips away leading and trailing white-space when showing the log entries with journalctl. This makes the output ugly for the gPlazma failed login printer.

I looked into this and found a ticket against the journald developers for stripping trailing/leading white-space. Fully preserving the log entry is only possible by configuring journald to forward logging to syslog; however, they agreed to preserve leading white-space in journalctl output. That said, my experience on CentOS suggests the version of journalctl installed there does not have this fix.

christianvoss commented 3 years ago

Hi Paul,

thanks a lot for the clarification. I'll try to investigate the features journalbeat might offer us. Maybe it can identify the multi-line nature. I assume the same argument is true for e.g. the FTP logging for aborted transfers?

Thanks a lot, Christian

paulmillar commented 3 years ago

Yes, the same problem exists for the FTP logging for aborted transfers.

There might be other places where dCache logs multiple lines, but I can't think of them just now.