ManageIQ / manageiq

ManageIQ Open-Source Management Platform
https://manageiq.org
Apache License 2.0

Add ability to customize logging in the evm.log #20048

Open dmetzger57 opened 4 years ago

dmetzger57 commented 4 years ago

Originating BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1605935

Description of request:

Looking for the ability to customize logging in the evm.log to be able to group like tasks together.

For example, each evm.log line is preceded by dashes like this:

[----]

Looking to have the user be able to customize what these dashes represent, e.g.:

[1000]

or

[AAAA]

And these IDs could be linked to similar tasks.

A common use case would be to group all tasks that are associated with a parent task (child tasks and grandchild tasks) under a common ID, so that grepping through the logs of many appliances would be much faster.

Another example would be to use a common ID for any task associated with a provider or, at a finer granularity, to give each refresh task associated with a provider its own unique ID.
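
To illustrate the kind of grouping being requested, here is a minimal sketch in Python that buckets log lines by the 4-character field in the leading bracket. The function name `group_by_prefix` and the sample messages are hypothetical; only the leading `[----]`-style bracket comes from the description above, and the rest of each line is made up for illustration.

```python
import re
from collections import defaultdict

# Matches the 4-character bracketed field at the start of a line,
# e.g. "[----]", "[1000]" or "[AAAA]".
PREFIX_RE = re.compile(r"^\[(.{4})\]")


def group_by_prefix(lines):
    """Group log lines by the 4-character ID in the leading bracket."""
    groups = defaultdict(list)
    for line in lines:
        match = PREFIX_RE.match(line)
        if match:
            groups[match.group(1)].append(line)
    return dict(groups)


# Hypothetical sample lines: the message text is invented for this example.
sample = [
    "[AAAA] parent task started",
    "[AAAA] child task started",
    "[----] unrelated message",
    "[1000] provider refresh queued",
]

groups = group_by_prefix(sample)
for group_id, members in groups.items():
    print(group_id, len(members))
```

With IDs like these in place, collecting everything for one parent task across appliances would reduce to a fixed-string search on the bracketed ID.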

Fryguy commented 4 years ago

The purpose of the [----] was originally to support a message catalog. That is, when an error occurs it's given a pre-defined number, such as [1234]. Then separately a message catalog would be available with the lookup values of all of those numbers. In the end it was never used, but we didn't remove it from the logs themselves because by that time a number of log scrapers had already been written expecting it to be there.

So, we can repurpose it, but it would have to stay 4 characters long (and if we used something like the Base64 character set, that gives 16_777_216 unique values of exactly 4 characters, or 17_043_520 if shorter values of 1 to 3 characters are also counted).
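
As a rough check on that arithmetic, the sketch below counts the available values and shows one possible way to turn a sequence number into a fixed-width 4-character ID. It assumes the standard Base64 alphabet (A-Z, a-z, 0-9, `+`, `/`); `encode_id` is a hypothetical helper, not something that exists in ManageIQ.

```python
import string

# Standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, '+', '/'.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
WIDTH = 4  # the log field must stay 4 characters long


def encode_id(n):
    """Encode 0 <= n < 64**4 as a fixed-width 4-character ID (hypothetical helper)."""
    if not 0 <= n < len(ALPHABET) ** WIDTH:
        raise ValueError("value does not fit in 4 characters")
    chars = []
    for _ in range(WIDTH):
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


# IDs that are exactly 4 characters long.
exactly_four = len(ALPHABET) ** WIDTH
# IDs of length 1 through 4, which matches the 17_043_520 figure.
one_to_four = sum(len(ALPHABET) ** k for k in range(1, WIDTH + 1))

print(exactly_four, one_to_four)
print(encode_id(0), encode_id(1), encode_id(exactly_four - 1))
```

Since existing log scrapers expect the field to be there, a character set without `+` and `/` (for example the URL-safe Base64 variant) might be a safer choice; that is a design option, not something decided in this thread.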

> ability to customize logging in the evm.log

Making this customizable (i.e. customer changeable) seems overly complicated. It seems more of a development task to write the logs with the proper grouping.

That being said, we'd have to talk out what these groupings would be, and considering there are (super-roughly) 19000 call sites (via grep of .info|warn|error), it seems kind of a monumental task to update all of them to be aware of this grouping.

> A common use case would be to group all tasks that are associated with a parent task (child tasks and grandchild tasks) under a common ID, so that grepping through the logs of many appliances would be much faster.

This should already be done with the Q-task-id part of the log message (also called "tracking label") [ref]. Separately, in containers, we've broken this out as "request_id" [ref].

Fryguy commented 4 years ago

Instead of trying to manipulate the logs to work for the user, I think we should flip it around and come up with a solution that actually solves the user's problem. I'm hearing that users go to the logs and can't find what they need, but it sounds like the user having to go to the logs in the first place is the problem. Instead we need a solution that brings the correct parts of the logs to them directly in the UI. @dmetzger57 Do you have more information about the context in which the user needs to go to the logs?

I know for automate, @mkanoor and I have discussed having automate task-specific logging that is RBAC'd properly so users can see the information pertaining to their task right in the UI. It depended on the systemd-journal-gateway, which would centralize the logs and make them queryable. See https://github.com/ManageIQ/manageiq/issues/19582