IBM / CAST

CAST can enhance the system management of cluster-wide resources. It consists of two open source tools: Cluster System Management (CSM) and Burst Buffer.
Eclipse Public License 1.0

csm syslog message issues #471

Open morrone opened 5 years ago

morrone commented 5 years ago

The csm daemons seem to be logging to syslog in a weird way that causes duplicate dates in the syslog-ng log. Here is an example:

Dec 19 15:11:12 localhost Dec 19 15:11:12 sierra4347 CAST[-]: csmapi [1276710984]; csm_node_resources_query_all start
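For anyone scanning existing logs for this problem, here is a minimal Python sketch that flags lines starting with two consecutive timestamp/hostname pairs. It assumes BSD-style (RFC 3164) timestamps as in the example above; the names `has_duplicate_header` and `strip_outer_header` are hypothetical helpers, not part of CSM or syslog-ng.

```python
import re

# One BSD-style timestamp plus hostname, e.g. "Dec 19 15:11:12 localhost "
HEADER = r"[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \S+ "

# Two such headers back to back at the start of a line
DOUBLE_HEADER = re.compile(rf"^(?P<outer>{HEADER})(?P<inner>{HEADER})")


def has_duplicate_header(line: str) -> bool:
    """Return True if the line begins with two timestamp/hostname pairs."""
    return DOUBLE_HEADER.match(line) is not None


def strip_outer_header(line: str) -> str:
    """Drop the first (outer) header if the line carries two of them."""
    match = DOUBLE_HEADER.match(line)
    return line[match.end("outer"):] if match else line
```

This only helps with finding or cleaning up affected lines after the fact; it does not address the cause of the duplication.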

Next, I would like to see the string "CAST" in "CAST[-]" replaced with the actual name of the daemon (csmd, csmrestd, etc.).

Next, please remove the extraneous spaces between "[-]:" and "csmapi".

Also, I'm not so sure about the choice of using a minus sign in place of the actual PID between the square brackets. Is there a good reason that we can't put the real PID in there?

mew2057 commented 5 years ago

The CAST[-] was hard coded. I've updated the syslog to place the process name in the %APP-NAME% and the process id in the %PROCID% of the syslog message.

Using the default syslog formatting: $template logFormat, "%TIMESTAMP:::date-rfc3339% %HOSTNAME% %APP-NAME% %PROCID% %syslogseverity-text% %msg%\n"

The updated version will look like: 2019-01-03T15:46:59-05:00 c650f03p39-mgt.pok.stglabs.ibm.com csmd 30357 debug csmapi; [0]; EventContextHandlerState Destructor

I tried to remove the extra whitespace, but nothing seemed to work. The subcomponent (csmdb, csmapi, etc.) is now included in the CAST message, separated from the rest by a semicolon.

morrone commented 5 years ago

I think there are still some issues with this format:

2019-01-03T15:46:59-05:00 c650f03p39-mgt.pok.stglabs.ibm.com csmd 30357 debug csmapi; [0]; EventContextHandlerState Destructor

First, the program name and pid should not be space separated in standard syslog format. So we would like that part to instead look like this:

2019-01-03T15:46:59-05:00 c650f03p39-mgt.pok.stglabs.ibm.com csmd[30357]: debug csmapi; [0]; EventContextHandlerState Destructor
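To make the requested layout concrete, here is a small Python sketch that builds the part of the line after the hostname in the conventional `name[pid]: message` form. The function `format_syslog_body` is a hypothetical illustration, not CSM code, and it deliberately leaves the timestamp and hostname to whatever receives the message.

```python
def format_syslog_body(app_name: str, pid: int, severity: str, message: str) -> str:
    """Join program name and PID as 'name[pid]:' with no space between them."""
    return f"{app_name}[{pid}]: {severity} {message}"


# Reproduces the tail of the desired line shown above
body = format_syslog_body(
    "csmd", 30357, "debug", "csmapi; [0]; EventContextHandlerState Destructor"
)
```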

The next question is then about the duplicate timestamp and hostname. I rather suspect that those are already being added by either the syslog plugin to your logging library or by the syslogd that receives the message. I suspect that you need to remove the "%TIMESTAMP:::date-rfc3339% %HOSTNAME% " portion of your logFormat template to fix the duplication problem.
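The suspected mechanism can be sketched as a toy model in Python. This is an assumption about what is happening, not a description of syslog-ng or CSM internals: the hypothetical `receiver_line` always prepends its own timestamp and hostname, so a sender that also formats a header into its payload ends up with two.

```python
def sender_payload(app, pid, msg, include_header,
                   timestamp="Dec 19 15:11:12", host="sierra4347"):
    """Build what the daemon hands to syslog (toy model)."""
    body = f"{app}[{pid}]: {msg}"
    # If the sender formats its own header, it becomes part of the payload.
    return f"{timestamp} {host} {body}" if include_header else body


def receiver_line(payload, timestamp="Dec 19 15:11:12", host="localhost"):
    """Toy receiver that always prepends its own timestamp and hostname."""
    return f"{timestamp} {host} {payload}"


duplicated = receiver_line(sender_payload("csmd", 30357, "hello", include_header=True))
clean = receiver_line(sender_payload("csmd", 30357, "hello", include_header=False))
```

In this model, dropping the header on the sending side is enough to remove the duplication, which is the change suggested above.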

mew2057 commented 5 years ago

The space separation is ultimately just a print formatter, since the process ID and process name are stored in the TAG header (RFC 3164). In the internal generation we format using the process_name[pid]: format, but I modified the pattern in rsyslog to make the string space-delimited to reduce message size and simplify the pattern.
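As an illustration of the process_name[pid]: convention mentioned here, the following Python sketch splits such a prefix back into its parts. The regular expression is a simplification for illustration, not a full RFC 3164 parser, and `parse_tag` is a hypothetical name.

```python
import re

# name, optional "[pid]", a colon, then the rest of the message
TAG_RE = re.compile(
    r"^(?P<name>[^\s\[:]+)(?:\[(?P<pid>[^\]]*)\])?:\s*(?P<content>.*)$"
)


def parse_tag(body: str):
    """Return (name, pid or None, content), or None if no tag is found."""
    match = TAG_RE.match(body)
    if match is None:
        return None
    pid_text = match.group("pid")
    # A placeholder such as "-" is reported as a missing PID.
    pid = int(pid_text) if pid_text and pid_text.isdecimal() else None
    return match.group("name"), pid, match.group("content")
```

With a parser like this, a receiver can display the name and PID in any layout, which is the sense in which the separator is only a print-format choice.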

In rsyslog those portions are required for the timestamp and hostname to be visible in the log with the currently used format.

morrone commented 5 years ago

I'm not fully following what argument you are making. Are you saying that the space delimited process_name/pid is what appears in the syslog output or not? Because all that I care about in this ticket is what the syslog output looks like. If "logFormat" isn't the right place to fix the formatting issues in the csm syslog output, I have no issue with fixing it elsewhere. I'm just making suggestions based on what you have posted in this ticket thus far.

Also, keep in mind that we do not use rsyslog.

morrone commented 5 years ago

Any progress on handling the duplicate timestamp/hostname that csmd is creating in the syslog?