Specification file for logging implementation

GowthamShanmugam commented 7 years ago

Signed-off-by: GowthamShanmugam gshanmug@redhat.com

tendrl-bug-id: Tendrl#28

GowthamShanmugam commented 7 years ago

@brainfunked @shtripat please review

GowthamShanmugam commented 7 years ago

@shtripat please review this

GowthamShanmugam commented 7 years ago

@shtripat @brainfunked @nnDarshan @anmolbabu please review this

GowthamShanmugam commented 7 years ago

@brainfunked @shtripat @anmolbabu @nnDarshan

GowthamShanmugam commented 7 years ago

@brainfunked @shtripat @nnDarshan please review this

GowthamShanmugam commented 7 years ago

@brainfunked @shtripat @nthomas-redhat @nnDarshan @anmolbabu please review this

mkudlej commented 7 years ago

Please add https://github.com/Tendrl/usmqe-tests/issues/14 as reference for testing alerts.

GowthamShanmugam commented 7 years ago

@mkudlej done

GowthamShanmugam commented 7 years ago

@anmolbabu @shtripat @brainfunked @nthomas-redhat please review

anmolbabu commented 7 years ago

@Tendrl/tendrl-core @GowthamShanmugam

The alert life-cycle involves multiple components as below:

The alerting module: Validates the alert. Processes the alert to check whether a similar alert was previously detected from the same source, so that it can tell whether this is a duplicate alert or a clearing alert for the earlier one, etc. (a few more checks will probably get added as the need arises). Notifies the configured destinations of the alert.
The node-agent: Its only responsibility is to perform an initial primitive validation of the alert and to transport any validated alert from its socket to etcd.
The bridge/collectd: Generates an alert for any status change or utilization threshold breach and puts it on the alert socket.

From the above, it can be observed that the only place where an alert is in its fully processed, polished form is the alerting module; everywhere else it is a raw alert that has not yet been processed by tendrl-specific intelligence. So in my opinion it would be better to invoke the MessageHandler that is to be developed per this spec so that it receives the alert from the place where the alert is actually fully processed (i.e., the alerting module) rather than from the source of the alert (where the alert is fairly raw, i.e., not yet processed and validated by tendrl intelligence).
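To make the duplicate/clearing step concrete, here is a minimal sketch of how the alerting module could classify an incoming alert against the alerts it already knows about. It is only an illustration: the `Alert` fields, the severity names, the `CLEAR` value and the `classify` function are assumptions made for this example and are not taken from the spec or from Tendrl code.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

CLEAR = "INFO"  # assumed severity that clears an earlier alert (hypothetical)


@dataclass(frozen=True)
class Alert:
    source: str    # who raised the alert, e.g. a node id (hypothetical field)
    resource: str  # what the alert is about, e.g. "cpu" (hypothetical field)
    severity: str  # e.g. "WARNING", "CRITICAL" or CLEAR (hypothetical values)


def classify(alert: Alert, active: Dict[Tuple[str, str], Alert]) -> str:
    """Compare an alert with the active alert from the same source, if any."""
    key = (alert.source, alert.resource)
    previous = active.get(key)
    if alert.severity == CLEAR:
        if previous is None:
            return "ignored"       # nothing to clear
        del active[key]
        return "clearing"          # clears the previously occurred alert
    if previous is None:
        active[key] = alert
        return "new"
    if previous.severity == alert.severity:
        return "duplicate"         # same alert seen again, no new notification
    active[key] = alert
    return "changed"               # same source, different severity
```

Only alerts classified as "new", "changed" or "clearing" would then be passed on for notification, which is the point at which the alert is fully processed.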

Please let me know what @Tendrl/tendrl-core feels about this...

GowthamShanmugam commented 7 years ago

@brainfunked please give your suggestion on anmol's comment

GowthamShanmugam commented 7 years ago

@brainfunked @shtripat @nthomas-redhat @r0h4n @anmolbabu please review

GowthamShanmugam commented 7 years ago

@brainfunked I have modified this spec as per your suggestion, please review

GowthamShanmugam commented 7 years ago

@shtripat @nthomas-redhat @r0h4n please review

GowthamShanmugam commented 7 years ago

@brainfunked @nthomas-redhat @shtripat @nnDarshan @anmolbabu @anivargi @r0h4n please review this spec, I have updated it as per the discussion.

GowthamShanmugam commented 7 years ago

@brainfunked @nthomas-redhat @shtripat @anmolbabu @nnDarshan @anivargi @r0h4n please review this

GowthamShanmugam commented 7 years ago

@TimothyAsir @brainfunked @anmolbabu @nthomas-redhat @shtripat At the time of logging a message, if the socket fails then the log message never reaches the logging framework. In this case we have to inform the end user that the socket is down and that log messages are not being logged successfully; otherwise the log message will be lost. The problem here is that without the socket we cannot log the message anywhere, so we need some mechanism to handle this case.

shtripat commented 7 years ago

@GowthamShanmugam I am not sure how to tackle this, but as a preferable option I would like to log these kinds of failures in some common tendrl file, say /var/log/tendrl/tendrl.log. Just a weird thought...

nthomas-redhat commented 7 years ago

@GowthamShanmugam In case of logger failure: log errors to stderr and messages to stdout; this should ultimately go into the systemd journal. We should be able to alert to etcd directly via the node agent in case there's a problem with the socket. Errors should always be logged to stderr, regardless of whether the logger failed or not.
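A minimal sketch of this fallback behaviour is given below, assuming the logging socket is wrapped in a `send` callable that raises `OSError` when the socket is down. The function name `log_with_fallback` and its parameters are made up for illustration and are not part of the spec; alerting to etcd via the node agent is not covered here.

```python
import sys
from typing import Callable, Optional, TextIO


def log_with_fallback(
    text: str,
    is_error: bool,
    send: Callable[[str], None],   # hypothetical wrapper around the logging socket
    out: Optional[TextIO] = None,
    err: Optional[TextIO] = None,
) -> bool:
    """Send a message to the logger; fall back to stdout/stderr on failure.

    Returns True if the logger accepted the message, False otherwise.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err

    # Errors always go to stderr, whether or not the logger works.
    if is_error:
        print(text, file=err)

    try:
        send(text)
        return True
    except OSError as exc:
        # Logger failure: report it, and keep non-error messages on stdout
        # so that they are not lost.
        print("logging socket unavailable: %s" % exc, file=err)
        if not is_error:
            print(text, file=out)
        return False
```

When the process runs as a systemd service, whatever is written to stdout and stderr is normally captured by the journal, so nothing extra is needed in the sketch for that part.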

GowthamShanmugam commented 7 years ago

@nthomas-redhat I will change it as per your suggestion

GowthamShanmugam commented 7 years ago

@brainfunked @r0h4n @shtripat Spec is updated

anivargi commented 7 years ago

@GowthamShanmugam slight changes will have to be made, as follows:

/queue/:job_id will be a directory
/queue/:job_id/payload will be the JSON dump of the job, with the same structure we have today
/queue/:job_id/status will be the global status, which needs to be updated as per the current status of the job; the default will be 'new'
/queue/:job_id/errors will be a message if the status is error
/queue/:job_id/messages will be the log messages related to the job

The request_id is not needed anymore for the API to know about the job messages. CC: @r0h4n @brainfunked
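As a rough illustration of this layout, the sketch below builds the key/value pairs for one job as a plain dictionary. The helper `job_keys` is hypothetical, no etcd client is involved, and the messages key is left out of the sketch; only the payload, status and errors keys are shown.

```python
import json
from typing import Dict, Optional


def job_keys(job_id: str, payload: dict, status: str = "new",
             error: Optional[str] = None) -> Dict[str, str]:
    """Return the keys that would be written under /queue/:job_id."""
    base = "/queue/%s" % job_id
    keys = {
        base + "/payload": json.dumps(payload, sort_keys=True),  # JSON dump of the job
        base + "/status": status,                                # defaults to 'new'
    }
    # The errors key carries a message only when the status is error.
    if status == "error" and error is not None:
        keys[base + "/errors"] = error
    return keys
```

The job id itself becomes the directory name, so a reader only needs the job id to find everything about the job.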

anivargi commented 7 years ago

@brainfunked @GowthamShanmugam I would also like to recommend that we put the messages under something like /messages/jobs/:job_id. Would that be an option?

anivargi commented 7 years ago

@GowthamShanmugam We need to move the messages per job under /messages/jobs

/messages/jobs/:job_id will be a directory
/messages/jobs/:job_id/:sequence will have the value of the message

Other non job related messages will reside under /messages/events
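A small sketch of how a message key could be derived under this scheme is shown below. The helper `message_key` is hypothetical, and placing event messages under a per-message sequence key inside /messages/events is an assumption made for the example; the comment above only says that they reside under /messages/events.

```python
from typing import Optional


def message_key(sequence: int, job_id: Optional[str] = None) -> str:
    """Build the key for one message."""
    if job_id is not None:
        # Job-related message: /messages/jobs/:job_id/:sequence
        return "/messages/jobs/%s/%s" % (job_id, sequence)
    # Non-job message (assumed layout below /messages/events)
    return "/messages/events/%s" % sequence
```

For example, `message_key(3, "j1")` gives `/messages/jobs/j1/3`.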

CC: @brainfunked

GowthamShanmugam commented 7 years ago

@brainfunked @anivargi @r0h4n as per the above discussion I have modified the job structure: https://github.com/Tendrl/commons/pull/175/files

r0h4n commented 7 years ago

@anivargi I think we have missed the "parent_id" field in this job structure

GowthamShanmugam commented 7 years ago

@r0h4n @anivargi as per the current job structure, the parent id also comes under payload

GowthamShanmugam commented 7 years ago

@anivargi @brainfunked Messages are stored in different places

If cluster_id is present: /clusters/cluster_id/Messages, else: /nodes/node_id/Messages
If job_id is present: stored in /Messages/jobs and also in /Messages/events, else: only in /Messages/events

Is this OK, or should job updates be stored only in /Messages/jobs, and not in /Messages/events, /clusters/cluster_id/Messages or /nodes/node_id/Messages?

Which is correct?
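To compare the two options side by side, here is a small sketch that returns the storage locations for a message under either rule. The function `message_locations` and its `jobs_only` flag are invented for this illustration; the flag selects the second option, where job updates go to /Messages/jobs only.

```python
from typing import List, Optional


def message_locations(node_id: str, cluster_id: Optional[str] = None,
                      job_id: Optional[str] = None,
                      jobs_only: bool = False) -> List[str]:
    """List the places where one message would be stored."""
    # Second option: job updates live in /Messages/jobs and nowhere else.
    if job_id is not None and jobs_only:
        return ["/Messages/jobs"]

    # First option: cluster or node location, plus jobs and/or events.
    if cluster_id is not None:
        locations = ["/clusters/%s/Messages" % cluster_id]
    else:
        locations = ["/nodes/%s/Messages" % node_id]
    if job_id is not None:
        locations.append("/Messages/jobs")
    locations.append("/Messages/events")
    return locations
```

With the first option a job update for a cluster is written to three places, while the second option writes it once.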

Tendrl / specifications

Specification file for logging implementation #94