Closed GowthamShanmugam closed 7 years ago
@brainfunked @shtripat please review
@shtripat please review this
@shtripat @brainfunked @nnDarshan @anmolbabu please review this
@brainfunked @shtripat @anmolbabu @nnDarshan
@brainfunked @shtripat @nnDarshan please review this
@brainfunked @shtripat @nthomas-redhat @nnDarshan @anmolbabu please review this
Please add https://github.com/Tendrl/usmqe-tests/issues/14 as reference for testing alerts.
@mkudlej done
@anmolbabu @shtripat @brainfunked @nthomas-redhat please review
@Tendrl/tendrl-core @GowthamShanmugam
The alert life-cycle involves multiple components as below:
The alerting module:
** Validates the alert.
** Processes the alert to find out whether a similar alert was previously detected from the same source, i.e. whether it is a duplicate alert or a clearing alert for a previously raised alert. A few more processing steps will probably be added as the need arises.
** Notifies the configured destinations of the alert.
The node-agent:
** Is responsible only for an initial primitive validation of the alert and for transporting any validated alert from its socket to etcd.
The bridge/collectd:
** Generates an alert for any status change or utilization threshold breach and puts it on the alert socket.
From the above, it can be observed that the alerting module is the only place where an alert is in its fully processed form; everywhere else it is a raw alert that has not gone through any Tendrl-specific processing. So in my opinion it would be better for the MessageHandler that is to be developed under this spec to receive the alert from the place where it is actually fully processed (i.e., the alerting module) rather than from the source of the alert (where it is fairly raw, i.e., not yet processed and validated by Tendrl's intelligence).
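To make the duplicate/clearing distinction concrete, here is a minimal sketch of what that processing step could look like. The `Alert` fields, the severity values, and the rule that an `INFO` alert clears an earlier one are all assumptions made for illustration; none of them comes from the spec or from the alerting module's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Alert:
    source: str    # hypothetical: component that raised the alert
    resource: str  # hypothetical: what the alert is about
    severity: str  # hypothetical values: "WARNING", "CRITICAL", "INFO"


def classify(alert: Alert, active: Dict[Tuple[str, str], Alert]) -> str:
    """Return "new", "duplicate" or "clearing" and update the active set."""
    key = (alert.source, alert.resource)
    previous: Optional[Alert] = active.get(key)
    if alert.severity == "INFO":
        # Assumption: INFO means the condition went back to normal.
        if previous is not None:
            del active[key]
            return "clearing"
        # Nothing to clear, so treat it as a fresh informational alert.
        return "new"
    if previous is not None and previous.severity == alert.severity:
        # Same source, same resource, same severity: nothing changed.
        return "duplicate"
    # First occurrence, or a severity change for a known alert.
    active[key] = alert
    return "new"
```

Only the alerting module holds the `active` set, which is why a raw alert taken from the node-agent or collectd cannot be told apart as new, duplicate or clearing.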
Please let me know what @Tendrl/tendrl-core feels about this...
@brainfunked please give your suggestions on anmol's comment
@brainfunked @shtripat @nthomas-redhat @r0h4n @anmolbabu please review
@brainfunked I have modified this spec as per your suggestion, please review
@shtripat @nthomas-redhat @r0h4n please review
@brainfunked @nthomas-redhat @shtripat @nnDarshan @anmolbabu @anivargi @r0h4n please review this spec, I have updated it as per the discussion.
@brainfunked @nthomas-redhat @shtripat @anmolbabu @nnDarshan @anivargi @r0h4n please review this
@TimothyAsir @brainfunked @anmolbabu @nthomas-redhat @shtripat If the socket fails at the time of logging a message, the log message never reaches the logging framework. In this case we have to inform the end user that the socket is down and that log messages were not logged successfully; otherwise the log message will be lost. The problem here is that without the socket we cannot log the message anywhere, so we need some mechanism to handle this case.
@GowthamShanmugam I am not sure how to tackle this, but as a preferable option I would like to log these kinds of failures in some common Tendrl file, say /var/log/tendrl/tendrl.log. Just a weird thought...
@GowthamShanmugam In case of logger failure: log errors to stderr and messages to stdout; these should ultimately go into the systemd journal. We should be able to alert to etcd directly via the node agent in case there's a problem with the socket. Errors should always be logged to stderr, regardless of whether the logger has failed or not.
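A rough sketch of that fallback, to show the intended behaviour rather than the actual implementation. The function name `emit`, the `priority` values and the use of a Unix stream socket are assumptions; the direct alert to etcd via the node agent is left out.

```python
import socket
import sys


def emit(message: str, priority: str, socket_path: str) -> bool:
    """Send a message to the logging socket, falling back to stdio.

    Returns True if the socket accepted the message, False otherwise.
    """
    is_error = priority.lower() == "error"
    if is_error:
        # Errors always go to stderr, whether or not the logger works.
        print(message, file=sys.stderr)
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.connect(socket_path)
            sock.sendall(message.encode("utf-8"))
        return True
    except OSError as exc:
        # Logger failure: report it and keep the message from being lost.
        print("logging socket unavailable: %s" % exc, file=sys.stderr)
        if not is_error:
            # Non-error messages fall back to stdout.
            print(message)
        return False
```

When the service runs under systemd, both streams are normally captured by the journal, so nothing is lost even while the socket is down.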
@nthomas-redhat I will change it as per your suggestion
@brainfunked @r0h4n @shtripat Spec is updated
@GowthamShanmugam slight changes will have to be made, as follows:
/queue/:job_id will be a directory.
/queue/:job_id/payload will be the JSON dump of the job, with the same structure we have today.
/queue/:job_id/status will be the global status, which needs to be updated as per the current status of the job; the default will be 'new'.
/queue/:job_id/errors will be a message if the status is error.
/queue/:job_id/messages will be the log messages related to the job.
The request_id is not needed anymore for the API to know about the job messages. CC: @r0h4n @brainfunked
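For reference, the layout above could be written down roughly like this. It is only a sketch: `job_keys` is a hypothetical helper, and using empty strings as the initial values of `errors` and `messages` is an assumption, since only the default of `status` is specified.

```python
import json
from typing import Dict


def job_keys(job_id: str, payload: dict) -> Dict[str, str]:
    """Build the key/value pairs for a new job under /queue/:job_id."""
    base = "/queue/%s" % job_id
    return {
        # JSON dump of the job, same structure as today.
        base + "/payload": json.dumps(payload),
        # Global status; the default is 'new'.
        base + "/status": "new",
        # Filled in only when the status is error (assumed empty at start).
        base + "/errors": "",
        # Log messages related to the job (assumed empty at start).
        base + "/messages": "",
    }
```

With this layout the API can find everything about a job from its job_id alone, which is why the request_id is no longer needed.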
@brainfunked @GowthamShanmugam I would also like to recommend putting the messages under something like /messages/jobs/:job_id. Would that be an option?
@GowthamShanmugam We need to move the per-job messages under /messages/jobs:
/messages/jobs/:job_id will be a directory.
/messages/jobs/:job_id/:sequence will hold the value of the message.
Other, non-job-related messages will reside under /messages/events.
CC: @brainfunked
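A small sketch of how the message keys would then be derived. Both helper names are hypothetical, and the layout below /messages/events is not defined in this thread, so only the directory is returned for non-job messages.

```python
from typing import Optional


def message_dir(job_id: Optional[str] = None) -> str:
    """Return the directory that holds messages for a job, or for events."""
    if job_id is None:
        # Non-job-related messages.
        return "/messages/events"
    return "/messages/jobs/%s" % job_id


def job_message_key(job_id: str, sequence: int) -> str:
    """Return the key for one message of a job, addressed by sequence."""
    return "%s/%d" % (message_dir(job_id), sequence)
```

For example, `job_message_key("abc", 3)` gives `/messages/jobs/abc/3`.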
@brainfunked @anivargi @r0h4n as per the above discussion I have modified the job structure: https://github.com/Tendrl/commons/pull/175/files
@anivargi I think we have missed the "parent_id" field in this job structure
@r0h4n @anivargi as per the current job structure, the parent id also comes under payload
@anivargi @brainfunked Messages are stored in different places. Which one is correct?
Signed-off-by: GowthamShanmugam <gshanmug@redhat.com>
tendrl-bug-id: Tendrl#28