Hey 👋
Thanks for reporting this enhancement request and the positive feedback!
It totally makes sense to retry requests to the HEC and I already implemented it using the following strategy:
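In broad strokes it looks like this; a minimal sketch only, with illustrative parameter values rather than the exact code from the commit:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry the HEC POST on connection errors and transient 5xx responses,
# with exponential backoff between attempts (values here are illustrative,
# not necessarily the shipped defaults).
retry_strategy = Retry(
    total=5,
    backoff_factor=1,
    status_forcelist=[500, 502, 503, 504],
    allowed_methods=["POST"],  # older urllib3 versions call this method_whitelist
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry_strategy))

# The actual helper builds its own URL, token and payload; these are placeholders.
response = session.post(
    "https://hec.example.com:8088/services/collector/event",
    headers={"Authorization": "Splunk <hec_token>"},
    json={"event": {"message": "forwarded alert"}},
    timeout=30,
)
response.raise_for_status()
```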
There might still be issues with the HEC like an invalid token, so if forwarded alerts are production-critical, I'd highly recommend creating an alert for that. That's why I didn't implement a retry strategy in the first place. Very simple example:
(index="_internal" OR index="cim_modactions") sourcetype="modular_alerts:forward_alert_to_splunk_hec" source="/opt/splunk/var/log/splunk/forward_alert_to_splunk_hec_modalert.log" NOT INFO NOT DEBUG NOT WARN
I fixed this issue in this commit. I use repository mirroring from a private GitLab repo to this public GitHub repo, so I'm sorry that there's no pull request for this. 😄
This will be available in the next release!
~ Julian
thanks Julian, much appreciated :) Is it worth adding a config setting as well, in case people's environments differ? Also, how exactly does the backoff factor work in the context of a CIM mod alert? Does Splunk cap the max execution time (and is that adjustable in the add-on manifest), and will the default of 5 retries with the default 30s timeout, plus the backoff increasing the interval between tries, exceed the 'time to live' before Splunk decides to kill the custom alert Python process on the SH?
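My rough back-of-the-envelope math, assuming the 5 means retries on top of the initial attempt, 30s is the per-attempt timeout, and it ends up using the commonly documented urllib3 backoff formula `sleep = backoff_factor * 2 ** (retry - 1)` (whether the very first retry sleeps at all differs between urllib3 versions, and the real defaults may differ):

```python
timeout = 30          # per-attempt request timeout in seconds, per the question above
retries = 5           # retries on top of the initial attempt
backoff_factor = 1    # hypothetical value, just for the arithmetic

attempts = 1 + retries
sleeps = sum(backoff_factor * 2 ** (n - 1) for n in range(1, retries + 1))
print(attempts * timeout + sleeps)   # worst case: 6 * 30 + 31 = 211 seconds
```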
> There might still be issues with the HEC like an invalid token, so if forwarded alerts are production-critical, I'd highly recommend creating an alert for that.
ty, yep, it's always worth keeping an eye on CIM mod action failures in case of unexpected outages or Splunk infra issues, aside from tokens going missing
by the way, in terms of a definite monitoring suggestion:
> Unable to forward alert to HEC!
that would only be logged on the final failed try, correct?
although
> (index="_internal" OR index="cim_modactions") sourcetype="modular_alerts:forward_alert_to_splunk_hec" source="/opt/splunk/var/log/splunk/forward_alert_to_splunk_hec_modalert.log" NOT INFO NOT DEBUG NOT WARN

is more generic, as it doesn't look for a specific error (or alternatively something like ERR OR WARN OR EXCEPT*, but wildcards are bad :) ), though it's a small log anyway (especially with the additional constraints on sourcetype and source).
qq 2: what's the process for getting from the releases on the GitHub page to the Splunkbase version, if that's alright to ask (for Cloud customers)? Although I suppose it could also be uploaded as a private app?
@Stjubit hello Julian, could you please confirm the status of the Splunk Cloud app store release?
hello,
(very nice add-on :) ) quick question: what is the default behaviour on 5xx errors?
From a glance at https://github.com/Stjubit/TA-alert_forwarder/blob/master/TA-alert_forwarder/bin/ta_alert_forwarder/modalert_forward_alert_to_splunk_hec_helper.py#L46 it looks to be 'don't retry'. Is that correct, or is retrying perhaps implied somewhere? (Seems unlikely given it's the raw requests lib.)
Would you be willing to add something like this using backoff? Say, retry 3 times: https://stackoverflow.com/questions/70602830/how-to-retry-python-requests-get-without-using-sessions or perhaps using a retry adapter: https://majornetwork.net/2022/04/handling-retries-in-python-requests/ (rough sketch at the end of this message)
:)
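For illustration, I was thinking of something roughly like this, just a sketch using the backoff library; the names and values are mine, not from the add-on:

```python
import backoff
import requests

# Give up early on non-retryable HTTP errors (e.g. 4xx like an invalid token);
# connection errors and 5xx responses keep getting retried.
def _fatal(err):
    return (
        isinstance(err, requests.exceptions.HTTPError)
        and err.response is not None
        and err.response.status_code < 500
    )

@backoff.on_exception(backoff.expo, requests.exceptions.RequestException,
                      max_tries=3, giveup=_fatal)
def post_to_hec(hec_url, hec_token, event):
    # hec_url / hec_token / event are placeholders, not the add-on's variables
    resp = requests.post(
        hec_url,
        headers={"Authorization": f"Splunk {hec_token}"},
        json={"event": event},
        timeout=30,
    )
    resp.raise_for_status()  # turn 5xx into an exception so backoff retries it
    return resp
```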