cloudfoundry-community / splunk-firehose-nozzle

Send CF component metrics, CF app logs, and CF app metrics to Splunk
Apache License 2.0
29 stars, 29 forks

Prevent splunk nozzle from disconnecting due to being a slow consumer #280

Closed Benjamintf1 closed 3 years ago

Benjamintf1 commented 3 years ago

Add threads for posting to Splunk (10 per instance), add a buffered channel, and drop events if the buffer is full. Report on drops every 1000 drops.

Also, use go mod.
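For readers following along, the drop-when-full idea in this description can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `Event`, `Dispatcher`, `NewDispatcher`, `Offer` and the `post` callback are hypothetical names, and the buffer size and the sleep in `main` are arbitrary values chosen for the demo.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Event is a hypothetical stand-in for a firehose envelope.
type Event struct{ ID int }

// Dispatcher hands events to a fixed pool of workers through a buffered channel.
type Dispatcher struct {
	queue   chan Event
	dropped uint64
	wg      sync.WaitGroup
}

// NewDispatcher starts `workers` goroutines that call post for each queued event.
func NewDispatcher(bufSize, workers int, post func(Event)) *Dispatcher {
	d := &Dispatcher{queue: make(chan Event, bufSize)}
	for i := 0; i < workers; i++ {
		d.wg.Add(1)
		go func() {
			defer d.wg.Done()
			for ev := range d.queue {
				post(ev)
			}
		}()
	}
	return d
}

// Offer never blocks. It returns false when the buffer is full and the event
// was dropped, and reports the running total once every 1000 drops.
func (d *Dispatcher) Offer(ev Event) bool {
	select {
	case d.queue <- ev:
		return true
	default:
		if n := atomic.AddUint64(&d.dropped, 1); n%1000 == 0 {
			fmt.Printf("dropped %d events so far\n", n)
		}
		return false
	}
}

// Dropped returns the number of events dropped so far.
func (d *Dispatcher) Dropped() uint64 { return atomic.LoadUint64(&d.dropped) }

// Close stops accepting events and waits for the workers to drain the queue.
func (d *Dispatcher) Close() {
	close(d.queue)
	d.wg.Wait()
}

func main() {
	// A deliberately slow poster so that the buffer fills up.
	slowPost := func(Event) { time.Sleep(time.Millisecond) }
	d := NewDispatcher(100, 10, slowPost)

	accepted := 0
	for i := 0; i < 5000; i++ {
		if d.Offer(Event{ID: i}) {
			accepted++
		}
	}
	d.Close()
	fmt.Printf("accepted=%d dropped=%d\n", accepted, d.Dropped())
}
```

The point of the `select` with a `default` branch is that the receive loop never waits on Splunk: if the workers fall behind, events are counted and discarded rather than stalling the firehose connection.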

Benjamintf1 commented 3 years ago

Looks like we might need some CI changes as well if we want to merge this. LMK if you want me to do more.

kashyap-splunk commented 3 years ago

Hi @Benjamintf1, thank you for working on this. The HEC writers are already threaded (a configurable number of goroutines), and the channel in front of them also has a configurable buffer size. Let me know if you still think threading or buffering needs to be added.

To avoid disconnects due to being a slow consumer, we can add event dropping at the sink level (line). We will add a configuration option for this, so the sink will drop events, if configured to, when the buffer and workers are not enough to catch up.

Also, please raise pull requests against the develop branch instead of master.

Benjamintf1 commented 3 years ago

Yeah, adding dropping is the key thing. It'd be fine if that happens at the sink level, as long as no potentially blocking or computationally expensive steps happen between receiving an event and putting it in the channel.
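As a rough sketch of what configurable dropping at the sink level could look like, assuming a hypothetical `Sink` type with a `dropOnFull` option (these names are illustrative and are not the nozzle's actual API or configuration keys):

```go
package main

import (
	"errors"
	"fmt"
)

// ErrBufferFull is returned when an event is dropped because the buffer is full.
var ErrBufferFull = errors.New("sink buffer full, event dropped")

// Sink is a hypothetical sink that queues events for the writer goroutines.
type Sink struct {
	events     chan map[string]interface{}
	dropOnFull bool // hypothetical option: drop instead of blocking
}

// NewSink creates a sink with the given buffer size and drop behaviour.
func NewSink(bufSize int, dropOnFull bool) *Sink {
	return &Sink{
		events:     make(chan map[string]interface{}, bufSize),
		dropOnFull: dropOnFull,
	}
}

// Write does nothing between receiving the event and the channel send, so
// nothing expensive or blocking sits on the receive path when dropping is on.
func (s *Sink) Write(fields map[string]interface{}) error {
	if !s.dropOnFull {
		s.events <- fields // blocks when the buffer is full
		return nil
	}
	select {
	case s.events <- fields:
		return nil
	default:
		return ErrBufferFull
	}
}

func main() {
	s := NewSink(1, true)
	fmt.Println(s.Write(map[string]interface{}{"msg": "first"}))
	fmt.Println(s.Write(map[string]interface{}{"msg": "second"}))
}
```

With `dropOnFull` set to false the sketch keeps the blocking behaviour, which is the case that risks a slow-consumer disconnect; with it set to true, the caller gets an error it can count or log.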

Benjamintf1 commented 3 years ago

Looks like someone else already made a better PR wrt go mod!!! (Exciting!) https://github.com/cloudfoundry-community/splunk-firehose-nozzle/pull/283

Benjamintf1 commented 3 years ago

(I'd prefer dropping to be the default, or at the very least, for the nozzle to log when it hits the buffer limit.)

I'm working on a new PR right now.

Benjamintf1 commented 3 years ago

Newer version of the commit: https://github.com/cloudfoundry-community/splunk-firehose-nozzle/pull/284