cloudfoundry-community / splunk-firehose-nozzle

Send CF component metrics, CF app logs, and CF app metrics to Splunk
Apache License 2.0
29 stars, 29 forks

Bugfix/disconnect due to slow consumer event drop #289

Closed: kashyap-splunk closed this pull request 2 years ago.

kashyap-splunk commented 2 years ago

Adds a mechanism that drops events when the events queue is full. This keeps the main nozzle goroutine from blocking, so a slow downstream no longer passes back pressure back to Doppler.
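For readers unfamiliar with the pattern, here is a minimal sketch of drop-on-full queueing in Go, assuming the events queue is a buffered channel. It uses a `select` with a `default` case so the send never blocks. `Event`, `DroppingQueue`, `NewDroppingQueue`, `Enqueue` and `Dropped` are hypothetical names for illustration, not the nozzle's actual types or functions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Event is a hypothetical stand-in for a firehose envelope; the nozzle's
// real event type is not shown in this PR.
type Event struct {
	Payload string
}

// DroppingQueue (hypothetical) wraps a buffered channel. Enqueue never
// blocks: when the buffer is full, the event is discarded and counted.
type DroppingQueue struct {
	events  chan Event
	dropped atomic.Uint64 // requires Go 1.19+
}

func NewDroppingQueue(size int) *DroppingQueue {
	return &DroppingQueue{events: make(chan Event, size)}
}

// Enqueue attempts a non-blocking send and reports whether the event
// was queued.
func (q *DroppingQueue) Enqueue(e Event) bool {
	select {
	case q.events <- e:
		return true
	default:
		// Queue full: drop the event instead of blocking the reading
		// goroutine, so no back pressure propagates upstream.
		q.dropped.Add(1)
		return false
	}
}

// Dropped returns how many events have been discarded so far.
func (q *DroppingQueue) Dropped() uint64 { return q.dropped.Load() }

func main() {
	// Queue of size 2 with no consumer: the first two sends succeed,
	// the remaining three are dropped immediately.
	q := NewDroppingQueue(2)
	queued := 0
	for i := 0; i < 5; i++ {
		if q.Enqueue(Event{Payload: fmt.Sprintf("event-%d", i)}) {
			queued++
		}
	}
	fmt.Println("queued:", queued, "dropped:", q.Dropped()) // queued: 2 dropped: 3
}
```

Counting drops rather than silently discarding them makes data loss observable, for example by exposing the counter as a metric.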

There is still room for improvement: event parsing could move from the main thread to the writer threads. This will be addressed in later pull requests targeted at subsequent releases.

luckyj5 commented 2 years ago

How thoroughly did we test this? Did we test it with a reproducible scenario that triggers slow-consumer errors on tc?

kashyap-splunk commented 2 years ago


Yes. I tested with low values for the queue size and the number of HEC workers to reproduce the errors while reading, and verified that in the same scenario the new build drops events instead of producing web-socket errors.