cloudfoundry-community / splunk-firehose-nozzle

Send CF component metrics, CF app logs, and CF app metrics to Splunk
Apache License 2.0

"[ENHANCEMENT] Add line breaking/sourcetyping feature" #315

Closed tstcw closed 2 years ago

tstcw commented 2 years ago

What would you like to be added: On our search head we see events broken by [\r\n] at the Splunk Firehose Nozzle level, e.g. this part of a Java stack trace:

 {
   cf_app_id: "#######################",
   cf_app_name: "###############################",
   cf_org_id: "#######################",
   cf_org_name: "#######################",
   cf_space_id: "#######################",
   cf_space_name: "#######################",
   deployment: "#######################",
   event_type: "LogMessage",
   ip: "123.123.123.123",
   job: "############",
   job_index: "##############",
   message_type: "OUT",
   msg: "            at com.discovery.shared.transport.decorator.HttpClientDecorator.getApplications(HttpClientDecorator.java:134)",
   origin: "rep",
   source_instance: 0,
   source_type: "APP/PROC/WEB",
   timestamp: 1645798131873255700
}

Clearly the msg field is part of a multiline stack trace, but the preceding and following lines have been cut off and each wrapped in a separate JSON message by the Nozzle.

There seems to be no simple way to break events from the Nozzle properly with Splunk sourcetypes, because the logs of different apps are JSON-encapsulated in a single stream. The Nozzle should therefore support line breaking on a per-app basis.

The expected event in Splunk would look like this:

 {
   cf_app_id: "#######################",
   cf_app_name: "###############################",
   cf_org_id: "#######################",
   cf_org_name: "#######################",
   cf_space_id: "#######################",
   cf_space_name: "#######################",
   deployment: "#######################",
   event_type: "LogMessage",
   ip: "123.123.123.123",
   job: "############",
   job_index: "##############",
   message_type: "OUT",
   msg: "2022-03-11 13:48:26.787  WARN 100 --- [up-GhC9BA-56715] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused (Connection refused)
        at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.register(EurekaHttpClientDecorator.java:56)
        at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77)
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:66)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:112)
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:776)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.register(EurekaHttpClientDecorator.java:56)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)",
   origin: "rep",
   source_instance: 0,
   source_type: "APP/PROC/WEB",
   timestamp: 1645798131873255700
}
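The requested per-app line breaking could be sketched roughly as follows. This is only an illustration of the idea, not an existing Nozzle feature: the function name `merge_multiline`, the continuation rule (a line starting with whitespace plus `at `, or with `Caused by:`), and the choice to group by `cf_app_id` and `source_instance` are all assumptions made for this example.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical rule: these lines continue the previous event (Java stack traces).
CONTINUATION = re.compile(r"^(\s+at\s|Caused by:)")


def merge_multiline(events: Iterable[Dict]) -> List[Dict]:
    """Append continuation lines to the preceding event of the same app instance."""
    merged: List[Dict] = []
    last_by_source: Dict[tuple, Dict] = {}  # (app id, instance) -> last emitted event
    for event in events:
        key = (event["cf_app_id"], event["source_instance"])
        previous = last_by_source.get(key)
        if previous is not None and CONTINUATION.match(event["msg"]):
            # Continuation line: extend the earlier event instead of emitting a new one.
            previous["msg"] += "\n" + event["msg"]
        else:
            current = dict(event)  # copy so the caller's input is not mutated
            merged.append(current)
            last_by_source[key] = current
    return merged
```

A real implementation would also need a flush timeout and a size limit so that events are not held back indefinitely; both are left out here for brevity.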

Another option would be to have the event as a raw event with meta fields as indexed fields:

2022-03-11 13:48:26.787 WARN 100 --- [up-GhC9BA-56715] o.s.a.r.l.SimpleMessageListenerContainer : Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection refused (Connection refused)
        at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.register(EurekaHttpClientDecorator.java:56)
        at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77)
        at org.springframework.http.client.AbstractClientHttpRequest.execute(AbstractClientHttpRequest.java:66)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
        at com.netflix.discovery.shared.transport.decorator.SessionedEurekaHttpClient.execute(SessionedEurekaHttpClient.java:77)
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at com.netflix.discovery.shared.transport.decorator.RetryableEurekaHttpClient.execute(RetryableEurekaHttpClient.java:112)
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:776)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
        at com.netflix.discovery.shared.transport.decorator.EurekaHttpClientDecorator.register(EurekaHttpClientDecorator.java:56)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)

Why is this needed: This is needed to allow our CF users to send logs from CF to Splunk properly without having to rely on edge-case workarounds such as a two-way newline replacement, which is not always possible for the majority of applications, and to leverage Splunk's built-in configuration for proper event handling.
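For readers unfamiliar with the "two-way newline replacement" workaround mentioned above, a minimal sketch of the idea follows. The placeholder character and the function names are assumptions chosen for illustration; the application would apply the first step before writing the log line, and the second step would have to be reproduced on the receiving side.

```python
# Hypothetical placeholder; any token that never occurs in real log output would do.
PLACEHOLDER = "\u2424"  # the "symbol for newline" character


def encode_newlines(message: str) -> str:
    """App side: collapse a multiline message into one physical line."""
    return message.replace("\r\n", "\n").replace("\n", PLACEHOLDER)


def decode_newlines(message: str) -> str:
    """Receiving side: restore the original line breaks."""
    return message.replace(PLACEHOLDER, "\n")
```

The drawback raised in this issue is visible in the sketch: the encoding step has to be built into every application's logging, which is often not possible for third-party or legacy apps.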

If there's a simpler way to accomplish this, please let me know.

Thanks!

JuergenSu commented 2 years ago

We don't rely on the nozzle for multiline events like stack traces, as this is IMHO not a nozzle topic: the splitting of multiline events occurs at the level where the log stream is fed into the log transport system.

We use https://github.com/splunk/splunk-library-javalogging to log directly from apps to Splunk. This preserves multiline events and allows per-app index routing and a separation of platform logs and app logs.

See also https://dev.splunk.com/enterprise/docs/devtools/java/logging-java/howtouseloggingjava/enableloghttpjava

slcardinal commented 2 years ago

This would be of great benefit for my organization as well. We are having to move from a custom-built "splunk-nozzle" for application logging, which supports multiline events, to something "off the shelf". We found this project as a potential replacement. The one major deficiency we have discovered in our testing is the lack of support for multiline events. We have other applications that are not Java-based that have multiline events, so using a specific additional app for handling multiline event support would become a management nightmare.

kashyap-splunk commented 2 years ago

Thank you @tstcw for sharing this.

Actually, the nozzle does not split events by newline or any other criterion. It just parses the individual events received from the firehose/CF, adds some metadata, and then sends them to Splunk. It does not split or merge any events. So the events you are seeing must already have been split on the firehose/CF side.

So I suggest checking on the CF side why the events are split.
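The pass-through behavior described above can be modeled in a few lines. This is a simplified sketch, not the nozzle's actual code; the names `enrich`, `forward`, and `app_metadata` are hypothetical. The point it illustrates is that one incoming envelope yields exactly one outgoing event, so a stack trace that arrives as N envelopes leaves as N events.

```python
from typing import Dict, Iterable, Iterator


def enrich(envelope: Dict, app_metadata: Dict[str, Dict]) -> Dict:
    """Copy one envelope and add app metadata (e.g. app, org, and space names)."""
    event = dict(envelope)
    event.update(app_metadata.get(envelope.get("cf_app_id"), {}))
    return event


def forward(envelopes: Iterable[Dict], app_metadata: Dict[str, Dict]) -> Iterator[Dict]:
    """Emit one event per envelope; nothing is split or merged along the way."""
    for envelope in envelopes:
        yield enrich(envelope, app_metadata)
```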

kashyap-splunk commented 2 years ago

Closing this due to long inactivity. Please feel free to reopen it or open a new issue if any problems arise.