driskell / log-courier

The Log Courier Suite is a set of lightweight tools created to ship and process log files speedily and securely, with low resource usage, to Elasticsearch or Logstash instances.

Codec to set a field from message content? #319

Closed PMDubuc closed 8 years ago

PMDubuc commented 8 years ago

The version of Docker we're currently using doesn't support tagging of the log output. The logs go to the stdout of the container and are put in nondescript files. :-( If the developers prefix a tag to the logged message to identify its type, is there a way for log-courier to extract that tag from the message and pass it on as a "type" field value so Logstash can "grok" it?

driskell commented 8 years ago

This in itself would be a grok. I believe the best approach would be to grok at the Logstash side to extract the type, then grok again based on that type. Does that make sense?
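As a rough sketch of that two-pass approach on the Logstash side: the first grok peels the tag off the front of the message into `type`, and a conditional then applies a type-specific grok. The `mylogtype` tag, the `tag: message` line layout, and the second-stage pattern and field names are illustrative assumptions, not taken from a real configuration.

```
filter {
  # Pass 1: assume each line looks like "sometag: rest of message".
  # Capture the tag into "type" and keep only the remainder in "message".
  grok {
    match     => { "message" => "^%{WORD:type}: %{GREEDYDATA:message}" }
    overwrite => [ "type", "message" ]
  }

  # Pass 2: parse the remainder according to the extracted type.
  # The pattern below is a placeholder for whatever "mylogtype" lines contain.
  if [type] == "mylogtype" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:detail}" }
    }
  }
}
```

Additional log types would each get their own conditional branch after the first grok.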

I've been avoiding duplicating too much Logstash functionality, so I haven't had any plans to add a grok to the shipper.

PMDubuc commented 8 years ago

Thanks. That does make sense. Though I had thought of it as a modified form of the filter codec. For example:

name: "filter"
patterns: "^mylogtype:.*"
fields:
  - type: "mylogtype"

Of course, if you were trying to detect multiple log types, you would need to apply this 'filter' once for each type, in sequence, so maybe it gets more complicated than it's worth.

driskell commented 8 years ago

Yeah, it can get complex, and I would want to support Logstash-like conditionals to get it to work right and be simple to use. Log Courier codecs are intended only as a simple way to cut out unwanted traffic so your Logstash doesn't need to scale to unwanted proportions. The Multiline codec is intended to simplify shipping too and to ensure integrity in case of Logstash failures. It's also easier to configure Multiline at the shipper side per stream. Logstash receives many streams, so in older versions especially it was difficult to configure properly and sometimes meant you couldn't run multiple workers (the Multiline filter was single-threaded, and the codec needed extra config to work and even then would apply to ALL streams whether wanted or not).

As someone with limited resources, though, allowing conditionals is pretty tempting, and I've somewhat always wanted to implement a grok on an intermediary similar to Logstash, since RE2 regex is so much faster than PCRE for log parsing. I even played with implementing an RE2 regex for grok in Logstash, but unfortunately it's hard to run in Java!

PMDubuc commented 8 years ago

I see your points and agree. Thanks.