fstab / grok_exporter

Export Prometheus metrics from arbitrary unstructured log data.
Apache License 2.0

Unable to capture multiline pattern #46

Open bjusufbe opened 5 years ago

bjusufbe commented 5 years ago

Hello,

I'm trying to capture a multiline pattern from the following text:

2018-10-08 06:55:35.156330 0x00007f3e3569c700: <info> (health::main.cpp@169) peer-node-0
         ACNTST C : 20'032 |  ACNTST C HVA : 22     |  ACTIVE PINGS : 0      |     B WRITERS : 0      |
     BLK ELEM ACT : 0      |  BLK ELEM TOT : 2'217  |      BLKDIF C : 665    |        HASH C : 13'013 |
       HASHLOCK C : 0      |   MEM CUR RSS : 65     |  MEM CUR VIRT : 774    |   MEM MAX RSS : 65     |
      MEM SHR RSS : 27     |      MOSAIC C : 1      |   MOSAIC C DS : 1      |          NS C : 1      |
          NS C AS : 1      |       NS C DS : 1      | RB COMMIT ALL : 0      | RB COMMIT RCT : 0      |
    RB IGNORE ALL : 0      | RB IGNORE RCT : 0      |       READERS : 3      |  SECRETLOCK C : 0      |
    SUCCESS PINGS : 0      |         TASKS : 11     |   TOTAL PINGS : 0      |   TS NODE AGE : 13     |
    TS OFFSET ABS : 0      | TS OFFSET DIR : 0      |  TS TOTAL REQ : 0      |   TX ELEM ACT : 0      |
      TX ELEM TOT : 3'081  |  UNLKED ACCTS : 1      |      UT CACHE : 0      |       WRITERS : 0      |

    2018-10-08 06:55:35.156661 0x00007f3e3569c700: <info> (health::main.cpp@169) peer-node-1
         ACNTST C : 20'032 |  ACNTST C HVA : 22     |  ACTIVE PINGS : 0      |     B WRITERS : 0      |
     BLK ELEM ACT : 0      |  BLK ELEM TOT : 2'236  |      BLKDIF C : 665    |        HASH C : 13'013 |
       HASHLOCK C : 0      |   MEM CUR RSS : 65     |  MEM CUR VIRT : 770    |   MEM MAX RSS : 66     |
      MEM SHR RSS : 27     |      MOSAIC C : 1      |   MOSAIC C DS : 1      |          NS C : 1      |
          NS C AS : 1      |       NS C DS : 1      | RB COMMIT ALL : 0      | RB COMMIT RCT : 0      |
    RB IGNORE ALL : 0      | RB IGNORE RCT : 0      |       READERS : 2      |  SECRETLOCK C : 0      |
    SUCCESS PINGS : 0      |         TASKS : 10     |   TOTAL PINGS : 0      |   TS NODE AGE : 13     |
    TS OFFSET ABS : 0      | TS OFFSET DIR : 0      |  TS TOTAL REQ : 0      |   TX ELEM ACT : 0      |
      TX ELEM TOT : 2'063  |      UT CACHE : 0      |       WRITERS : 1      |

What I want: capture the MEM CUR VIRT value that is correlated with peer-node-0 (that would be the first occurrence of MEM CUR VIRT).

I wrote the following pattern in config.yml:

global:
    config_version: 2
input:
    type: file
    path: ./example/my_file.log
    readall: true # Read from the beginning of the file? False means we start at the end of the file and read only new lines.
grok:
    patterns_dir: ./patterns
metrics:
    - type: gauge
      name: peer_node_0_MEM_CUR_VIRT
      help: peer_node_0_MEM_CUR_VIRT
      match: '(?m)%{GREEDYDATA}peer-node-0%{GREEDYDATA}MEM%{SPACE}CUR%{SPACE}VIRT%{SPACE}:%{SPACE}%{INT:data}%{GREEDYDATA}peer-node-1%{GREEDYDATA}'
      value: '{{.data}}'
server:
    port: 9144

What I get: this doesn't work in grok_exporter, even though the same pattern matches on http://grokdebug.herokuapp.com/

If I write the following match:

%{GREEDYDATA}MEM%{SPACE}CUR%{SPACE}VIRT%{SPACE}:%{SPACE}%{INT:data}%{GREEDYDATA}

it works in grok_exporter, but then I have no correlation to a specific peer node (the last occurrence is taken).

Is there any other way to capture a multi-line pattern in grok_exporter?

fstab commented 5 years ago

grok_exporter does not support multi-line patterns. The reason is that grok_exporter processes the log file line by line: whenever a new line is written to the log file, the patterns are applied to that new line only, and the metrics are updated accordingly. I don't think there is a straightforward way to support multi-line patterns, because it would mean that whenever a new line is written to the log file, the entire file would have to be re-processed.

bjusufbe commented 5 years ago

Thanks for the answer. What you say about the necessity of re-processing the complete file makes sense. To solve this problem, I could build a mechanism that splits this output into 3 separate files. But is there a way to define multiple config.yml files (or multiple global sections inside the same config.yml) so I could process logs from these 3 separate files in parallel? Thanks.

fstab commented 5 years ago

Multiple file support is the most requested feature for grok_exporter, and I will implement it soon. As of now, you need to start a separate grok_exporter instance for each file.
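A minimal sketch of that workaround, assuming the log has been split into one file per node (the file names and the second port are illustrative): each instance gets its own config, differing only in the input path and the server port, and reuses the single-line match from above.

# config-node0.yml -- a second config would point at the node-1 file
# and use a different port, e.g. 9145.
global:
    config_version: 2
input:
    type: file
    path: ./example/my_file_node0.log
    readall: true
grok:
    patterns_dir: ./patterns
metrics:
    - type: gauge
      name: peer_node_0_MEM_CUR_VIRT
      help: peer_node_0_MEM_CUR_VIRT
      match: '%{GREEDYDATA}MEM%{SPACE}CUR%{SPACE}VIRT%{SPACE}:%{SPACE}%{INT:data}%{GREEDYDATA}'
      value: '{{.data}}'
server:
    port: 9144

Each instance is then started with its own config, e.g. grok_exporter -config ./config-node0.yml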

bjusufbe commented 5 years ago

Thanks for the info. Looking forward to seeing that feature.

iMartyn commented 5 years ago

+1, I'd love to see multiline support, and if I can tack on a request, the ability to measure the time between two patterns would make this even better.
For instance, if a log presents like this:

[{timestamp}] Downloading blah.... 
[{timestamp}] Processing file blah...

Being able to say that n seconds elapsed between the two lines would make a world of difference.

fstab commented 5 years ago

@iMartyn implementing generic multi-line support is hard, because as far as I can see it means re-processing the entire log file whenever a new line is added.

However, the time between two log lines that you mention could be implemented with single-line patterns. Maybe we could allow something like match_start and match_end as an alternative to the match pattern in the metrics configuration. This could be used for matching events with a specific start line and a specific end line.
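Purely as a sketch of the idea (match_start, match_end, and the timeDiff template function are hypothetical, nothing here is implemented), a metric for the example above might look like this:

metrics:
    - type: histogram
      name: download_to_processing_seconds
      help: Seconds between the Downloading line and the Processing line
      # Hypothetical keys; grok_exporter only supports 'match' today.
      match_start: '\[%{TIMESTAMP_ISO8601:start}\] Downloading %{GREEDYDATA:file}'
      match_end: '\[%{TIMESTAMP_ISO8601:end}\] Processing file %{GREEDYDATA:file}'
      # A capture shared by both lines (here 'file') would be needed to
      # correlate each start line with its end line.
      value: '{{timeDiff .end .start}}'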

The main problem would be to learn which start line belongs to which end line. Do you have a specific example, and is there something to correlate the lines (like a unique correlation ID)?

If you have a real-world example, please open a new issue for that, and I'll have a look.

IldarMinaev commented 4 years ago

Hi @fstab, any updates regarding multiline support? It would be nice to have something like what filebeat/logstash offer:
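For reference, filebeat's multiline options join continuation lines into a single event before shipping; a typical snippet (the paths here are illustrative) looks like this:

filebeat.inputs:
    - type: log
      paths:
          - /var/log/myapp/*.log
      # Any line that does not start with a timestamp belongs to the
      # previous event.
      multiline.pattern: '^\d{4}-\d{2}-\d{2}'
      multiline.negate: true
      multiline.match: after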

Skeen commented 4 years ago

One option for multiline support is to do the multiline parsing in another program and send the multiline blocks using the webtailer.

In our setup, we parse our multiline logs with fluentbit and then send the parsed logs as JSON to grok_exporter.
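grok_exporter's webhook input type can receive such pre-joined events. A minimal sketch, assuming the sender posts one JSON object per event with the joined log text in a message field:

input:
    type: webhook
    webhook_path: /webhook
    webhook_format: json_single
    webhook_json_selector: .message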

jimbob-s commented 2 years ago

One way this could be implemented in a single pass is to create what I call a "context" match.

It could pre-capture patterns into variables for use when the "match" pattern fires. So, for instance, the context in this case would be "peer-node-X", and the captured values could be brought into the metrics as timestamps, labels, or values.

I ran into this when a misguided logger put the timestamp on a separate line above the pattern I wanted to match.
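A hypothetical sketch of what such a context match could look like in the config (the context key and the carried-over capture do not exist in grok_exporter; all names are illustrative):

metrics:
    - type: gauge
      name: peer_node_mem_cur_virt
      help: MEM CUR VIRT per peer node
      # Hypothetical: remember the most recent node name seen on an
      # earlier line and expose it as a label on later matches.
      context: 'peer-node-%{INT:node}'
      match: '%{GREEDYDATA}MEM%{SPACE}CUR%{SPACE}VIRT%{SPACE}:%{SPACE}%{INT:data}%{GREEDYDATA}'
      value: '{{.data}}'
      labels:
          node: '{{.node}}'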

JohnTheBeloved commented 2 years ago

Another way is to write a script that reads the multiline entries from the initial log, combines each into a buffer string, and writes the string to another file when the next multiline entry begins.

The new file can then be used with grok_exporter.
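A minimal sketch of such a preprocessor in Python (the file names and the timestamp format are assumptions based on the log sample above):

#!/usr/bin/env python3
import re

# A new entry starts with a timestamp like "2018-10-08 06:55:35.156330".
ENTRY_START = re.compile(r"^\s*\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def flatten(src_path, dst_path):
    """Join each multiline entry into one line so grok_exporter's
    line-by-line matching sees the whole event."""
    buffer = []
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if ENTRY_START.match(line) and buffer:
                # The next entry begins: flush the previous one as a single line.
                dst.write(" ".join(part.strip() for part in buffer) + "\n")
                buffer = []
            if line.strip():
                buffer.append(line)
        if buffer:  # flush the last entry
            dst.write(" ".join(part.strip() for part in buffer) + "\n")

if __name__ == "__main__":
    flatten("./example/my_file.log", "./example/my_file_flat.log")

With the entries flattened this way, the single-line match from above would see peer-node-0 and MEM CUR VIRT on the same line.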