childe / gohangout

A Logstash-like tool written in Go. It consumes data from Kafka, processes it, and writes the results to Elasticsearch, ClickHouse, etc.

Can JSON parsing skip malformed characters? #243

Closed: yasincyx closed this issue 3 months ago

yasincyx commented 3 months ago

Hi, here is my configuration:

inputs:
- Stdin: {}
filters:
    - Grok:
        src: message
        pattern_paths:
          - '/opt/gohangout/grokpattern'
        match:
          - '%{DATA:TIMESTAMP} %{DATA} %{DATA:source_host} %{DATA:PATH} %{GREEDYDATA:LOGHUB_USERLOG}'
        failTag: message_grokfail
        remove_fields: ['message']
    - Grok:
        src: LOGHUB_USERLOG
        pattern_paths:
          - '/opt/gohangout/grokpattern'
        match:
          - '\s*%{DATA} %{DATA} %{DATA} %{DATA:PATH} %{GREEDYDATA:json_data}'
          - '\s*%{GREEDYDATA:json_data}'
        failTag: LOGHUB_USERLOG_grokfail
        remove_fields: ['LOGHUB_USERLOG']
    - Json:
        field: json_data
        remove_fields: ['json_data']
    - Json:
        field: log
        remove_fields: ['json_data']
    - Date:
        location: 'Asia/Shanghai'
        src: 'start_time'
        target: '@timestamp'
        formats:
          - '2006-01-02T15:04:05.999+0000'
        failTag: start_time_parsefail
    - Date:
        location: 'Asia/Shanghai'
        src: 'time'
        target: '@timestamp'
        formats:
          - '2006-01-02T15:04:05.99999999Z'
        failTag: dateparsefail
outputs:
- Stdout: {}

The source data is:

1715420594643 75 h72-pp /var/log/containers/uw5ccfe9598bd2906f42975e4.log {"log":" \"TraceContextHeaderName\": \"trace-id\"\n","stream":"stdout","time":"2024-05-11T09:43:08.892780189Z"}

After parsing, the output is:

{"@timestamp":"2024-05-13T15:20:50.4024003+08:00","PATH":"/var/log/containers/uw5ccfe9598bd2906f42975e4.log","TIMESTAMP":"1715420594643","json_data":" \\"TraceContextHeaderName\\": \\"trace-id\\"\n\",\"stream\":\"stdout\",\"time\":\"2024-05-11T09:43:08.892780189Z\"}","source_host":"h72-pp","tags":["start_time_parsefail","dateparsefail"]}

My expectation is to skip the malformed part in the log field and keep parsing everything else as usual, but that does not seem achievable with the Json plugin configuration above.
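
For context, here is a minimal Go sketch (not gohangout code) of what the second Json step runs into, assuming the wrapper text is exactly as shown in the source line above: the outer container-runtime JSON parses fine, but the decoded log value is only a fragment, so a second JSON parse of it fails.

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Outer wrapper as it appears in the sample source line (valid JSON).
	wrapper := `{"log":" \"TraceContextHeaderName\": \"trace-id\"\n","stream":"stdout","time":"2024-05-11T09:43:08.892780189Z"}`

	var outer map[string]interface{}
	if err := json.Unmarshal([]byte(wrapper), &outer); err != nil {
		fmt.Println("outer parse failed:", err)
		return
	}
	fmt.Printf("outer parse ok, log=%q\n", outer["log"])

	// The decoded log value is just a fragment, so this second parse fails;
	// this is the step Logstash swallows with skip_on_invalid_json => true.
	var inner interface{}
	if err := json.Unmarshal([]byte(outer["log"].(string)), &inner); err != nil {
		fmt.Println("inner parse failed:", err)
	}
}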

With Logstash, however, it works:

{
  "source_host" => "h72-pp",
  "stream" => "stdout",
  "TIMESTAMP" => "1715420594643",
  "event" => { "original" => "1715420594643 75 h72-pp /var/log/containers/uw5ccfe9598bd2906f42975e4.log {\"log\":\" \\"TraceContextHeaderName\\": \\"ntes-trace-id\\"\n\",\"stream\":\"stdout\",\"time\":\"2024-05-11T09:43:08.892780189Z\"}" },
  "@timestamp" => 2024-05-11T09:43:08.892Z,
  "time" => "2024-05-11T09:43:08.892780189Z",
  "@version" => "1",
  "host" => { "name" => "GIH-D-26809" },
  "log" => " \"TraceContextHeaderName\": \"trace-id\"\n",
  "PATH" => "/var/log/containers/uw5ccfe9598bd2906f42975e4.log"
}

The relevant part of the Logstash pipeline is:

json {
    source => "json_data"
    skip_on_invalid_json => true
    remove_field => [ "json_data" ]
}
json {
    source => "log"
    skip_on_invalid_json => true
    remove_field => [ "log" ]
}

Is there a way to skip invalid JSON like Logstash does?

childe commented 3 months ago

The data you fed to gohangout and the data you fed to Logstash are actually not the same: the one given to Logstash is valid JSON, while the one given to gohangout is not. Note the escape characters in front of the double quotes.
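
A small Go illustration of that point (the strings below are simplified stand-ins, not the exact bytes from the original log): with a single backslash before the inner quotes the wrapper stays valid JSON, with a doubled backslash it does not.

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Inner quotes escaped once: the wrapper is valid JSON.
	valid := `{"log":" \"trace-id\"\n","stream":"stdout"}`
	// An extra backslash before the inner quotes breaks the wrapper.
	broken := `{"log":" \\"trace-id\\"\n","stream":"stdout"}`

	fmt.Println("valid  ->", json.Valid([]byte(valid)))  // true
	fmt.Println("broken ->", json.Valid([]byte(broken))) // false
}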

yasincyx commented 3 months ago

I went back and checked the logs and found that the original data differs from what I posted in this issue; the escaping is different too, possibly a problem with the issue editor? In any case you are right: when I feed gohangout a valid JSON source it works. Thanks for clearing this up.