elastic / beats

:tropical_fish: Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

Ability to have a custom name for json data #3025

Closed: f-ld closed this issue 4 years ago

f-ld commented 7 years ago

Right now, the name of the key under which decoded JSON data is placed is hardcoded, see: https://github.com/elastic/beats/blob/master/filebeat/input/event.go#L108

This ticket is about making that key name configurable.
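
For illustration, with the current hardcoded key, a hypothetical log line {"alert": "disk full"} ends up in an event shaped roughly like this (the timestamp and path are made up; only the "json" key comes from the code linked above):

{
  "@timestamp": "2016-11-22T10:00:00.000Z",
  "source": "/var/log/app.log",
  "json": {
    "alert": "disk full"
  }
}

The goal is to let the configuration choose a different name for that "json" key.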

PR to come

f-ld commented 7 years ago

PR: https://github.com/elastic/beats/pull/3026

f-ld commented 7 years ago

The change requested here could "almost" be implemented with the decode_json_fields processor introduced in version 5.2.0 (see https://www.elastic.co/guide/en/beats/filebeat/current/decode-json-fields.html).

Indeed, the effect of this json configuration on the prospector:

filebeat.prospectors:
- input_type: log
  json:
    # This is the requested addition to filebeat to have the decoded json under key "foobar" and not under key "json"
    target: "foobar"
    keys_under_root: false
    overwrite_keys: false
  # other prospector conf goes here ...

could be achieved by removing it and adding the following two processors:

processors:
- decode_json_fields:
    fields: ["message"]
    target: "foobar"
    max_depth: 10
- drop_fields:
    fields: ["message"]

The problem is that, during my tests, the processor parsed fields like {"someDate": "2016-09-28T01:40:26.760+0000"} as if they were integers, so we end up with "someDate": "2016" in the decoded JSON.
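
To make the reported behavior concrete, this is the kind of before/after pair described (reconstructed from the description, not from an actual test run):

# original log line, carried in the "message" field
{"someDate": "2016-09-28T01:40:26.760+0000"}

# decoded result under the "foobar" target
"foobar": {"someDate": "2016"}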

f-ld commented 7 years ago

Started a discussion about (what I assume is an) issue preventing the other options from doing what was expected here: https://discuss.elastic.co/t/bug-in-decode-json-fields-data-loss-tye-conflict/73812

fldvonage commented 6 years ago

We have a global pipeline based on JSON records, written one per line into files by many applications; we use Filebeat to push them to Kafka and then consume them from Kafka downstream, including into a database.

Having the "json" key that Filebeat adds around the original parsed data creates confusion, for example in the database, where users do not understand why keys are called e.g. "json.alert" when the "alert" column does not actually contain JSON. And we do not want to remove the Filebeat metadata, because we rely on it for troubleshooting, nor mix it with our own content (so keys_under_root is not an option).

This is why, from the beginning, we have carried a patch to Filebeat that makes the name of the key under which it puts the JSON data configurable. We can then use "data", which ultimately gives "data.alert" as the column name in the database.
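
As a sketch, the patched configuration looks something like this (the target option here is provided by our local patch; it is not an upstream Filebeat setting):

filebeat.prospectors:
- input_type: log
  json:
    # added by the local patch, not available in stock filebeat
    target: "data"
    keys_under_root: false

A record like {"alert": ...} is then shipped under "data.alert" instead of "json.alert".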

This is a small change in Filebeat, and I still think it could be of great benefit in many cases (if we think beyond Elasticsearch).

But Filebeat is now changing more and more, and porting those changes from one version to the next is getting more and more expensive. So can you please (for example @ruflin, based on the discussion on the earlier pull request #3026) tell me whether such a change would have a chance of being accepted and, if so, against which version (if not master) I should open a pull request?

Otherwise I will keep porting, from time to time, the minimal changes for the json input only (as I just did for version 6.2.4).

ruflin commented 6 years ago

@fldvonage I wonder if the rename processor here could solve your issue: https://github.com/elastic/beats/pull/6292

As for the change: if you already have the code, can you open a PR against master? That would make discussion easier.
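
For reference, a minimal sketch of that approach (processor options as introduced in #6292; the "data" key name is just the use case from above):

processors:
- rename:
    fields:
      - from: "json"
        to: "data"
    ignore_missing: true
    fail_on_error: false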

fldvonage commented 6 years ago

I do not have the code ready, because I noticed quite a few changes between 6.2.4 and master and I need to review them to make sure no new feature would also require changes.

And the rename processor could be an option; I'll give it a try.

ruflin commented 6 years ago

There have been quite a few changes but it's mostly renaming. Let me know how the renaming helps.

faec commented 4 years ago

This seems to be addressed by the rename processor, and hasn't had any activity since that was added, so I'm closing this. Feel free to provide more details or open a new issue if this is still an issue in current releases.

gnumoksha commented 4 years ago

An example for future reference:

filebeat.inputs:
- type: log
  # Reference https://www.elastic.co/guide/en/beats/filebeat/7.x/filebeat-input-log.html
  enabled: true
  paths:
    - /var/app/current/storage/logs/*.log*

  tags: ["json", "monolog"]

  # If this option is set to true, the custom fields are stored as top-level fields in the output document instead of being grouped under a fields sub-dictionary.
  fields_under_root: true

  # Optional fields that you can specify to add additional information to the output.
  fields:
    event.dataset: app
    service.type: app

  json:
    # An optional configuration setting that specifies a JSON key on which to apply the line filtering and multiline settings. If specified the key must be at the top level in the JSON object and the value associated with the key must be a string, otherwise no filtering or multiline aggregation will occur.
    #message_key: message

    # By default, the decoded JSON is placed under a "json" key in the output document. If you enable this setting, the keys are copied to the top level of the output document. The default is false.
    keys_under_root: false

    # If keys_under_root and this setting are enabled, then the values from the decoded JSON object overwrite the fields that Filebeat normally adds (type, source, offset, etc.) in case of conflicts.
    overwrite_keys: false

    # If this setting is enabled, Filebeat adds "error.message" and "error.type: json" keys in case of JSON unmarshalling errors, or when a message_key is defined in the configuration but cannot be used.
    add_error_key: true

    # An optional configuration setting that specifies if JSON decoding errors should be logged or not. If set to true, errors will not be logged. The default is false.
    ignore_decoding_error: false

  processors:
    - rename:
        fields:
          - from: "json"
            to: "monolog"
        ignore_missing: false
        fail_on_error: true
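
With this configuration, a log line such as {"level": "info", "message": "User created"} should produce an event roughly like the following (the timestamp and exact metadata layout are illustrative and vary by Filebeat version):

{
  "@timestamp": "2020-05-01T12:00:00.000Z",
  "tags": ["json", "monolog"],
  "event": {"dataset": "app"},
  "service": {"type": "app"},
  "monolog": {
    "level": "info",
    "message": "User created"
  }
}

The decoded JSON first lands under the default "json" key (keys_under_root: false), and the rename processor then moves it to "monolog".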