ghost opened this issue 4 years ago
It works properly. It just doesn't do what you're expecting.
Your JSON input doesn't have nested JSON strings; the nesting is made of real objects. If you parse it in the browser with JSON.parse, you'll find the following:
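The exact input file isn't quoted in this thread, but judging from the stringified example below, the parsed result presumably looks something like this (a reconstruction, not the reporter's original file):

  {
    "level_1_obj": {
      "level_2": "level_2_value",
      "level_2_obj": {
        "level_3": "level_3_value"
      }
    }
  }

In other words, level_1_obj is already a real nested object rather than a JSON-encoded string, so there is nothing left for the processor to stop decoding.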
To get your desired effect, your level_1_obj value itself will have to be stringified first:
  "level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}"
What max_depth does is recursively try to decode the underlying fields until the max_depth is hit. So if you set it to 2, it will still be able to decode "level_1_obj":"{\"level_2\":\"level_2_value\",\"level_2_obj\":{\"level_3\":\"level_3_value\"}}".
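As a sketch of how the depth is counted (this configuration is illustrative, not taken from the report), each level of string-encoded JSON that gets decoded consumes one unit of depth, while genuinely nested objects come along as soon as their enclosing string is decoded:

  processors:
    - decode_json_fields:
        fields: ["message"]   # hypothetical source field containing a JSON string
        max_depth: 2          # decode the string, then decode one more layer of
                              # JSON-encoded strings found inside it, if any
        target: ""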
Anyway, the documentation is not clear enough for me, and I suppose not only for me but for many other users. The max_depth option behaves more like a limit to prevent stack overflows, not like a way to parse JSON down to N levels of depth and leave all deeper levels as an unparsed string.
I implemented this functionality with Logstash and the ruby plugin, and did all the necessary parsing logic in a Ruby script. Now I have only the first two levels as document fields in my Elasticsearch indexes; all deeper subfields are stored as string values of their fields.
I understood it exactly as @vitaliy-kravchenko did: the max_depth option behaves as a limit to prevent mapping explosion.
I have tons of respect for Filebeat and I use it in multiple projects as a collector, but I just spent 3 days trying to debug this until I found this issue, and I agree it looks like the documentation is not clear at all about this. While we're at it, I'm not sure expecting message to be stringified so this can work properly is reasonable; I've never seen logs like that. Right now I'm trying to fix this problem with some Elasticsearch ingest pipeline trickery, but it's depressing, as Filebeat is so much better than ES pipelines despite this issue... 😢
@sayden: I guess this issue is important for providing a reliable way to prevent mapping explosions.
I'm creating some configuration references to index our own Beats logs (running on Kubernetes) in Elasticsearch. With the JSON logging support (logging.json: true) this is very straightforward, and the logs can be decoded just by using decode_json_fields.
With max_depth: 1 the objective should be (apparently) to have only the first level of fields decoded (level, timestamp, logger, caller, message, monitoring, etc.), and if any of these are JSON objects they shouldn't be expanded to fields.
As you know, the monitoring part of our log messages (in Filebeat or Metricbeat, for example) is a big JSON object with a lot of sub-objects. Expanding them always makes it very difficult to index our own logs in a nice way in Elasticsearch (we definitely don't want to create all the monitoring sub-fields in a filebeat index, as it doesn't make any sense, but keeping the long monitoring strings as references does make sense).
I don't know if this is considered a bug or not (@adriansr might have a different view), but just for your consideration!
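For reference, the kind of configuration being described would be roughly the following sketch (field names are the usual Beats JSON log fields; the comment describes the expected rather than the actual behaviour):

  processors:
    - decode_json_fields:
        fields: ["message"]
        max_depth: 1
        target: ""
        overwrite_keys: true
        # Expectation: level, timestamp, logger, caller, message, monitoring, etc.
        # become top-level fields, while the monitoring object stays as a single
        # JSON string instead of being expanded into all of its sub-fields.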
same behavior:
filebeatConfig:
  filebeat.yml: |
    ## Hints based autodiscover
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          node: ${NODE_NAME}
          hints.enabled: true
    processors:
      - if:
          equals:
            kubernetes.labels.app/logs_json: "true"
        then:
          - decode_json_fields:
              fields: ["message"]
              process_array: false
              max_depth: 1
              target: "foo"
              overwrite_keys: true
              add_error_key: true
Both process_array and max_depth have no effect on nesting and JSON parsing, i.e. the whole JSON object is always parsed :(
PS: INFO [beat] instance/beat.go:1023 Build info {"system_info": {"build": {"commit": "e127fc31fc6c00fdf8649808f9421d8f8c28b5db", "libbeat": "7.14.0", "time": "2021-07-29T20:56:59.000Z", "version": "7.14.0"}}}
Same here. This is currently a show stopper for us since we have complex data in some fields which are not to be decoded.
Setting max_depth has no effect. The whole sub-structure is decoded into fields.
I experience the same with 7.16.2; max_depth has no effect on the parsing of the JSON logs.
It seems to be the same on 7.16.3
This is really important for us too, as going more than one or two levels deep will eventually break the ES index template mapping and logs will start being dropped, and we cannot be expected to always change how logs are escaped depending on the depth we want to achieve.
Even if Filebeat is able to fully parse a document from the start, we were expecting this setting to adjust the mappings accordingly and to save anything beyond the configured depth as a string.
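To illustrate the difference with a hypothetical document (not taken from our logs):

  Input event:
    {"app": "billing", "details": {"order": {"id": 42}}}

  Expected with max_depth: 1:
    app: "billing"
    details: "{\"order\":{\"id\":42}}"    (kept as an unparsed string)

  Actual:
    app: "billing"
    details.order.id: 42                  (fully expanded into sub-fields)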
We're also experiencing this problem. This is much needed functionality, it would seem.
As a workaround: Isn't it straightforward to use the "script" processor to implement the desired functionality? Either by first applying the decode_json_fields processor, then re-encoding fields into json from javascript; or by doing everything in javascript?
Good idea. I did exactly what you said, and it works well. Here is my code:
- script:
    lang: javascript
    source: >
      function process(event) {
        for (var p in event.Get("data")) {
          if (event.Get("data")[p] != null && typeof event.Get("data")[p] == 'object') {
            event.Put("data." + p, JSON.stringify(event.Get("data")[p]))
          }
        }
      }
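For completeness, the script is meant to run right after decode_json_fields on the same target; the combined sketch below (field names such as message and data are just examples) caches the lookup and adds a couple of comments:

  processors:
    - decode_json_fields:
        fields: ["message"]   # example source field
        target: "data"        # the script below re-encodes everything under "data"
        max_depth: 1
        add_error_key: true
    - script:
        lang: javascript
        source: |
          function process(event) {
            var data = event.Get("data");
            for (var p in data) {
              // Re-encode any value that is still an object, so only the first
              // level ends up as concrete fields in Elasticsearch.
              if (data[p] != null && typeof data[p] == 'object') {
                event.Put("data." + p, JSON.stringify(data[p]));
              }
            }
          }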
Does anyone know if this problem persists in Filebeat or Elastic Agent 8?
To prevent creating tons of document fields in an Elasticsearch log index I want to control nested JSON parsing depth.
Related discussion post: https://discuss.elastic.co/t/filebeat-decode-json-fields-processor-max-depth-option-not-working/240948
Filebeat version 7.8.0 (also tested on 6.8.10 and the result is the same)
/tmp/filebeat.conf:
/tmp/filebeat.input:
Command:
Result:
Expected result:
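A minimal setup along the lines discussed above would look roughly like this (a sketch reconstructed for illustration, not the exact files and output from the report):

  /tmp/filebeat.conf (sketch):
    filebeat.inputs:
      - type: log
        paths: ["/tmp/filebeat.input"]
    processors:
      - decode_json_fields:
          fields: ["message"]
          max_depth: 1
          target: ""
    output.console:
      pretty: true

  /tmp/filebeat.input (one JSON document per line, sketch):
    {"level_1":"level_1_value","level_1_obj":{"level_2":"level_2_value","level_2_obj":{"level_3":"level_3_value"}}}

  Command (sketch): filebeat -e -c /tmp/filebeat.conf

  Expected result: fields decoded only down to the configured max_depth, with deeper objects left as unparsed strings.
  Actual result:   the whole object is decoded into fields, regardless of max_depth.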