elastic / logstash

Logstash - transport and process your logs, events, or other data
https://www.elastic.co/products/logstash

JSON logging can't be ingested into Elasticsearch #7610

Open · jimmyjones2 opened 7 years ago

jimmyjones2 commented 7 years ago

Using Logstash 5.4.3:

echo '{"foo":1}' | logstash-5.4.3/bin/logstash -f abc.conf --log.format json echo '{"foo":"bar"}' | logstash-5.4.3/bin/logstash -f abc.conf --log.format json

with the following config:

input {
  stdin {
    codec => json
  }
}
output {
  elasticsearch {
  }
}

The second command (run after the first has caused foo to be dynamically mapped as a number) produces the following JSON log:

{"level":"WARN","loggerName":"logstash.outputs.elasticsearch","timeMillis":1499431239669,"thread":"[main]>worker2","logEvent":{"message":"Could not index event to Elasticsearch.","status":400,"action":["index",{"_id":null,"_index":"logstash-2017.07.07","_type":"logs","_routing":null},{"metaClass":{"metaClass":{"metaClass":{"action":"[\"index\", {:_id=>nil, :_index=>\"logstash-2017.07.07\", :_type=>\"logs\", :_routing=>nil}, 2017-07-07T12:40:39.591Z localhost.localdomain %{message}]","response":{"index":{"_index":"logstash-2017.07.07","_type":"logs","_id":"AV0dEP_riOfCotFy8tMk","status":400,"error":{"type":"mapper_parsing_exception","reason":"failed to parse [foo]","caused_by":{"type":"number_format_exception","reason":"For input string: \"bar\""}}}}}}}}]}}

This log line can't be ingested into Elasticsearch, because the action array mixes types (a string followed by an object), which Elasticsearch doesn't support:

"action":["index",{"_id":null,"

Given the best place for all logs is Elasticsearch, this makes me sad!

andrewvc commented 7 years ago

@jimmyjones2 this is an excellent point! We should provide some good mappings for putting LS logs into ES. I think once the modules feature gets merged we can do this.

It's not, I would say, a bug that logs can't automatically be put into elasticsearch with the default settings, though it certainly is inconvenient. We tend not to recommend that arbitrary JSON be put into ES with the default mappings for this very reason: type clashes.

The best practice here is to create an ES mapping with dynamic: false (documented here) and explicitly declare the fields of known type ('level', 'loggerName', 'timeMillis', 'thread', 'logEvent.message'). Additionally, it makes sense to index the raw JSON into an _all-style text field for searching the full message.
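A minimal sketch of such a mapping, assuming Elasticsearch 7.x or later (older versions also need a mapping type) and a hypothetical index name logstash-logs:

curl -XPUT 'localhost:9200/logstash-logs' -H 'Content-Type: application/json' -d '
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "level":      { "type": "keyword" },
      "loggerName": { "type": "keyword" },
      "timeMillis": { "type": "date", "format": "epoch_millis" },
      "thread":     { "type": "keyword" },
      "logEvent": {
        "properties": {
          "message": { "type": "text" }
        }
      }
    }
  }
}'

With dynamic: false, unknown keys inside logEvent are kept in _source but never indexed, so they can't cause type clashes. Note that the literal _all field was removed in Elasticsearch 7.0, so on current versions the catch-all text field has to be declared explicitly and populated via copy_to.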

The reason is that logs can be very arbitrary, and we don't want to bloat our mappings with arbitrary keys in ES.

Does this make sense?

jimmyjones2 commented 7 years ago

@andrewvc Makes sense, thanks!

anton-johansson commented 5 years ago

Any development on this? I just ran into it while trying to push service logs (logs from the ELK stack itself) into Elasticsearch. The result is an unending loop of error logs from Logstash.

A Logstash warning appears and should be sent to Elasticsearch. It can't be indexed, due to the problem this issue describes, so a new warning is logged. The same thing happens to that message. And so on. :(

I'll have to revert to plain logs for now.
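One possible stopgap, sketched against the field names in the JSON log shown above (it silently discards those warnings, so it only papers over the loop):

filter {
  # Drop Logstash's own indexing-failure warnings before they are
  # shipped back to Elasticsearch and trigger the loop again.
  if [loggerName] == "logstash.outputs.elasticsearch" and [logEvent][action] {
    drop { }
  }
}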

dramshaw-zymergen commented 5 years ago

+1

slmingol commented 4 years ago

Can we get a status update on this? It was reported 3+ years ago and there hasn't been any movement on it since.

jeffspahr commented 4 years ago

I reread this issue today and realized it was never clearly settled whether this is a Logstash bug.

It's not, I would say, a bug that logs can't automatically be put into elasticsearch with the default settings, though it certainly is inconvenient.

Elastic recommends that you do ingest JSON directly, and supports this across the rest of its stack by enabling JSON logs in Elasticsearch, Beats, and Kibana. Logstash is the only component of the Elastic Stack that writes JSON logs that can't be natively ingested into Elasticsearch. Here's one example of Elastic recommending this approach for general applications: https://www.elastic.co/blog/structured-logging-filebeat
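The Filebeat side of that recommendation looks roughly like this; the path is illustrative, and the json.* options are those of Filebeat's log input:

filebeat.inputs:
- type: log
  paths:
    - /var/log/logstash/logstash-*.log   # illustrative path
  json.keys_under_root: true
  json.add_error_key: true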

We tend not to recommend that arbitrary JSON be put into ES with the default mappings for this very reason: type clashes.

The issue here isn't that different apps are writing different types to the same field, which is a real problem that any Elastic Stack operator has to deal with. The issue is that Logstash, and only Logstash, writes arrays with mixed types or changes a field's data type between events (switching back and forth between objects and keywords). If any other app did this, we'd ask them to fix the issue in the app. Components of the Elastic Stack should be held to at least as high a standard when it comes to practicing what Elastic preaches about structured logs.

The reason is that logs can be very arbitrary, and we don't want to bloat our mappings with arbitrary keys in ES.

This is solved by using a schema like ECS, which is how Elasticsearch solved this problem in 8.x. It would make sense for Logstash and other products in the Elastic Stack to take the same approach. https://github.com/elastic/elasticsearch/issues/46119
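For reference, an ECS-style rendering of the warning from the original report might look roughly like this (illustrative only, not Logstash's actual output):

{"@timestamp":"2017-07-07T12:40:39.669Z","log.level":"WARN","log.logger":"logstash.outputs.elasticsearch","process.thread.name":"[main]>worker2","message":"Could not index event to Elasticsearch.","ecs.version":"1.2.0"}

Every field has a fixed, documented type, so a single static mapping covers all producers.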

This issue would be more appropriately tagged as a bug than as an enhancement.