elastic / beats

🐠 Beats - Lightweight shippers for Elasticsearch & Logstash
https://www.elastic.co/products/beats

[Filebeat] user_agent parsing error while ingesting web logs with filebeat 6.7.0 into elasticsearch 7.0.0 #10650

Closed weltenwort closed 5 years ago

weltenwort commented 5 years ago

Versions:

Operating System: Linux 4.20.6-arch1-1-ARCH #1 SMP PREEMPT Thu Jan 31 08:22:01 UTC 2019 x86_64 GNU/Linux

Description:

When indexing the filebeat test data from the beats 6.7 branch into a 7.0.0-SNAPSHOT elasticsearch cluster, the access logs for the web servers (at least nginx, iis and traefik) fail to be indexed with error messages akin to the following:

info [o.e.a.b.TransportShardBulkAction] [${HOSTNAME}] [filebeat-6.7.0-2019.02.08][1] failed to execute bulk item (index) index {[filebeat-6.7.0-2019.02.08][_doc][-v9vzWgBSKfxSV4q4CHr], source[{"offset":1204,"log":{"file":{"path":"${SOMEDIR}/beats/filebeat/module/iis/access/test/test.log"}},"prospector":{"type":"log"},"read_timestamp":"2019-02-08T14:08:07.032Z","source":"${SOMEDIR}/beats/filebeat/module/iis/access/test/test.log","fileset":{"module":"iis","name":"access"},"error":{"message":"field [iis.access.user_agent.original] already exists"},"input":{"type":"log"},"iis":{"access":{"server_name":"MACHINE-NAME","agent":"Mozilla/5.0+(Windows+NT+6.1;+Win64;+x64;+rv:57.0)+Gecko/20100101+Firefox/57.0","response_code":"200","cookie":"-","method":"GET","sub_status":"0","user_name":"-","http_version":"1.1","url":"/","site_name":"W3SVC1","referrer":"-","body_received":{"bytes":"456"},"hostname":"example.com","remote_ip":"85.181.35.98","port":"80","server_ip":"127.0.0.1","body_sent":{"bytes":"123"},"win32_status":"0","request_time_ms":"789","query_string":"-","user_agent":{"original":"Mozilla/5.0+(Windows+NT+6.1;+Win64;+x64;+rv:57.0)+Gecko/20100101+Firefox/57.0","os":{"name":"Windows"},"name":"Firefox","device":{"name":"Other"},"version":"57.0"}}},"@timestamp":"2018-01-01T10:11:12.000Z","beat":{"hostname":"${HOSTNAME}","name":"${HOSTNAME}","version":"6.7.0"},"host":{"os":{"build":"rolling","name":"Arch Linux","family":"","version":"","platform":"arch"},"containerized":false,"name":"${HOSTNAME}","id":"${HOSTID}","architecture":"x86_64"},"event":{"dataset":"iis.access"}}]}
   │      org.elasticsearch.index.mapper.MapperParsingException: failed to parse field [iis.access.user_agent.os] of type [keyword] in document with id '-v9vzWgBSKfxSV4q4CHr'
   │      ...SNIP...
   │      Caused by: java.lang.IllegalStateException: Can't get text on a START_OBJECT at 1:419

I suspect that the user_agent.original field, which is already populated by the user_agent ingest processor in elasticsearch 7.0.0, causes the rename operation in the version 6.7.0 pipeline to fail.

I haven't tested all of them, but this probably happens for all filebeat web server modules that use the user_agent processor in the pipeline.
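
To illustrate the suspected failure, here is a minimal Python sketch of a rename step that refuses to overwrite an existing target. It is a toy model, not the Elasticsearch implementation: `IngestError`, `lookup` and `rename` are hypothetical helpers, and only the "already exists" message is taken from the log above.

```python
class IngestError(Exception):
    """Raised when the toy rename step cannot be applied."""


def lookup(doc, path):
    """Return the value at a dotted path, or None if any part is missing."""
    node = doc
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def rename(doc, source, target):
    """Toy rename: move `source` to `target`, never overwriting `target`."""
    if lookup(doc, source) is None:
        raise IngestError(f"field [{source}] doesn't exist")
    if lookup(doc, target) is not None:
        raise IngestError(f"field [{target}] already exists")
    parent_path, _, leaf = source.rpartition(".")
    parent = lookup(doc, parent_path) if parent_path else doc
    value = parent.pop(leaf)
    *target_parents, target_leaf = target.split(".")
    node = doc
    for key in target_parents:
        node = node.setdefault(key, {})
    node[target_leaf] = value


UA = "Mozilla/5.0+(Windows+NT+6.1;+Win64;+x64;+rv:57.0)+Gecko/20100101+Firefox/57.0"

# Assumed 6.x-style output: user_agent has no `original`, so the rename works.
doc_6x = {"iis": {"access": {"agent": UA, "user_agent": {"name": "Firefox"}}}}
rename(doc_6x, "iis.access.agent", "iis.access.user_agent.original")

# 7.0-style output: `original` is already populated, so the rename fails.
doc_7x = {"iis": {"access": {"agent": UA,
                             "user_agent": {"original": UA, "name": "Firefox"}}}}
try:
    rename(doc_7x, "iis.access.agent", "iis.access.user_agent.original")
    error_message = None
except IngestError as err:
    error_message = str(err)

print(error_message)
```

In the failing case the source field `iis.access.agent` is left in place, which matches the document shown in the log.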

Steps to Reproduce:

  1. Start an elasticsearch 7.0.0 SNAPSHOT
  2. Configure filebeat to connect to the elasticsearch 7.0.0 cluster
  3. Enable the web server modules such as nginx or iis
  4. Change the module configuration to point to the corresponding filebeat test log samples from the 6.7 branch of the beats repo
  5. Start filebeat
  6. Observe the filebeat and elasticsearch logs

ruflin commented 5 years ago

I think the problem here is that the user_agent processor changed the format between 6.x and 7.x to align with ECS. The problem we have now is that the data created by Filebeat 6.7 with Elasticsearch 6.7 or 7.0 is not identical and even conflicts.

Options:

  1. Introduce ecs: false in the ingest processor in 6.x. This would require that ecs: false is still supported by Elasticsearch, which is not the case.
  2. Use ecs: true in 6.x to already generate ECS data. This would be a breaking change in 6.7.
  3. Fix the issue with user_agent.original by checking if the field already exists. Document the fact that when Elasticsearch is upgraded to 7.0, Filebeat will start to generate a different data structure for the user_agent.
  4. Have painless scripts in place so that Filebeat 6.x running against Elasticsearch 7.x still generates the same data structure. This would mean quite a bit of complexity in the ingest processor, if it is possible at all.

Introducing an ecs: * config in Option 1 and 2 means breaking compatibility with Elasticsearch versions older than 6.6, as the ingest processor checks for config options that should not be there and rejects ecs:* configs.

My current suggestion is that we go with option 3 and make users aware that when upgrading Elasticsearch, the structure of the data will slightly change. We must ensure on our side that it does not conflict with previous data. This could also have an effect on some dashboards (needs verification).

Option 1 would be the most seamless one from a user perspective, but it would require Elasticsearch to keep the old ingest processor around for all of 7.x.

ruflin commented 5 years ago

Trying to implement Option 3, I stumbled over more issues:

I will now play around with option 4.

We also have the problem the other way around, with Filebeat 7.x sending data to 6.x: https://github.com/elastic/beats/issues/10655

ruflin commented 5 years ago

Here is a first attempt to solve this with renaming of fields for apache.access logs: https://github.com/elastic/beats/pull/10661

@jakelandis We should also discuss keeping the ecs flag around in 7.x, as the approach above seems pretty unstable.

bleskes commented 5 years ago

@ruflin, if I understand correctly, the user agent processor in 7.0 currently only supports the JSON format that beats 7.0 ships and breaks on 6.7 data. That's a no go from our upgrade perspective (upgrade ES first, then Kibana, then the data shippers), so we indeed need to fix this. Also, this means that the 6.7 structure is really different and that we can detect it and do something else in the user agent ingest processor to support the 6.7 formats. This is basically what you mean with option 3, right? (Apologies, but I'm not familiar with the details of the specific field you mentioned.) If so, I'm +1 on that direction. Also, ideally, the 7.0 ingest processor will produce ECS compatible documents, even if it starts with the 6.7 format.

ruflin commented 5 years ago

As an example, the 6.7 user_agent processor creates the field device. The 7.0 processor creates the field device.name. This means we have a keyword field conflicting with an object field. There are more fields with the same problem, for example in os.*.
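
A small Python sketch can make this kind of conflict concrete. It compares the shape of two user_agent documents and reports the paths that are a plain value in one and an object in the other. The helper names and the sample values are illustrative assumptions; only the `device` vs `device.name` and `os` vs `os.*` difference comes from this thread.

```python
def field_kinds(doc, prefix=""):
    """Map each dotted field path to 'object' or 'scalar'."""
    kinds = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            kinds[path] = "object"
            kinds.update(field_kinds(value, prefix=path + "."))
        else:
            kinds[path] = "scalar"
    return kinds


def shape_conflicts(existing_doc, new_doc):
    """Return paths that are scalar in one document and object in the other."""
    old, new = field_kinds(existing_doc), field_kinds(new_doc)
    return sorted(path for path in old.keys() & new.keys() if old[path] != new[path])


# Illustrative 6.7-style shape: device and os are plain keyword values.
ua_6x = {"name": "Firefox", "device": "Other", "os": "Windows"}
# 7.0-style shape as seen in the failing document: device and os are objects.
ua_7x = {"name": "Firefox", "device": {"name": "Other"},
         "os": {"name": "Windows"}, "version": "57.0"}

conflicting = shape_conflicts(ua_6x, ua_7x)
print(conflicting)  # ['device', 'os']
```

Because the index mapping was created from the first shape, documents with the second shape are rejected for exactly these paths.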

The Filebeat indices are versioned per Beat version. Upgrading Elasticsearch to 7.0 will mean the Beat still ingests to the same index and the type cannot change.

Proposal 3 has become obsolete and is now the same as option 4 because there are more fields than just .original (see notes above). This leaves us with 2 options:

  1. Filebeat ingest processor "detects" that 7.x user_agent processor was used and converts the data to be compatible with 6.x
  2. Elasticsearch 7.x has the same user_agent processor as in 6.7 but ecs is set to true instead of false by default. It could still be deprecated.

The outcome on the data structure side is very similar. I would also like to discuss option 2. I could see this also helping other users who upgrade. The main downside is that we have leftover code in ES 7.
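
As a rough illustration of the first of these two options, the following Python sketch detects the 7.x shape and collapses it back to plain values. This is an assumption-laden toy, not the pipeline that was eventually merged: `looks_like_7x` and `to_6x_shape` are hypothetical names, and the real 6.x field set is not reproduced here.

```python
def looks_like_7x(user_agent):
    """Heuristic: 7.x output nests values such as device.name in objects."""
    return any(isinstance(value, dict) for value in user_agent.values())


def to_6x_shape(user_agent):
    """Collapse {"name": ...} objects into plain values; leave the rest as is."""
    if not looks_like_7x(user_agent):
        return dict(user_agent)
    converted = {}
    for key, value in user_agent.items():
        if isinstance(value, dict) and "name" in value:
            converted[key] = value["name"]
        else:
            converted[key] = value
    return converted


ua_7x = {"name": "Firefox", "device": {"name": "Other"},
         "os": {"name": "Windows"}, "version": "57.0"}
ua_converted = to_6x_shape(ua_7x)
print(ua_converted)
```

Every field whose type differs between the two versions needs its own handling, which is part of why this approach was described as unstable earlier in the thread.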

jakelandis commented 5 years ago

> The Filebeat indices are versioned per Beat version

Is it possible for Filebeat 6.x to detect it's running against a 7.x cluster and use a slightly different index name to avoid mapping errors?

My concern with option 2 is that there is no motivation to start using the ecs version other than a mildly annoying deprecation warning. If we went with option 2, when 8.0.0 comes out, would we have this same conversation w.r.t. removing the deprecated flag?

bleskes commented 5 years ago

@jakelandis when 8.0 comes out we could remove the flag, as it has been deprecated and all the beats that are eligible to speak to 8.0 will not be setting it at all.

jakelandis commented 5 years ago

@bleskes Do all 6.x versions of Beats need to talk with 7.0 ES? (Or just Beats 6.7?)

jakelandis commented 5 years ago

Spoke with @jasontedor and cleared up a few things in person. We will add the flag and functionality to 7.0.0 (deprecated), and leave it removed for 8.0.0.

jakelandis commented 5 years ago

@ruflin - If I change the default value to true in 7.0, will that still work for you?

So it would be:

  6.7: default false
  7.x: default true
  8.0: gone (with true behavior)
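
The proposed defaults can be written down as a tiny Python lookup. This is only a sketch of the plan in this comment: `ecs_flag_default` is a hypothetical helper, and returning `None` for 8.0 and later is a convention chosen here to mean that the flag is gone and ECS output is the only behavior.

```python
def ecs_flag_default(es_version):
    """Return the planned default of the user_agent `ecs` flag for a version.

    False -> old (non-ECS) output by default
    True  -> ECS output by default
    None  -> flag removed, ECS output only
    """
    major, minor = (int(part) for part in es_version.split(".")[:2])
    if (major, minor) == (6, 7):
        return False
    if major == 7:
        return True
    if major >= 8:
        return None
    raise ValueError(f"no ecs flag planned for Elasticsearch {es_version}")


for version in ("6.7", "7.0", "7.1", "8.0"):
    print(version, ecs_flag_default(version))
```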

ruflin commented 5 years ago

@jakelandis Perfect, this is exactly as expected.

ruflin commented 5 years ago

https://github.com/elastic/beats/pull/10688 was merged and https://github.com/elastic/elasticsearch/pull/38757 seems to be almost ready. I tested the two together and it seems to work as expected. Will keep this issue open until https://github.com/elastic/elasticsearch/pull/38757 is also merged.

jakelandis commented 5 years ago

https://github.com/elastic/elasticsearch/pull/38757 has been merged into the 7.0 branch and will make the 7.0.0-rc1 release. It will soon be merged to the 7.x branch for inclusion in 7.1.

bleskes commented 5 years ago

@jakelandis many 🙏

ruflin commented 5 years ago

Closing as all related PRs were merged.