elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs
Apache License 2.0

Question: HTTP Headers #232

Open MikePaquette opened 5 years ago

MikePaquette commented 5 years ago

Where would arbitrary HTTP headers for both requests and responses be located?

No. 15 of 16. This question was asked by a new ECS user who is familiar with mapping IT events to data models and use cases in other schemas. These questions are being posted as GitHub issues because a) they may offer valuable insights, and b) we expect that many new users will have similar questions.

vbohata commented 5 years ago

I am also looking for the right place. In my case, I am going to put arbitrary headers under the http.response.headers and http.request.headers fields. So, for example, http.response.headers.X-My-header.

graphaelli commented 5 years ago

APM stores headers in object fields with enabled: false under http.request.headers and http.response.headers. Interesting headers are extracted and written to the expected locations, User-Agent -> user.user_agent.original, Forwarded/X-Real-Ip/X-Forwarded-For -> client.ip.
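
To illustrate the pattern described above (keep the full header object, then copy a few interesting values to their own fields), here is a minimal sketch. It is not APM code: `HeaderMap`, `EcsHttpDoc`, `firstValue` and `toEcs` are hypothetical names, the target paths are written as `user_agent.original` and `client.ip`, and taking the left-most X-Forwarded-For entry is an assumption about proxy ordering.

```typescript
// Hypothetical types and helpers for illustration only.
type HeaderMap = Record<string, string | string[] | undefined>;

interface EcsHttpDoc {
  http: { request: { headers: HeaderMap } };
  user_agent?: { original: string };
  client?: { ip: string };
}

// Return the first value of a header, matching the name case-insensitively.
function firstValue(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() !== wanted || value === undefined) continue;
    return Array.isArray(value) ? value[0] : value;
  }
  return undefined;
}

function toEcs(headers: HeaderMap): EcsHttpDoc {
  // The raw headers are stored untouched under http.request.headers.
  const doc: EcsHttpDoc = { http: { request: { headers } } };

  const userAgent = firstValue(headers, "user-agent");
  if (userAgent !== undefined) doc.user_agent = { original: userAgent };

  // X-Forwarded-For may hold a comma-separated chain; assume the left-most
  // entry is the client and fall back to X-Real-Ip when it is absent.
  const forwardedFor = firstValue(headers, "x-forwarded-for");
  const realIp = firstValue(headers, "x-real-ip");
  const candidate = forwardedFor?.split(",")[0]?.trim() || realIp?.trim();
  if (candidate) doc.client = { ip: candidate };

  return doc;
}
```

The extracted value is not validated here, so a real pipeline would still need to check that it parses as an IP address before indexing it into an ip-typed field.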

jamiehynds commented 3 years ago

Noting that HTTP Headers has come up again on a recent enhancement request.

The user would like to see HTTP headers (used in HTTP requests and responses) in the ECS definition. The proposal is to keep them under the http.request and http.response objects, perhaps as http.request.headers and http.response.headers, where "headers" could be of type object to hold the different headers and their values.

Use case: collecting logs from NGINX and other web servers and they would like to ship the request / response headers into Elasticsearch.

/cc @ebeahan @webmat

trentm commented 3 years ago

FWIW, the Node.js ecs-logging loggers include support for formatting HTTP request and response objects from various Node.js web frameworks. This includes the request and response headers (at http.request.headers and http.response.headers).

lukeelmers commented 3 years ago

We recently added ECS log metadata for http requests to Kibana's core logging system, and we opted to use http.request.headers and http.response.headers there as well.

ebeahan commented 3 years ago

Thanks for sharing these uses!

We clearly continue to see a need for http.[request|response].headers fields. It looks like, fortunately, we've been fairly aligned, but we should also work towards having the fields actually added to the schema with clear guidance on their usage.

@trentm @lukeelmers (and others 😄 ) you're welcome to submit an RFC proposal, even a one or two paragraphs strawperson, around HTTP headers in ECS. Having an RFC proposal open for discussion would be a good start toward formalizing a direction.

djptek commented 2 years ago

@lukeelmers

If I'm looking in the right place (sorry if not)

https://github.com/elastic/kibana/blob/main/src/core/server/http/router/headers.ts#L9

import { IncomingHttpHeaders } from 'http';

adds a large set of fields, see Interface IncomingHttpHeaders

Are there any specific fields that you'd like to consider as candidates for an RFC?

lukeelmers commented 2 years ago

Are there any specific fields that you'd like to consider as candidates for an RFC?

@djptek At the moment I don't think support for any specific headers is a large concern on the Kibana side, as these can vary by a number of factors anyway (e.g. browsers). The main request from us would be just adding a headers object to ECS in the first place. For logging purposes, we blindly hand those off without performing any direct manipulation, other than to strip values from sensitive headers like set-cookie and authorization.
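
The stripping step mentioned above could look roughly like the following. This is a minimal sketch and not Kibana's actual implementation: `LoggedHeaders`, `SENSITIVE_HEADERS`, `REDACTED` and `redactHeaders` are hypothetical names, and including `cookie` in the denylist is an assumption on top of the two headers named in the comment.

```typescript
// Hypothetical shape for headers handed to a logger.
type LoggedHeaders = Record<string, string | string[] | undefined>;

// Assumed denylist: set-cookie and authorization come from the comment above,
// cookie is added here as an assumption.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie"]);
const REDACTED = "[REDACTED]";

// Return a copy with sensitive values replaced; names are kept so the log
// still shows that the header was present.
function redactHeaders(headers: LoggedHeaders): LoggedHeaders {
  const out: LoggedHeaders = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined || !SENSITIVE_HEADERS.has(name.toLowerCase())) {
      out[name] = value;
    } else {
      out[name] = Array.isArray(value) ? value.map(() => REDACTED) : REDACTED;
    }
  }
  return out;
}
```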

djptek commented 2 years ago

Thanks @lukeelmers. If we were to add a headers object to the schema, we'd need to specify one or more fields to include; wildcards aren't an option, as they could prejudice other users. I'm suggesting the TypeScript import as a starting point in case there is a sensible subset of headers that we could take from the IncomingHttpHeaders interface, since adding all of these fields may risk bloating the ECS schema.

Alternatively, do you have an (anonymised) sample dataset that could be parsed to better understand the frequency distribution of transmitted header fields for this use case?
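
For the frequency question, something along these lines would be enough to rank header names from a sample. It is only a sketch under assumptions: each sample is taken to be a plain object keyed by header name, and `HeaderSample` and `headerFrequency` are hypothetical names.

```typescript
// Hypothetical input shape: one object per logged request or response.
type HeaderSample = Record<string, unknown>;

// Count how many samples carry each header name (case-insensitive).
function headerFrequency(samples: HeaderSample[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const sample of samples) {
    // De-duplicate within one sample so "Host" and "host" count once.
    const names = Array.from(new Set(Object.keys(sample).map((n) => n.toLowerCase())));
    for (const name of names) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  // Most frequent first; ties are broken alphabetically for stable output.
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}
```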

lukeelmers commented 2 years ago

@djptek Sorry to leave this one hanging -- unfortunately I don't think we have a good sample dataset to work with since these headers are not logged in Kibana by default (so we can't, for example, look at Cloud logs to help with this).

But to give you an idea, I did a quick sampling of the most common headers I'm seeing in the Kibana logs. Note that I am combining both incoming & outgoing headers here, and some like content-security-policy are browser-specific. I've also excluded any Kibana-specific headers.

lukeelmers commented 2 years ago

I should also add that, as mentioned earlier, we are just blindly passing the headers along to the Kibana logs, and they can be added dynamically by browsers or from an http route. So IMO it doesn't really matter what is chosen here; Kibana won't prevent custom headers from being created dynamically in the first place.

bplies-ATX commented 1 month ago

fwiw I'm interested in this.

@djptek Sorry to leave this one hanging -- unfortunately I don't think we have a good sample dataset to work with since these headers are not logged in Kibana by default (so we can't, for example, look at Cloud logs to help with this).

But to give you an idea, I did a quick sampling of the most common headers I'm seeing in the Kibana logs. Note that I am combining both incoming & outgoing headers here, and some like content-security-policy are browser-specific. I've also excluded any Kibana-specific headers.

  • accept
  • accept-encoding
  • accept-language
  • accept-ranges
  • authorization
  • cache-control
  • connection
  • content-length
  • content-security-policy
  • content-type
  • cookie
  • etag
  • host
  • if-none-match
  • origin
  • referer
  • referrer-policy
  • set-cookie
  • transfer-encoding
  • user-agent
  • vary
  • x-content-type-options
  • x-forwarded-host
  • x-forwarded-port
  • x-forwarded-proto

There are certainly some very common header names that are worth calling out. In reality, though, headers can be completely arbitrary, and their values can be unreliable or tampered with. For example, any IP-related headers are tempting to map to type ip, but the values could be IPv4, IPv6, multiple addresses, or garbage. Is this, then, a case for using the Flattened type?

This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings.
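
As a rough illustration of what that could look like, here is a sketch of an index mapping body that uses `flattened` for both header objects, plus a small normalisation helper. The field paths follow the proposal in this thread rather than the published schema, and `headersMapping` and `normalizeHeaders` are hypothetical names. Lower-casing names and joining repeated values into one string are choices made for the example, not requirements of the field type.

```typescript
// Sketch of a mapping body; the header paths are the ones proposed in this
// thread and are an assumption, not confirmed ECS fields.
const headersMapping = {
  mappings: {
    properties: {
      http: {
        properties: {
          request: { properties: { headers: { type: "flattened" } } },
          response: { properties: { headers: { type: "flattened" } } },
        },
      },
    },
  },
};

// Normalise headers before indexing: lower-case the names so "Host" and
// "host" land on the same key, drop undefined values, and join repeated
// values into a single string.
function normalizeHeaders(
  headers: Record<string, string | string[] | undefined>
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(headers)) {
    if (value === undefined) continue;
    out[name.toLowerCase()] = Array.isArray(value) ? value.join(", ") : value;
  }
  return out;
}
```

With this approach an x-forwarded-for value is stored as an opaque keyword, so garbage or multi-address values do not cause mapping conflicts, at the cost of losing ip-typed queries on that field.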