Open gregkalapos opened 2 months ago
Another similar case is http.request.headers and http.response.headers, which already exist in the traces-apm mapping:
{
  "http.request.headers": {
    "path_match": "http.request.headers.*",
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword"
    }
  }
},
{
  "http.response.headers": {
    "path_match": "http.response.headers.*",
    "match_mapping_type": "string",
    "mapping": {
      "type": "keyword"
    }
  }
}
Other similar case is http.request.headers and http.response.headers already existing in the traces-apm mapping:
That's interesting. I wasn't aware that we were adding all HTTP headers to the mapping. @axw have you heard of situations where this led to mapping explosions? It seems a bit dangerous to me, as anyone can create a bunch of requests with unique HTTP headers and force the backend into a field explosion.
- We could use the flattened field type. This would also work when subobjects is set to false.
- We could set enabled to false. This doesn't work when subobjects is set to false, because we can't add an object mapper for http.request.header with enabled: false to the mapping in a context where objects are disabled.
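To make the two options concrete, here is a minimal sketch of how each could look in a mapping where subobjects is left enabled. It is illustrative only and not the actual traces-apm template: the request headers use flattened and the response headers use an object with enabled: false, and pairing them this way is purely for comparison. The _meta note is an assumption added to label the sketch.

```json
{
  "mappings": {
    "_meta": {
      "note": "hypothetical sketch, not the real traces-apm mapping"
    },
    "properties": {
      "http": {
        "properties": {
          "request": {
            "properties": {
              "headers": { "type": "flattened" }
            }
          },
          "response": {
            "properties": {
              "headers": { "type": "object", "enabled": false }
            }
          }
        }
      }
    }
  }
}
```

With flattened, header values stay searchable as keywords under a single mapped field. With enabled: false, the headers are not parsed or indexed at all and are only retrievable from _source.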
That's interesting. I wasn't aware that we were adding all HTTP headers to the mapping. @axw have you heard of situations where this led to mapping explosions?
I haven't.
It seems a bit dangerous to me as anyone can just create a bunch of requests with unique HTTP headers and force the backend into a field explosion.
That's a good point, I don't think anyone considered this.
@felixbarny should I bring your thoughts about field explosion to the next semconv meeting? Or, if you prefer, you can create an issue here.
I'm not sure if this is an issue with semantic conventions per se. It depends on how backends deal with namespaces that don't have a bounded number of fields. Some may be more resilient than others when it comes to the total number of fields. For example, some vendors may only store fields of certain namespaces for retrieval but don't maintain in-memory metadata about them. What we can do when it comes to mapping http.request.headers.* is to store this as a flattened field type, which avoids creating a field in the mapping for each header, so that there's no risk of a mapping explosion. That field type does come with certain tradeoffs, but they seem reasonable in this case.
Still, I think it would be interesting to get some insight into how other backends intend to deal with these multi-key fields. So if you could bring that up in the next semconv meeting, that would be highly appreciated.
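As a rough sketch of what the flattened approach described above means at query time: individual keys remain addressable with dot notation even though only one field exists in the mapping. The header name and value below are made up for illustration, and the field path assumes http.request.headers is mapped as flattened.

```json
{
  "query": {
    "term": {
      "http.request.headers.x-tenant-id": "acme"
    }
  }
}
```

Leaf values in a flattened field are treated as keywords, so matches are exact and there is no full-text analysis or numeric typing. That is part of the tradeoff mentioned above.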
Summary
(Not sure if multi-key fields is the right term; if there is a better short name describing it, let's update the title.)
OTel SemConv defines fields which can have multiple keys. Examples are:
- http.request.header.<key> (this is already stable)
- db.query.parameter.<key>
There are two aspects to this: 1) When the OTel SemConv <-> ECS merge happens, how do such fields get into ECS? 2) What should the mapping for such fields look like in Elasticsearch?
We discussed this briefly with @felixbarny. Regarding point 2:
- flattened field type
- enabled to false.
- labels with similar dynamic keys, currently with this mapping:
Issue with above is that this leads to field explosion.