Closed thomasklinger1234 closed 3 years ago
/cc @cherweg @LukasKucharski1887 @jonasstrippelmann1337
Sounds reasonable and quite easy to do, only thing to consider is potential abuse. Do you use only Java agents?
Sounds reasonable and quite easy to do, only thing to consider is potential abuse.
Could you be so kind as to explain the abuse potential? The security implications would also be interesting to us when we are instructing our clients to use this feature.
Do you use only Java agents?
Mostly Java. Which is why I opened the feature request here at first. We also have go, node.js and Python agents running.
Could you be so kind as to explain the abuse potential? The security implications would also be interesting to us when we are instructing our clients to use this feature.
I can't see such, but it's the one thing that comes in mind that worth consideration.
Mostly Java. Which is why I opened the feature request here at first. We also have go, node.js and Python agents running.
Right, in which case we may consider moving it to the https://github.com/elastic/apm repo. @felixbarny WDYT?
It seems useful to me, agree that it should be something that is specified consistently across agents. I think it is fine, though, if not all agents implement that feature.
@thomasklinger1234 is it important that the identifier is configurable? Or is it enough for the agent to just add the service name?
Or do you need to be able to identify a particular instance of an agent? If yes, what would be the best identifier for that? The hostname? The service.node.name? The ephemeral id (UUID) of the agent?
Maybe a good default could be elasticapm-$language/$agent.version $service.name/$service.version
.
It seems useful to me, agree that it should be something that is specified consistently across agents. I think it is fine, though, if not all agents implement that feature.
@thomasklinger1234 is it important that the identifier is configurable? Or is it enough for the agent to just add the service name?
Or do you need to be able to identify a particular instance of an agent? If yes, what would be the best identifier for that? The hostname? The service.node.name? The ephemeral id (UUID) of the agent?
@felixbarny For starters, I would be totally happy if the service name or any "human-readable" identifier would be sent that tells us what client sends the data. For our particular use case, the hostname or the UUID would not be useful because reverse-engineering that without knowlege about our client's environment is infeasible.
Your suggestion elasticapm-$language/$agent.version $service.name/$service.version
sounds awesome!
Hey @felixbarny, any news on this? I may be also able to implement this and create a PR if needed :)
Hey @thomasklinger1234, no news here. The next step would be to create a cross-agent spec for that, to make the header consistent across all agents.
This is the file that needs to be modified to describe the new behavior: https://github.com/elastic/apm/blob/master/specs/agents/transport.md
Also, there were some discussions in the past about the user-agent: https://github.com/elastic/apm/issues/88
Hi @thomasklinger1234 !
Since there were no alternatives proposed, I wonder what you think about following approach. Today when transaction, span, error or metric documents are going through APM server, we already save several fields which could be useful for your use case.
service.name
service.version
agent.name
agent.version
observer.name
(APM server instance)observer.host
(APM server host)And several others, like host.ip
or container.id
, etc.
This could allow putting together dashboards to analyze which services are sending more transactions by volume as well as analyzing individual load by various APM servers (in case you provision separate APM server for separate teams).
Below is a sample dashboard I created using Kibana Lens.
What do you think?
Hi @thomasklinger1234 ,
I've just opened a PR to implement the shared specification on all agents : https://github.com/elastic/apm/pull/514
Once this is properly reviewed and merged, we can start doing the implementation on the Java agent side.
@SylvainJuge @alex-fedotyev thanks for your efforts, I have been busy the last time and could not respond so sorry for that. Awesome work from both of you, that looks very helpful for us. Excited to see the PR being merged and the implementation as well :)
Just mentioning a couple possible concerns. I don't think we need to deal with these in the spec, at least not right now.
Need we guard against non-ascii chars in service.version
? The APM intake spec requires that service.name matches ^[a-zA-Z0-9 _-]+$
, so that is fine. The spec says nothing about the allowed range of chars in service.version
. Technically that could make it possible for a weird service.version to break the RFC spec for User-Agent
: https://datatracker.ietf.org/doc/html/rfc7231#appendix-D My poor read of the ABNF production User-Agent = product *( RWS ( product / comment ) )
is that things only get problematic if service.version
includes parentheses, \
, and likely gets "interesting" if it includes non-ASCII characters. I suppose we are also relying on header encoding in each of the languages to guard against newline characters in service.version
playing games with header injection attacks.
Another potential concern is length. Both service.name
and service.version
can be up to 1024 chars in length. That could potentially have impact on the size of the header block in requests.
Yah, I will probably do some sanitization of serviceVersion
for the Node.js APM agent. With:
const apm = require('.').start({
serviceName: 'foo',
serviceVersion: 'bar\nblah'
})
apm.startTransaction('t')
apm.endTransaction()
I get this error from Node's http lib when attempting the intake request:
% node foo.js
node:_http_outgoing:570
throw new ERR_INVALID_CHAR('header content', name);
^
TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["User-Agent"]
at ClientRequest.setHeader (node:_http_outgoing:579:3)
at new ClientRequest (node:_http_client:256:14)
at request (node:http:96:10)
at Client.get [as _transportGet] (node:http:107:15)
at Client._pollConfig (/Users/trentm/el/apm-nodejs-http-client2/index.js:307:20)
at new Client (/Users/trentm/el/apm-nodejs-http-client2/index.js:164:10)
at Config.httpTransport [as transport] (/Users/trentm/el/apm-agent-nodejs13/lib/config.js:282:25)
at Agent.start (/Users/trentm/el/apm-agent-nodejs13/lib/agent.js:256:32)
at Object.<anonymous> (/Users/trentm/el/apm-agent-nodejs13/foo.js:1:26)
at Module._compile (node:internal/modules/cjs/loader:1101:14) {
code: 'ERR_INVALID_CHAR'
}
Is your feature request related to a problem?
We are providing APM servers for multiple teams as a service internally. When analyzing the request logs from our APM servers, we cannot identify clients/agents easily to troubleshoot the individual APM agent. We only get two informations:
which are hard to use when performing analytics without access to the IP ranges.
It would be super helpful to optionally be able to customize the user agent header that the agent is sending (currently
elasticapm-java/1.23.0
or similar).Describe the solution you'd like
Add a new optional property to enable customization of the user agent. When sending data to APM server, the agent will now use the header
elasticapm-java/1.23.0/{my-identifier}
. The optional part could be either inferred automatically, e.g. by using an existing property likeservice_name
or be explicitly set in the configuration file with the propertyreporter_user_agent
.Describe alternatives you've considered
None.
Additional context
Here is a screenshot of our analytics tools that we use for getting insights into the requests that we receive at the APM server. As you can see, we can only distinguish on the IP range which is not useful as our clients use NAT gateways and other proxies which makes the IP not helpful.