Open sorenlouv opened 1 year ago
Seems like there's an option that is at least a little bit more convenient compared to processors:
inputs:
  - id: custom-logs-1684919299044
+   fields_under_root: true
+   fields:
+     service:
+       name: My Service
    type: logfile
    data_stream:
      namespace: default
    streams:
      - id: logs-onboarding-example
        data_stream:
          dataset: example
        paths:
          - /path/to/file
However, I'm not entirely sure if that's even possible with Elastic Agent. I got that from the Filebeat docs.
Do we only want to make this change for the Filebeat inputs or log inputs specifically? If so, then this is really a change in Beats. Elastic Agent just passes the relevant input section of the policy to Filebeat for log input types without looking at any part of it other than the type field.
We could support something like service.name: "My Service" as proposed in the original description in the Filebeat input configuration as syntactic sugar for creating the needed add_fields processor. It would not be much more complicated than https://github.com/elastic/beats/pull/35287. The Beats already set service.name to the name of the Beat, but we can just have this override that. Here's an example log:
{"log.level":"error","@timestamp":"2023-05-22T00:30:04.222Z","message":"Error dialing x509: certificate signed by unknown authority","component":{"binary":"filebeat","dataset":"elastic_agent.filebeat","id":"filestream-default","type":"filestream"},"log":{"source":"filestream-default"},"network":"tcp","ecs.version":"1.6.0","log.logger":"esclientleg","log.origin":{"file.line":38,"file.name":"transport/logging.go"},"service.name":"filebeat","address":"metrics.es.our.ece.domain.edu.au:9243","ecs.version":"1.6.0"}
If we want to support this for all Beat input types, it is more work but probably still not unreasonable. If we want to support this for all Elastic Agent inputs beyond just Beats, then there is much more work to drive everything to a consistent implementation.
In the example above, Filebeat would be sent this entire section of the agent policy (plus the output section that applies) and Filebeat would generate a Beat configuration from it:
id: custom-logs-1684919299044
service:
  name: My Service
type: logfile
data_stream:
  namespace: default
streams:
  - id: logs-onboarding-example
    data_stream:
      dataset: example
    paths:
      - /path/to/file
> The Beats already set service.name to the name of the Beat, but we can just have this override that. Here's an example log
Having service.name for its own Beats log seems correct. But it should not be set on the data sent. Is this the case?
Taking https://github.com/elastic/elastic-agent/issues/2416 into account, the final config would look as follows:
inputs:
  - type: logfile
    service.name: "foo"
    # Should this be set automatically if `service.name` is foo?
    data_stream.dataset: "foo"
    paths:
      - /var/log/my-file/my.log*
Ideally, service.name would be supported on all inputs, including the ones in Metricbeat. But we can start with log. I need to check in detail whether the approach in https://github.com/elastic/beats/pull/35287 would also work for this: in that PR the target index is not modified, whereas here we would have to modify the data. Hopefully there is a place to hook into it.
> Having service.name for its own Beats log seems correct. But it should not be set on the data sent. Is this the case?
Correct, service.name is not added automatically to the non-elastic_agent data streams; for example, it is omitted on logs-system.syslog-*. I am too used to looking at agent logs all day, so that's the first example I looked at.
> Ideally, service.name would be supported on all inputs, including the ones in Metricbeat.
All we should need to do is automatically create an add_fields processor for each input that specifies service.name in its input configuration. There are a few examples of how to create a processor like this if you need one.
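The expansion described above can be sketched roughly as follows. This is only an illustration in Python, not the Beats implementation (which would be written in Go): the helper name expand_service_name is hypothetical, and it assumes the generated processor uses add_fields with an empty target so that service.name lands at the root of the event.

```python
import copy


def expand_service_name(input_cfg: dict) -> dict:
    """Hypothetical helper: turn an input-level service.name into an add_fields processor."""
    cfg = copy.deepcopy(input_cfg)  # leave the caller's config untouched

    # Accept both the dotted form (service.name: foo) and the nested form (service: {name: foo}).
    name = cfg.pop("service.name", None)
    service = cfg.get("service")
    if name is None and isinstance(service, dict) and "name" in service:
        name = service.pop("name")
        if not service:
            del cfg["service"]

    if name is None:
        return cfg  # nothing to expand

    # Assumption: an empty target writes the fields at the event root.
    processor = {"add_fields": {"target": "", "fields": {"service": {"name": name}}}}

    # Design choice in this sketch: the generated processor runs before user-defined ones.
    cfg["processors"] = [processor] + list(cfg.get("processors", []))
    return cfg
```

Calling it with {"type": "logfile", "service.name": "foo"} would return the same input with service.name removed and a single add_fields processor prepended to its processors list. Whether the generated processor should run before or after user-defined processors is an open choice in this sketch.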
> There are a few examples of how to create a processor like this if you need one.
If you have a link to one, that would be helpful.
> We could support something like service.name: "My Service" as proposed in the original description in the Filebeat input configuration as syntactic sugar for creating the needed add_fields processor.
That sounds great!
> If you have a link to one, that would be helpful.
There is an existing add_data_stream processor for adding the data stream type, dataset, and namespace fields to an event that you can use as a reference.
Here is one example usage of it: https://github.com/elastic/beats/blob/fb25982c80fb68745cff05a6a6a07a5c1e1ab4e7/x-pack/osquerybeat/internal/pub/publisher.go#L95-L119
The processor implementation itself is in https://github.com/elastic/beats/blob/fb25982c80fb68745cff05a6a6a07a5c1e1ab4e7/libbeat/processors/add_data_stream/add_data_stream.go#L68
@cmacknz who should own this issue from an implementation perspective? We have all the processors required in this case.
This one is fairly straightforward; I don't think there's much specialized knowledge required.
Either of the Elastic Agent teams could certainly do it, but anyone capable of working in the Beats repository could take care of this.
> This one is fairly straightforward; I don't think there's much specialized knowledge required.
This is great to hear! The primary goal of this issue is to increase the number of clusters where logs are annotated with service.name. This means reducing the technical barriers to setting service.name and highlighting this capability in documentation, guides and onboarding flows.
The concept of "services" was originally introduced in APM but should not be limited to this domain. The plan is to make logs "service-aware" (where applicable), and thus make it easier to investigate logs per service and correlate them with traces in APM.
To do this, logs must be annotated with service.name. Currently, logs are ingested without any reference to services, and users have to manually add the add_fields processor or similar to annotate their logs.
The objective of this issue is to make it as easy as possible for customers to annotate their logs with service.name. Instead of requiring them to use processors, they should simply be able to specify service.name in their logs configuration.
What would be the next steps for making the necessary changes to Elastic Agent?