elastic / integrations

Elastic Integrations
https://www.elastic.co/integrations
Other
198 stars 427 forks source link

[Apache Tomcat - Access Log] - Fix ingest pipeline tests & Update X-forwared-for IP to correct destination #7622

Closed muthu-mps closed 11 months ago

muthu-mps commented 1 year ago

Observations

Screenshot 2023-08-31 at 3 00 53 PM
harnish-elastic commented 1 year ago

We should add the actual x-forwarded-for IP address to generate the sample events for pipeline tests. If this is going to have a list of IP addresses then the test should cover that scenario as well.

I have checked the sample logs and found that the value for X-Forwarded-For is missing. I will create a PR to fix the pipeline test!

As per the document we are supporting 3 different log formats. Can we add all those sample formats in the pipeline tests ?

I have checked that all three different log formats are already present in the single test-access.log file.

As the x-forwarded-ip is captured in the header_forwarder field. @lalit-satapathy - As these fields are common header fields can we try to leverage the use of ECS fields here? In IIS the x-forwarded-ip is mapped to network-forwarded-ip. Either we can map to the same field or we can have a common strategy.

@muthu-mps We can replace the header_forwarder field to ECS field network-forwarded-ip. But there could be a possibility that after those changes, an existing user might need to perform reindexing since the header_forwarder field will be replaced by the network-forwarded-ip. Also if we keep header_forwarder field and add the network-forwarded-ip ECS field. There could be a possibility of data redundancy for both fields.

Let me know your thoughts on this!

muthu-mps commented 1 year ago

@ruflin - I would like to hear your thoughts on this,

ruflin commented 1 year ago

@muthu-mps I think I'm currently "to far away" from the specifics here. @tommyers-elastic @lalit-satapathy Could you chime in or help find the right people for it? @AlexanderWert Seems partially like a semconv discussion.

tommyers-elastic commented 1 year ago

@muthu-mps according to this doc, the appropriate semconv field is http.request.header.<name>.

this issue also suggests that the appropriate field to align with the rest of the ECS http fields is http.request.headers.*.

otherwise an appropriate mapping would probably be client.ip. the description of network.forwarded_ip suggests that having source.address/ip is a prerequisite for the field making sense.

lalit-satapathy commented 1 year ago

Since this is a field not mapped to ECS earlier and we are considering to map to ECS, we can go ahead and map to a closest ECS field. I am fine with either of the two options below (both options seems to be making an equal case).

otherwise an appropriate mapping would probably be client.ip. the description of network.forwarded_ip suggests that having source.address/ip is a prerequisite for the field making sense.

@tommyers-elastic, Does the corresponding semconv field need to be considered now for such a case?

tommyers-elastic commented 1 year ago

i don't see any reason to write the request.header(s).* field if we are capturing the IP elsewhere.

i'm not sure on the status of our internal 'support' for semconv fields. for existing integrations which are not compatible anyway it probably doesn't matter. if our policy is to try to be compatible, then we could use client.address alongside client.ip.

muthu-mps commented 11 months ago

@harnish-elastic - Can we use the client.ip field instead of header_forwarder to capture the forwarded-ip?

harnish-elastic commented 11 months ago

@harnish-elastic - Can we use the client.ip field instead of header_forwarder to capture the forwarded-ip?

@muthu-mps We can replace the header_forwarder field with the ECS field client.ip. But there could be a possibility that after those changes, an existing user might need to perform reindexing/update_query since the header_forwarder field will be replaced by the client.ip. Also if we keep the header_forwarder field and add the client.ip ECS field. There could be a possibility of data redundancy for both fields.

Currently, exploring the update_query by upgrading the package scenario to rename the field from header_forwarder to client.ip so that existing users can also rename the existing data fields. If the update query works fine, we will add the description in the README itself by giving the steps that the user can perform.