elastic / ecs

Elastic Common Schema
https://www.elastic.co/what-is/ecs

New field observer.port #895

Open vbohata opened 4 years ago

vbohata commented 4 years ago

Summary

With reverse proxies, the published web site (virtual server) has both an IP and a port. ECS currently has only an IP field for the observer; a port field is missing. Let's imagine a web server at 1.2.3.4 on port 1443 running behind a reverse proxy and published to the internet via the proxy at 6.7.8.9 on port 443. There is currently no place to put port 443.

Detailed Design:

I suggest adding a new field: observer.port

ebeahan commented 4 years ago

The observer.* fields are intended to describe the system that observes/creates the event but not the details about the source/destination of the network events themselves. Similar to the host.* field set where host.ip describes the IP addresses of the host device, observer.ip contains the IP addresses of the observing device.

While the frontend virtual server is a service listening on the proxy itself, I can see this as a use for the destination.nat.* and/or the source.nat.* fields to describe the network translation of the source/destination IP and ports in the event.

Using an example:

| user | => 6.7.8.9:443 | reverse proxy | => 1.2.3.4:1443 |server|

destination.ip: 6.7.8.9
destination.port: 443
destination.nat.ip: 1.2.3.4
destination.nat.port: 1443
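As a minimal sketch of the mapping above, here it is as a nested ECS document built from plain Python dictionaries. Only the destination.* and destination.nat.* field names come from ECS; the helper name `reverse_proxy_destination` is hypothetical.

```python
from typing import Any, Dict


def reverse_proxy_destination(
    public_ip: str, public_port: int, backend_ip: str, backend_port: int
) -> Dict[str, Any]:
    """Build ECS destination.* fields for a connection that a reverse proxy
    receives on public_ip:public_port and forwards to backend_ip:backend_port.
    (Hypothetical helper, for illustration only.)"""
    return {
        "destination": {
            # What the client actually connected to: the proxy's virtual server.
            "ip": public_ip,
            "port": public_port,
            # Where the proxy sent the traffic after translation.
            "nat": {"ip": backend_ip, "port": backend_port},
        }
    }


event = reverse_proxy_destination("6.7.8.9", 443, "1.2.3.4", 1443)
print(event["destination"]["nat"]["port"])  # 1443
```

With this layout, port 443 has a home (destination.port) without any new observer field; the open question in the rest of the thread is whether destination.* fits when the same log line describes both legs of the proxied connection.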

What do you think @webmat @dainperkins ?

vbohata commented 4 years ago

Actually, it is more complicated. In our specific use case we want to store logs from BIG-IP. They contain fields for the client IP and port, the virtual IP and port, and the server IP and port. There is also a SNAT field when BIG-IP performs address translation. So we use client.*, server.*, client.nat.ip and observer.*.

Some time ago, before observer was part of ECS, we used destination.ip and destination.port for the proxy's virtual IP and port. But we were advised to use observer.ip instead of destination.ip, so observer.port makes sense here.

dainperkins commented 4 years ago

I'm not 100% sure how F5 handles SNAT / VIPs, but IIRC SNAT is just plain old NAT, and VIPs are literally proxied connections (client-to-F5 sessions are terminated on F5 ingress, and F5-to-server sessions are proxied on egress).

For me, the key is being able to track across the F5, ensuring that, e.g., NetFlow/firewall connection logs on either side can be traced back to the full connection (NAT or proxy).

I would recommend using the actual network source and destination at each observation point, up to and including the F5:

| Network segment | Representation |
|---|---|
| Client | source.ip/port (client), destination.ip/port (VIP) |
| Client-side network (e.g. NetFlow) | source.ip/port, destination.ip/port (VIP) |
| F5 | source.ip/port (real IP), source.nat.ip/port (SNAT), destination.ip/port (VIP), destination.nat.ip/port (target server), with all IPs in related.ip |
| Server-side network (e.g. NetFlow) | source.ip/port (SNAT), destination.ip/port (server) |
| Server | source.ip/port (SNAT), destination.ip/port (server) |
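The "F5" row of the table can be sketched as a small mapping function. This is a minimal sketch under assumptions: the upper-case input keys (CLIENT_IP, VIRTUAL_IP, SERVER_IP and their *_PORT counterparts) are modeled on the F5 template variables mentioned in this thread (SNAT_IP, SNAT_PORT, VIRTUAL_PORT) and are not a confirmed log format; `f5_ltm_to_ecs` is a hypothetical name, and the addresses in the usage example are made up.

```python
from typing import Any, Dict


def f5_ltm_to_ecs(rec: Dict[str, Any]) -> Dict[str, Any]:
    """Map one F5 LTM record to the layout from the "F5" table row:
    client -> source, SNAT -> source.nat, VIP -> destination,
    server -> destination.nat. Input key names are assumptions."""

    def endpoint(ip_key: str, port_key: str) -> Dict[str, Any]:
        return {"ip": rec[ip_key], "port": int(rec[port_key])}

    source = endpoint("CLIENT_IP", "CLIENT_PORT")
    destination = endpoint("VIRTUAL_IP", "VIRTUAL_PORT")
    # SNAT is only present when the BIG-IP translates the client address.
    if rec.get("SNAT_IP"):
        source["nat"] = endpoint("SNAT_IP", "SNAT_PORT")
    destination["nat"] = endpoint("SERVER_IP", "SERVER_PORT")

    # related.ip: every address seen, de-duplicated, in first-seen order.
    ips = [source["ip"], source.get("nat", {}).get("ip"),
           destination["ip"], destination["nat"]["ip"]]
    related = list(dict.fromkeys(ip for ip in ips if ip))
    return {"source": source, "destination": destination, "related": {"ip": related}}


record = {
    "CLIENT_IP": "203.0.113.10", "CLIENT_PORT": "51000",
    "VIRTUAL_IP": "6.7.8.9", "VIRTUAL_PORT": "443",
    "SNAT_IP": "10.0.0.5", "SNAT_PORT": "40000",
    "SERVER_IP": "1.2.3.4", "SERVER_PORT": "1443",
}
doc = f5_ltm_to_ecs(record)
print(doc["related"]["ip"])
# ['203.0.113.10', '10.0.0.5', '6.7.8.9', '1.2.3.4']
```

In this layout VIRTUAL_PORT lands in destination.port, which is exactly the value the issue proposes to store in observer.port instead.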

Adding a second community ID (one for the natural flow, one for the SNAT/VIP flow) would also be really handy for stitching things together (possibly under related.hash, or we could think about adding a related.community_id), but I think the above should cover everything in the connections. Let me know if I am missing anything...
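To show what the two community IDs would look like, here is a sketch of the Community ID v1 flow hash for TCP/UDP: SHA-1 over a 2-byte seed, the two addresses, the protocol number, one zero padding byte and the two ports, with the endpoints ordered so both directions of a flow produce the same ID, then base64-encoded with a "1:" prefix. It is written from memory of the public Community ID spec, not taken from an Elastic or F5 implementation, so check it against the spec's test vectors before relying on it. The addresses are the made-up ones used earlier, and putting both IDs under related.hash is only the suggestion from the comment above.

```python
import base64
import hashlib
import ipaddress
import struct


def community_id_v1(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    proto: int = 6, seed: int = 0) -> str:
    """Community ID v1 for a TCP (proto 6) or UDP (proto 17) flow."""
    saddr = ipaddress.ip_address(src_ip).packed
    daddr = ipaddress.ip_address(dst_ip).packed
    # Order the endpoints (address first, then port) so that both
    # directions of the same flow hash to the same value.
    if (saddr, src_port) > (daddr, dst_port):
        saddr, daddr = daddr, saddr
        src_port, dst_port = dst_port, src_port
    h = hashlib.sha1()
    h.update(struct.pack("!H", seed))         # 2-byte seed, network byte order
    h.update(saddr)
    h.update(daddr)
    h.update(struct.pack("!BB", proto, 0))    # protocol + 1 padding byte
    h.update(struct.pack("!HH", src_port, dst_port))
    return "1:" + base64.b64encode(h.digest()).decode("ascii")


# Natural leg: client -> VIP. Translated leg: SNAT -> server.
client_leg = community_id_v1("203.0.113.10", "6.7.8.9", 51000, 443)
server_leg = community_id_v1("10.0.0.5", "1.2.3.4", 40000, 1443)
event = {"related": {"hash": [client_leg, server_leg]}}
```

NetFlow or firewall records from either side of the F5 would carry only one of the two IDs, and the F5 event carrying both is what lets them be joined.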

If you are looking at logs about configuring SNAT/VIPs, we would need to look at something else (maybe under rule).

vbohata commented 4 years ago

We are logging the LTM part of F5 BIG-IP, both requests and responses. See https://techdocs.f5.com/kb/en-us/products/big-ip_ltm/manuals/product/bigip-external-monitoring-implementations-12-1-2/2.html

We use the following IP fields:

We use the same F5 template for both requests and responses. As SNAT_IP is available in the response log, we cannot simply say it is the source or the destination: it is part of a single request/response transaction, so the source and destination fields are not a good fit here. There is no fixed source or destination. That's why we currently use the following mapping:

In this case the observer fields match best, but ECS only has observer.ip. There is also the F5 field VIRTUAL_PORT, which we cannot map to observer.port because that field does not exist.

dainperkins commented 4 years ago

I can see where the confusion lies, and potentially we may want to look at e.g. something along the lines of a vip field to make that distinction between nat, vip, proxy, and observer.

In the meantime, do you have any example logs you could share that would better illustrate the issue (anonymized, of course)?

The F5 page seems to indicate that client/server/VIP/SNAT info is all included in the logs, but if I am reading your message correctly, there is no session-level affinity that would tie a particular SNAT to a particular client IP (or a specific VIP to a specific server), or possibly not all information is logged on each side (incoming to the VIP, outgoing from the SNAT)?

ebeahan commented 4 years ago

> I can see where the confusion lies, and potentially we may want to look at e.g. something along the lines of a vip field to make that distinction between nat, vip, proxy, and observer.

Agreed. There are several intermediary devices that fall into a somewhat gray area of ECS today: load balancers, web proxies, API gateways, web application firewalls, CASBs, and even some features of next-gen firewalls (which at times can be a combo of all of these 😂). The current observer.* fields do capture some common properties of these intermediary devices (e.g. ingress/egress interfaces and zones), but I still think there's a gap when a device acts as both a client and a server within the span of one event.

You can capture all the observer-owned IP addresses in the observer.ip array (for F5 you could have the IP of the virtual server, self IPs, the management IP, and the SNAT'd IP). However, once you start adding ports from the observer, which port value do you populate? With F5 you could use SNAT_PORT or VIRTUAL_PORT, and I think you could argue both are correct. I don't think defining just an observer.port would capture the port's role clearly enough either. If observer.port were an array, logging multiple values would lose the relationship between each IP and its port.
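The lost IP/port relationship can be demonstrated in a few lines. This is a minimal sketch assuming a hypothetical array-valued observer.port next to observer.ip; each array is modeled as an unordered set of values because nothing ties position i of one field to position i of the other (Elasticsearch documents the same effect for arrays of objects, which it flattens into independent per-field arrays). `flatten_observer` and the sample addresses are made up for illustration.

```python
from typing import Dict, List, Tuple


def flatten_observer(endpoints: List[Tuple[str, int]]) -> Dict[str, List]:
    """Flatten (ip, port) pairs into two independent array fields, as
    observer.ip and a hypothetical observer.port array would be stored.
    Each array is treated as an unordered set (sorted, de-duplicated)."""
    return {
        "observer.ip": sorted({ip for ip, _ in endpoints}),
        "observer.port": sorted({port for _, port in endpoints}),
    }


# VIP 6.7.8.9 listening on 443, SNAT address 10.0.0.5 using port 40000.
actual = [("6.7.8.9", 443), ("10.0.0.5", 40000)]
# A different pairing: VIP on 40000, SNAT on 443.
swapped = [("6.7.8.9", 40000), ("10.0.0.5", 443)]
# Both flatten to the same document, so the pairing cannot be recovered.
print(flatten_observer(actual) == flatten_observer(swapped))  # True
```

This is why naming the role of each port (as the VIP, SNAT and NAT discussion above does) matters more than adding one more array field.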

Real-world examples and feedback like this are very useful, so thank you and please keep sharing! 😄 I think there's definitely a good case for ECS to have a location (or locations) for this type of data, but we also need to think about how to address the overall challenge of modeling these types of intermediary events.