TritonDataCenter / rfd

Requests for Discussion

RFD 163 Cloud Firewall Logging discussion #125

Open danmcd opened 5 years ago

danmcd commented 5 years ago

This is a generic issue for RFD 163 discussion.

danmcd commented 5 years ago

The ipmon(1M) command will produce output only if a given netstack's ipfilter logging is enabled. The -l flag to ipf(1M) enables logging for rules.

Sample output from ipmon, where I enabled things with ipf -l pass in a CN's global zone, looks like this:

06/02/2019 19:17:05.867915 external0 @-1:-1 p 172.24.4.2,65111 -> 172.24.4.133,22 PR tcp len 20 88 -AP IN
06/02/2019 19:17:05.868010 external0 @-1:-1 p 172.24.4.133,22 -> 172.24.4.2,65111 PR tcp len 20 88 -AP OUT
06/02/2019 19:17:05.868269 external0 @-1:-1 p 172.24.4.2,65111 -> 172.24.4.133,22 PR tcp len 20 52 -A IN
06/02/2019 19:17:06.227940 external0 @-1:-1 p 172.24.4.2,65111 -> 172.24.4.133,22 PR tcp len 20 88 -AP IN
06/02/2019 19:17:06.228099 external0 @-1:-1 p 172.24.4.133,22 -> 172.24.4.2,65111 PR tcp len 20 88 -AP OUT
06/02/2019 19:17:06.228402 external0 @-1:-1 p 172.24.4.2,65111 -> 172.24.4.133,22 PR tcp len 20 52 -A IN
06/02/2019 19:17:06.297805 igb1 @-1:-1 p 192.168.4.38,48409 -> 192.168.4.19,80 PR tcp len 20 60 -S OUT
06/02/2019 19:17:06.297922 igb1 @-1:-1 p 192.168.4.19,80 -> 192.168.4.38,48409 PR tcp len 20 60 -AS IN
06/02/2019 19:17:06.297944 igb1 @-1:-1 p 192.168.4.38,48409 -> 192.168.4.19,80 PR tcp len 20 52 -A OUT
06/02/2019 19:17:06.298104 igb1 @-1:-1 p 192.168.4.38,48409 -> 192.168.4.19,80 PR tcp len 20 429 -AP OUT

The ipmon(1M) man page describes this output. Note the high-ish resolution timestamp, the rule numbers (in this case -1, which I believe means "no rule"), the ports & protocol, as well as a bit of protocol-specific information.

If we do not use ipmon directly, we should use what it employs, messages from /dev/ipl, to feed what we're building here. Further complicating things is the wrinkle of global-zone-imposed ipf (using -G from the global zone) vs. in-netstack ipf (using -z from the global zone, or in-zone use of ipf).
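
To make the fields concrete, here is a minimal sketch (mine, not part of the RFD) that splits an ipmon text line like the ones above into named fields. The field names and the reading of "p" as the pass/block indicator are assumptions based on the sample output; a real consumer would read the binary records from /dev/ipl rather than parse text:

// Sketch only: split an ipmon(1M) text line (as sampled above) into named
// fields. Field names are assumptions based on the sample; a real consumer
// would read binary records from /dev/ipl instead of parsing text.
interface IpmonEntry {
  timestamp: string;   // "06/02/2019 19:17:05.867915"
  iface: string;       // "external0"
  rule: string;        // "@-1:-1" (-1 appears to mean "no rule")
  action: string;      // "p" here; presumably pass vs. block
  src: string;         // "addr,port"
  dst: string;         // "addr,port"
  proto: string;       // "tcp"
  flags: string;       // "-AP"
  direction: string;   // "IN" or "OUT"
}

function parseIpmonLine(line: string): IpmonEntry | null {
  const m = line.match(
    /^(\S+ \S+) (\S+) (@\S+) (\S+) (\S+) -> (\S+) PR (\S+) len \d+ \d+ (\S+) (IN|OUT)$/
  );
  if (m === null) {
    return null;
  }
  const [, timestamp, iface, rule, action, src, dst, proto, flags, direction] = m;
  return { timestamp, iface, rule, action, src, dst, proto, flags, direction };
}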

askfongjojo commented 5 years ago

I have a couple questions:

  1. Would reply traffic be excluded from the logging? (it is unlikely to be of interest)
  2. Would we consider having some instance-level backoff mechanism to prevent a massive attack on a single instance from overwhelming the logging system and causing log entries for other instances to be dropped?
  3. Would AuditAPI be deployed in an HA fashion, so that it can be scaled out as the workload increases?

mgerdts commented 5 years ago

@askfongjojo said:

I have a couple questions:

  1. Would reply traffic be excluded from the logging? (it is unlikely to be of interest)

In the initial discussion it was proposed that a new TCP connection would be identified by having the SYN flag, which is only present as a TCP connection is being opened. If we only check for the SYN flag (ignoring ACK), each connection open will result in two packets being logged.

In the three-way handshake both participants send a packet with SYN. Since a SYN on its own does not mean that the connection will be established, it probably makes more sense to log when a SYN-ACK is seen, as that means both sides are at least half-open.

How this will work for UDP is a bit of a mystery to me. It seems we will be forced to keep some state in ipf or cfwlogd so that we know who initiated a conversation. Otherwise, we will consider both ends to be initiating conversations with each request and reply. Since there is no actual connection to tear down, there will be no teardown phase, and the only way to free the associated memory will be as a result of inactivity.
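
As an illustration of the difference (my sketch, not the RFD's specified behavior): a logger keyed on SYN alone fires for both handshake packets, keying on SYN+ACK fires once, and UDP needs its own inactivity-expired flow table:

// Sketch only: the flag handling and UDP flow table are assumptions about
// the general approach, not cfwlogd's actual logic.
interface TcpFlags {
  syn: boolean;
  ack: boolean;
}

// Keying on SYN alone fires twice per handshake (SYN and SYN-ACK both carry SYN).
function logsOnSynOnly(f: TcpFlags): boolean {
  return f.syn;
}

// Keying on SYN+ACK fires once, when both sides are at least half-open.
function logsOnSynAck(f: TcpFlags): boolean {
  return f.syn && f.ack;
}

// UDP has no handshake, so the only way to tell an initiator from a reply is
// to remember recently seen flows and expire them after a period of inactivity.
const udpFlows = new Map<string, number>();  // flow key -> last-seen time (ms)
const UDP_IDLE_MS = 60_000;                  // arbitrary inactivity timeout

function isNewUdpConversation(flowKey: string, nowMs: number): boolean {
  const last = udpFlows.get(flowKey);
  udpFlows.set(flowKey, nowMs);
  return last === undefined || nowMs - last > UDP_IDLE_MS;
}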

  2. Would we consider having some instance-level backoff mechanism to prevent a massive attack on a single instance from overwhelming the logging system and causing log entries for other instances to be dropped?

We probably need something like that. It's not clear whether this is needed for MVP.

  3. Would AuditAPI be deployed in an HA fashion, so that it can be scaled out as the workload increases?

For MVP, I'm hoping to avoid this. It is quite likely it will be needed and the architecture should accommodate it.

mgerdts commented 5 years ago

RFC 1122 Section 4.2.2.13 discusses closing a connection with a 4-way handshake or by sending a RST. This means that we can't just use the presence of RST in a packet to indicate that a connection was refused.
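
For example (my sketch, assuming the logger keeps per-flow state, which the RFD does not specify), a RST only reads as "refused" if the flow was never seen to be established:

// Sketch only: classifying a RST requires knowing whether the connection was
// ever established; the state tracking here is an assumption, not the design.
type FlowState = "syn-seen" | "established";
const tcpFlows = new Map<string, FlowState>();

function classifyRst(flowKey: string): "refused" | "aborted" | "unknown" {
  switch (tcpFlows.get(flowKey)) {
    case "syn-seen":
      return "refused";    // RST answering our SYN: connection refused
    case "established":
      return "aborted";    // abortive close of an open connection (RFC 1122 4.2.2.13)
    default:
      return "unknown";    // no recorded state for this flow
  }
}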

kusor commented 5 years ago

AuditAPI

One or more zones on the admin network

All for multiple; it'll definitely be easier to address the problem in the early design phase than to try to apply a patch later - which may never happen.

If multiple, how does the log daemon know which one to log to?

Just using DNS should do the trick here, as long as we add AuditAPI to Binder the same way we do with every core service.

How are logs coalesced so that instance A does not clobber instance B's logs?

It's a good question. I'd say millisecond precision, together with rule uuid, vm uuid, and account uuid, should be enough to provide some uniqueness. I'd suggest using an approach similar to the one workflow uses to avoid, for example, two different runners executing two different machine jobs on the same machine at once: lock by target vm uuid and fw rule.

Another, possibly better, option: put together a general-purpose changefeed consumer which could be used by every service going HA. Of course, this one might be a little bit out of scope.
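
For concreteness, a key built from the elements listed above might look like this (a sketch of my own; the names and format are not from the RFD):

// Sketch only: compose a uniqueness key from the elements kusor lists above
// (millisecond timestamp, rule uuid, vm uuid, account uuid). Names and format
// are illustrative, not anything specified in the RFD.
function logEntryKey(accountUuid: string, vmUuid: string, ruleUuid: string,
                     when: Date): string {
  return [accountUuid, vmUuid, ruleUuid, when.getTime()].join("/");
}

// Locking would use only the (vm uuid, fw rule) pair, analogous to workflow
// locking a target machine so two runners don't act on it at once.
function lockKey(vmUuid: string, ruleUuid: string): string {
  return `${vmUuid}/${ruleUuid}`;
}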

trentm commented 5 years ago

Reviewing https://github.com/joyent/rfd/blob/master/rfd/0163/README.md#log-archiver-service

This service will have a core VM logarchive0 that will run a hermes master and a hermes proxy.

nit: logarchiver0 for the VM alias.

This service will be responsible for creating an SMF service, svc:/system/smartdc/logarchiver:default, that will run the hermes actor.

naming nit: All the other agents have this FMRI: svc:/smartdc/agent/$name:default. E.g.:

[root@headnode (nightly-1) ~]# svcs | grep agent
online         Mar_02   svc:/smartdc/agent/firewaller-agent-setup:default
online         Mar_02   svc:/smartdc/agent/cmon-agent-setup:default
online         Mar_02   svc:/smartdc/agent/cn-agent-setup:default
online         Mar_02   svc:/smartdc/agent/vm-agent-setup:default
online         Mar_02   svc:/smartdc/agent/net-agent-setup:default
online         Mar_02   svc:/smartdc/agent/ur:default
online         Mar_02   svc:/smartdc/agent/cn-agent:default
online         Mar_02   svc:/smartdc/agent/vm-agent:default
online         Mar_02   svc:/smartdc/agent/hagfish-watcher:default
online         Mar_02   svc:/smartdc/agent/firewaller:default
online         Mar_02   svc:/smartdc/agent/amon-zoneevents:default
online         Mar_02   svc:/smartdc/agent/smartlogin:default
online         Mar_02   svc:/smartdc/agent/amon-relay:default
online         Mar_02   svc:/smartdc/application/config-agent:default
online         Mar_02   svc:/smartdc/agent/amon-agent:default
online         Mar_02   svc:/smartdc/agent/net-agent:default
online         18:34:18 svc:/smartdc/agent/cmon-agent:default

So let's call this one: svc:/smartdc/agent/logarchiver-agent:default for starters (per the cn-agent, vm-agent, etc. pattern).

Hermes will be configured to collect

s/Hermes/Logarchiver-agent/

/var/log/firewall/...

Perhaps it is worth noting in the docs/RFD that this is separate from the existing /var/log/fw/... that holds the logs for the global zone "firewaller" agent.

/:customer_login/reports/firewall-logs/:vm_uuid/:year/:month/:day/:iso8601stamp.json.gz

Perhaps it is discussed somewhere else, but is there a reason for vm_uuid at this position rather than any of:

  1. /:customer_login/reports/firewall-logs/:year/:month/:day/:vm_uuid/:iso8601stamp.json.gz
  2. /:customer_login/reports/firewall-logs/:year/:month/:day/:hour/:vm_uuid.log.gz

Some thoughts:

.../:vm_uuid/...

Is there potential contention with VM migration, i.e. if one gets firewall logs for the same VM UUID, but different CNs, during the same hour? Yuck. It would be a bit of a shame to have to add some other indicator to the path just for this rare case.

...json.gz

Why the ".json" instead of ".log"? It isn't strictly valid JSON. I understand what you mean though, so I don't have strong opinion.

jclulow commented 5 years ago

I took a look at the Log Archiver Service section.

The UUID translation facility as described seems idiomatic with the date translation stuff that's already there, which is great.

Overall, I don't think you need as many phases in the project. The scaling problem is relatively simple to solve: the only bottleneck of which I'm aware is the proxy service which is totally stateless. You can just spin up more processes -- we've even done this before as a kind of hotpatch in the past.

Even if you decide to go as far as totally replacing the entire proxy component with something else like Squid that might allow better vertical scale (though that would be a lot more work if done properly), you won't need to add a mechanism to the master to sign URLs. The proxy is a straight TCP forwarder; the actors themselves are full Manta clients which are already handed appropriate credentials by the master.

I'd be inclined to just do the scaling (and customer UUID mapping) work for Hermes straight up, and add the firewall logging service to the existing logset configuration using the existing Hermes instance. It seems like the shortest path to both solving the existing Hermes scale problems and meeting the new firewall logging needs.

mgerdts commented 5 years ago

Replying to @trentm

All changes accepted as suggested unless noted below.

/:customer_login/reports/firewall-logs/:vm_uuid/:year/:month/:day/:iso8601stamp.json.gz

Perhaps it is discussed somewhere else, but is there a reason for vm_uuid at this position rather than any of:

  1. /:customer_login/reports/firewall-logs/:year/:month/:day/:vm_uuid/:iso8601stamp.json.gz
  2. /:customer_login/reports/firewall-logs/:year/:month/:day/:hour/:vm_uuid.log.gz

I think this would trigger a "hot shard" problem that @dekobon was concerned about. I'm not sure how many VMs per customer it would take to make that a real concern.

Some thoughts:

  • :iso8601stamp in the file name or path means that one always needs to do an mfind to get the file. The #2 format allows one to know the full path given the VM UUID and the hour.

As currently specified, a restart of cfwlogd would result in multiple logs in a single rotation period (hour). This is because it is emitting compressed data: if a previous writer to a log file did not properly flush and close the file, it could be in a state where any appended data would become gibberish. You mention below another case - migration - where a similar problem exists.

Maybe the compression plans should be re-examined. One approach would be to have the /var/log/firewall filesystem use compression=gzip (not lzma) and enhance the hermes actor to compress data between the read() from the file system and the write() to the network. This would not solve the migration case.
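
A rough sketch of that read-compress-write idea (hermes is Node.js, so the standard zlib stream would be the obvious tool; the upload sink here is a stand-in, not hermes' actual Manta client code):

// Sketch only: gzip a plain-text log between the filesystem read and the
// network write, so nothing compressed ever sits on disk. The sink is a
// placeholder; hermes would stream to Manta through its own client.
import { createReadStream } from "fs";
import { createGzip } from "zlib";
import { pipeline } from "stream/promises";
import type { Writable } from "stream";

async function uploadCompressed(path: string, sink: Writable): Promise<void> {
  await pipeline(createReadStream(path), createGzip(), sink);
}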

  • Having the date fields higher in the dir structure makes it slightly easier to trim out old years/months.

True enough. We need someone to chime in on the impacts on Manta.

.../:vm_uuid/...

Is there potential contention with VM migration, i.e. if one gets firewall logs for the same VM UUID, but different CNs, during the same hour? Yuck. It would be a bit of a shame to have to add some other indicator to the path just for this rare case.

Indeed. I don't have a good answer that handles this while also eliminating the need to use mfind or similar.

...json.gz

Why the ".json" instead of ".log"? It isn't strictly valid JSON. I understand what you mean though, so I don't have strong opinion.

.log.gz it is.

mgerdts commented 5 years ago

Replying to @jclulow

I took a look at the Log Archiver Service section.

The UUID translation facility as described seems idiomatic with the date translation stuff that's already there, which is great.

Overall, I don't think you need as many phases in the project. The scaling problem is relatively simple to solve: the only bottleneck of which I'm aware is the proxy service which is totally stateless. You can just spin up more processes -- we've even done this before as a kind of hotpatch in the past.

ok, we'll take a closer look at this.

Even if you decide to go as far as totally replacing the entire proxy component with something else like Squid that might allow better vertical scale (though that would be a lot more work if done properly), you won't need to add a mechanism to the master to sign URLs. The proxy is a straight TCP forwarder; the actors themselves are full Manta clients which are already handed appropriate credentials by the master.

Good to know, updated accordingly.

I'd be inclined to just do the scaling (and customer UUID mapping) work for Hermes straight up, and add the firewall logging service to the existing logset configuration using the existing Hermes instance. It seems like the shortest path to both solving the existing Hermes scale problems and meeting the new firewall logging needs.

I was under the impression that there was a desire to split hermes off from the sdc zone but can't articulate the motivation. @trentm or @kusor - is there anything to this?

trentm commented 5 years ago

I was under the impression that there was a desire to split hermes off from the sdc zone

One argument for separating the handling of cfwlog archiving and sdc log archiving was separation of concerns. They have two different audiences (one for customers, one for operators) and might have separate reasonable SLOs. Having them both handled by one service doesn't otherwise sound too bad to me. The "sdc" zone is a grab-bag dumping-ground zone that for some reason needs to be a singleton. If we think we want to horizontally scale this log archiving service, then I think it should move out to a separate zone to avoid the conflict between "log archiving wants multiple zones" and "sdc0 zone cron jobs and napi-ufds-watcher don't know how to coordinate between multiple zones".

askfongjojo commented 5 years ago

@mgerdts, thank you for the recent update of the RFD.

I have two additional questions:

  1. If I understand correctly, when a CN boots up on a new platform image, the process to rewrite the existing ipf rules to the new format happens en masse for all the VMs on it. Is there any measure in place to protect the instances against attack before the rewrite is completed?
  2. Is there a plan for the log archiver to spread out the firewall log upload to avoid overwhelming Manta? (As it is today, the hourly Triton and Manta log upload in a large-scale Manta deployment already puts a lot of pressure on the Manta metadata tier at the top of the hour.)

mgerdts commented 5 years ago

@mgerdts, thank you for the recent update of the RFD.

I have two additional questions:

  1. If I understand correctly, when a CN boots up on a new platform image, the process to rewrite the existing ipf rules to the new format happens en masse for all the VMs on it. Is there any measure in place to protect the instances against attack before the rewrite is completed?

Yes. The dependencies are set on the services such that the rewrite of the rules completes before the vmadmd or zones service may try to boot any VMs.

  2. Is there a plan for the log archiver to spread out the firewall log upload to avoid overwhelming Manta? (As it is today, the hourly Triton and Manta log upload in a large-scale Manta deployment already puts a lot of pressure on the Manta metadata tier at the top of the hour.)

That is part of the longer-term plan for hermes, yes. In discussions with the Manta team, it seemed their biggest concern was avoidance of hot shards, which are particularly perturbed by mmkdir -p when most of the directory hierarchy already exists.
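
One simple way to spread the load (purely illustrative on my part; this is not what hermes does or what the RFD commits to) is to derive a stable per-CN offset within the upload hour, so every CN uploads at a different time rather than all at the top of the hour:

// Sketch only: derive a deterministic per-CN delay within the hour from the
// CN's UUID, so uploads are spread out instead of all firing at HH:00.
// Not hermes behavior; just an illustration of the idea.
import { createHash } from "crypto";

function uploadDelayMs(cnUuid: string): number {
  const digest = createHash("sha256").update(cnUuid).digest();
  // Use the first 4 bytes as an unsigned integer and map it into [0, 1 hour).
  return digest.readUInt32BE(0) % (60 * 60 * 1000);
}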

siepkes commented 4 years ago

I didn't see a mention of it in the RFD but you folks might already be aware of it, so just FYI:

There is a standard for logging IP flows called IP Flow Information Export (IPFIX). This standard is basically the IETF version of Cisco's NetFlow. For example, OpenBSD has an implementation which it exposes via pflow. The advantage is that there is a whole slew of tools which support IPFIX for visualizing, reporting, etc.