These are the results of a discussion with @rakshasa.

Problems

Security groups are handled directly in filter_manager in core.

Security groups are not core functionality. They are a combination of three features.

Rules: Firewall rules that accept traffic coming from a certain ip range on a certain port.
Reference: The same as the above, but instead of an ip range, accept it for a set of ip addresses. More specifically, the ip addresses of every interface in a security group of your choosing.
Isolation: Accept all traffic from a set of ip addresses. More specifically the ip addresses of every interface in the same security group.

Because of this, security groups need to be moved out of the core directory and into services.

The filter_manager needs to be completely replaced. Rather than its current security group functionality, it should just focus on defining the Rules feature. Just simple rules that open certain ports. Different OpenVNet services can then use these.

Currently the security groups related flows are all in TABLE_INTERFACE_INGRESS_FILTER. That table should only be used for the Rules feature which will be handled by the new filter manager.

Isolation and Reference features currently need O(n²) flows

Currently these features work like this. Let's look at isolation first. Imagine this situation:

Interface if-xxx in security group sg-xxx
Interface if-yyy in security group sg-yyy

OpenVNet will create the following flows.

if source_ip == <if-xxx's ip> && destination_interface == if-xxx then accept
if source_ip == <if-yyy's ip> && destination_interface == if-xxx then accept
if source_ip == <if-xxx's ip> && destination_interface == if-yyy then accept
if source_ip == <if-yyy's ip> && destination_interface == if-yyy then accept

4 flows for 2 interfaces. Now imagine that we add interface if-zzz to security group sg-zzz. After that one's added OpenVNet will have these flows for isolation.

if source_ip == <if-xxx's ip> && destination_interface == if-xxx then accept
if source_ip == <if-yyy's ip> && destination_interface == if-xxx then accept
if source_ip == <if-zzz's ip> && destination_interface == if-xxx then accept
if source_ip == <if-xxx's ip> && destination_interface == if-yyy then accept
if source_ip == <if-yyy's ip> && destination_interface == if-yyy then accept
if source_ip == <if-zzz's ip> && destination_interface == if-yyy then accept
if source_ip == <if-xxx's ip> && destination_interface == if-zzz then accept
if source_ip == <if-yyy's ip> && destination_interface == if-zzz then accept
if source_ip == <if-zzz's ip> && destination_interface == if-zzz then accept

Now we have 9 flows for 3 interfaces. The number of flows needed grows quadratically: n interfaces need n² flows.
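To make the growth concrete, here is a minimal sketch that enumerates the flows the same way the listings above do: one accept flow per (source IP, destination interface) pair. The function name `isolation_flows` and the dict layout are made up for illustration and are not OpenVNet code.

```python
from itertools import product


def isolation_flows(interface_ips):
    """Return one (source_ip, destination_interface) accept flow per pair.

    interface_ips maps an interface name to its IP address (hypothetical layout).
    """
    return [
        (src_ip, dst_if)
        for dst_if, src_ip in product(interface_ips, interface_ips.values())
    ]


interfaces = {"if-xxx": "x.x.x.x", "if-yyy": "y.y.y.y"}
print(len(isolation_flows(interfaces)))  # 4

interfaces["if-zzz"] = "z.z.z.z"
print(len(isolation_flows(interfaces)))  # 9
```

Each added interface adds a full row and a full column to the pair matrix, which is why the count is n² rather than a multiple of n.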

Solution

This solution is no longer valid. Refer to the first comment for the revision

First of all, this will be handled in a new table. It needs to be taken out of TABLE_INTERFACE_INGRESS_FILTER as mentioned above. The name of this new table isn't decided yet but let's call it TABLE_ISOLATION for now.

~~Also as mentioned above, the filter_manager should not handle this feature. A new isolation_manager (name might change) will be created.~~

@rakshasa believes the most common case will be to have only one security group per interface. Writing a solution that caters to that most common case will already decrease the amount of needed flows drastically.

~~The idea is to create security group sets that interfaces are placed in. These sets will not be in the database. It is a data construct that only the isolation_manager uses.~~

The first time a packet goes out, it will be sent into the controller. The controller then checks which security groups the sending interface is in and creates a security groups set for that. For example:

{ 1 => { groups: [ sg-xxx, sg-yyy ], interfaces: [ if-xxx, if-yyy, if-zzz ] } }

~~We are using a set with numeric keys instead of an array because this is not a guaranteed sequence. 1 in this case is a security group set ID generated by the code.~~

~~This ID will again be used in the flow tables. We will write this ID to the metadata of a packet and afterwards make a decision based on it. Something like the following.~~

SOME_FLOW_TABLE_BEFORE_ISOLATION
if metadata==interface:if-xxx then write_metadata(secg_set:1), goto: ISOLATION_TABLE
if metadata==interface:if-yyy then write_metadata(secg_set:1), goto: ISOLATION_TABLE
if metadata==interface:if-zzz then write_metadata(secg_set:1), goto: ISOLATION_TABLE

TABLE_ISOLATION
if metadata==secg_set:1 then accept

New solution

After more discussion, we have worked out problems with the last solution and decided to do it this way instead.

The Interfaces <-> Security Groups relationship might be many to many, but @rakshasa expects the most common case to be multiple interfaces in only one security group. We will optimize this case. When an interface is in multiple security groups, we will handle it in another table where we still do regular point to point connections.

Consider this scenario:

Interface if-xxx with IP x.x.x.x in security group sg-A. (local interface)
Interface if-yyy with IP y.y.y.y in security group sg-A. (remote interface)
Interface if-zzz with IP z.z.z.z in security groups sg-B and sg-C. (local interface)
Interface if-aaa with IP a.a.a.a in security group sg-B. (remote interface)

First we get this situation. Remark: Notice how we only have flows for if-xxx and if-zzz. That's because only those are local interfaces. The others are remote. They're on another datapath which will have flows for them.

TABLE_ISOLATION_INGRESS_CLASSIFIER
20 if metadata==interface:if-zzz then goto: TABLE_ISOLATION_INGRESS_MULTIPLE
30 if metadata==interface:if-xxx then write_metadata(secg:sg-A), goto: TABLE_ISOLATION_INGRESS_SINGLE

TABLE_ISOLATION_INGRESS_SINGLE
10 goto: controller

TABLE_ISOLATION_INGRESS_MULTIPLE
10 goto: controller

We have a classifier table that decides which interfaces have multiple security groups on them and which have single ones. In case of a single security group, we rewrite the metadata to contain the security group ID instead of the interface ID.

The single security groups classifier flow has a higher priority than the multiple security groups one. That's to make sure no packets are dropped when we add a new security group to the interface. When that happens we first add the multiple security groups related flows while the single security groups flow bypasses them because of its higher priority. When all that's said and done, we remove the single security group flow.
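The ordering argument can be checked with a toy model of the classifier table. This is only a sketch: the `classify` helper and the flow dicts are hypothetical, and a table miss is modelled as `None`.

```python
SINGLE = "TABLE_ISOLATION_INGRESS_SINGLE"
MULTIPLE = "TABLE_ISOLATION_INGRESS_MULTIPLE"


def classify(flows, interface):
    """Return the goto target of the highest-priority flow for an interface."""
    matching = [f for f in flows if f["interface"] == interface]
    if not matching:
        return None  # table miss: the packet would be lost
    return max(matching, key=lambda f: f["priority"])["goto"]


# if-xxx starts out in a single security group.
classifier = [{"priority": 30, "interface": "if-xxx", "goto": SINGLE}]
steps = [classify(classifier, "if-xxx")]

# Step 1: a second group is added, so install the lower-priority multiple flow.
classifier.append({"priority": 20, "interface": "if-xxx", "goto": MULTIPLE})
steps.append(classify(classifier, "if-xxx"))

# Step 2: once the multiple-group flows are in place, remove the single flow.
classifier = [f for f in classifier if f["goto"] != SINGLE]
steps.append(classify(classifier, "if-xxx"))

print(steps)
```

At every step the lookup finds a flow, so there is no window in which packets for if-xxx hit an empty classifier.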

Now imagine that if-yyy sends a packet to if-xxx. This will be accepted because they're in the same security group. The packet will go to TABLE_ISOLATION_INGRESS_SINGLE and from there into the controller. The controller will then see that this packet came from if-yyy's IP address and add a flow to accept it.

TABLE_ISOLATION_INGRESS_CLASSIFIER
20 if metadata==interface:if-zzz then goto: TABLE_ISOLATION_INGRESS_MULTIPLE
30 if metadata==interface:if-xxx then write_metadata(secg:sg-A), goto: TABLE_ISOLATION_INGRESS_SINGLE

TABLE_ISOLATION_INGRESS_SINGLE
10 goto: controller
30 if metadata==secg:sg-A && src_ip==y.y.y.y then accept

TABLE_ISOLATION_INGRESS_MULTIPLE
10 goto: controller

Now if-xxx can freely exchange packets with if-yyy. Next, imagine that if-zzz sends a packet to if-xxx. These interfaces do not share a security group so it will be blocked.

The packet will again go to TABLE_ISOLATION_INGRESS_SINGLE and into the controller. The controller will see that this is not a trusted IP address. It will add a drop flow to prevent packets from that IP flooding the controller.

TABLE_ISOLATION_INGRESS_CLASSIFIER
20 if metadata==interface:if-zzz then goto: TABLE_ISOLATION_INGRESS_MULTIPLE
30 if metadata==interface:if-xxx then write_metadata(secg:sg-A), goto: TABLE_ISOLATION_INGRESS_SINGLE

TABLE_ISOLATION_INGRESS_SINGLE
10 goto: controller
20 if metadata==secg:sg-A && src_ip==z.z.z.z then drop
30 if metadata==secg:sg-A && src_ip==y.y.y.y then accept

TABLE_ISOLATION_INGRESS_MULTIPLE
10 goto: controller

We've now seen how interfaces in a single security group work. Now let's look at how multiple security groups work. Unfortunately there is not much optimization here yet. In this case it's still a check for destination interface ID and source IP.

When if-aaa sends a packet to if-zzz, it will be accepted because they share security group sg-B. The packet will go to TABLE_ISOLATION_INGRESS_MULTIPLE and into the controller. The controller will see that it's accepted and add a flow for if-zzz to accept if-aaa's IP address.

TABLE_ISOLATION_INGRESS_CLASSIFIER
20 if metadata==interface:if-zzz then goto: TABLE_ISOLATION_INGRESS_MULTIPLE
30 if metadata==interface:if-xxx then write_metadata(secg:sg-A), goto: TABLE_ISOLATION_INGRESS_SINGLE

TABLE_ISOLATION_INGRESS_SINGLE
10 goto: controller
20 if metadata==secg:sg-A && src_ip==z.z.z.z then drop
30 if metadata==secg:sg-A && src_ip==y.y.y.y then accept

TABLE_ISOLATION_INGRESS_MULTIPLE
10 goto: controller
30 if metadata==interface:if-zzz && src_ip==a.a.a.a then accept

When if-yyy sends a packet to if-zzz, it will be dropped because they have no shared security groups. The packet will again go to TABLE_ISOLATION_INGRESS_MULTIPLE and into the controller. The controller will see that this is not a trusted IP and add a drop flow.

TABLE_ISOLATION_INGRESS_CLASSIFIER
20 if metadata==interface:if-zzz then goto: TABLE_ISOLATION_INGRESS_MULTIPLE
30 if metadata==interface:if-xxx then write_metadata(secg:sg-A), goto: TABLE_ISOLATION_INGRESS_SINGLE

TABLE_ISOLATION_INGRESS_SINGLE
10 goto: controller
20 if metadata==secg:sg-A && src_ip==z.z.z.z then drop
30 if metadata==secg:sg-A && src_ip==y.y.y.y then accept

TABLE_ISOLATION_INGRESS_MULTIPLE
10 goto: controller
20 if metadata==interface:if-zzz && src_ip==y.y.y.y then drop
30 if metadata==interface:if-zzz && src_ip==a.a.a.a then accept
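The controller side of this walkthrough can be sketched as a single decision function. This is a rough model under assumptions: the `GROUPS` and `IPS` tables and the `packet_in` helper are hypothetical stand-ins for whatever the real manager would look up, and the priorities 20 and 30 are simply the ones used in the listings above.

```python
# Hypothetical in-memory copy of the example scenario.
GROUPS = {
    "if-xxx": {"sg-A"},
    "if-yyy": {"sg-A"},
    "if-zzz": {"sg-B", "sg-C"},
    "if-aaa": {"sg-B"},
}
IPS = {"if-xxx": "x.x.x.x", "if-yyy": "y.y.y.y",
       "if-zzz": "z.z.z.z", "if-aaa": "a.a.a.a"}
IP_TO_IF = {ip: name for name, ip in IPS.items()}


def packet_in(dst_if, src_ip):
    """Return the (table, priority, match, action) flow to install."""
    dst_groups = GROUPS[dst_if]
    # Unknown source addresses share no group and are therefore untrusted.
    src_groups = GROUPS.get(IP_TO_IF.get(src_ip), set())
    trusted = bool(dst_groups & src_groups)

    if len(dst_groups) == 1:
        # Single group: the classifier already rewrote metadata to the group.
        table = "TABLE_ISOLATION_INGRESS_SINGLE"
        key = f"secg:{next(iter(dst_groups))}"
    else:
        # Multiple groups: still matched per destination interface.
        table = "TABLE_ISOLATION_INGRESS_MULTIPLE"
        key = f"interface:{dst_if}"

    priority, action = (30, "accept") if trusted else (20, "drop")
    return (table, priority, f"metadata=={key} && src_ip=={src_ip}", action)


print(packet_in("if-zzz", "a.a.a.a"))
print(packet_in("if-xxx", "z.z.z.z"))
```

The first call yields the accept flow for a.a.a.a in TABLE_ISOLATION_INGRESS_MULTIPLE, the second the drop flow for z.z.z.z in TABLE_ISOLATION_INGRESS_SINGLE.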

Could you list the tasks to be done and their order? Specifically regarding separation of filter rules, dynamic loading of flows and single sg optimization.

1) Decide deprecation strategy for the current filter_manager. Config option to disable it for example.

2) Implement the new filter_manager (single arbitrary rules that open a tcp/udp port or icmp type for a source ip). Do this in TABLE_INGRESS_FILTER

3) Create the concept of isolation groups (!= security groups) in the database. These are groups of interfaces that allow traffic with each other. Create an isolation_manager to handle them.

3.1) Implement TABLE_ISOLATION_INGRESS which basically has the behaviour of TABLE_ISOLATION_INGRESS_MULTIPLE for everything.

3.2) Split TABLE_ISOLATION_INGRESS into TABLE_ISOLATION_INGRESS_MULTIPLE and TABLE_ISOLATION_INGRESS_SINGLE. Do the single isolation group optimization at this point.

4) Create the security group service that implements the new filter manager and isolation manager.

5) Add the reference feature to security groups. At first this can be done through the filter manager and then we can decide if it needs to be optimized.

Remark: Step 4 could be done before step 3.2 if preferred.

Implement a new filter manager.
- Call it Filter2Manager until the old one is deprecated.
- Use translation manager as a template.
- Use TABLE_INTERFACE_INGRESS/EGRESS_FILTER.
- Simple filtering rules mode (called static_filter/address mode?)
- Each rule has a boolean for either drop or pass.
- Each rule has two booleans for ingress and/or egress filtering.
- Each rule takes a prefix for address and port, 0/0 matches all.
Migrate non-isolation SG filter rules to the new filter manager.
- Replace all non-isolation filtering rules with calls to Filter2Manager.
- As security group 'filter manager' is being turned into a service, we shouldn't need to keep the old code/api as an option.
Add a filter2 manager mode called interface_isolation.
- This gives a relationship of interface->filter->isolation.
- The interface_isolation mode only needs to hold the SG id.
- Make FilterManager enable isolation by calling Filter2Manager, even though isolation manager is not implemented.
Implement an isolation manager.
- Decide if we want to isolate based on all addresses on an interface or specific ip address+interface pair.
- If by address it makes it cleaner at the lower level, though requires the SG service to do work when we add/remove addresses.
- These addresses would not be ip_address id's, instead just the ip address itself.
- Unless it would be too much work, create an isolation legacy mode that pretty much copies code from the old filter manager.
- If this can be done, it will allow us to have a working system in place and get rid of the old filter manager, etc.
- Clean up at this point if possible.
- ??? This probably needs to wait until SG service is implemented.
- COOKIE_PREFIX_CONNECTION packet_in should be changed to isolation.
Create new isolation mode.
- Dynamically load all rules by the way of packet_in, as is currently done.
- Implement optimization for the case where an interface is in a single isolation group.
Implement a security group service manager.
- Responsible for translating rule sets into filter and isolation rules.
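As a rough illustration of the "simple filtering rules" item in the list above, a rule could be modelled like this. Everything here is an assumption made for the sketch: the class name `FilterRule`, the field names, and in particular the reading of "a prefix for address and port" as an address prefix plus a port value with a prefix length in bits, where a length of 0 matches every port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class FilterRule:
    action_pass: bool              # True = pass, False = drop
    ingress: bool                  # apply to ingress traffic
    egress: bool                   # apply to egress traffic
    address: str = "0.0.0.0/0"     # address prefix, /0 matches all
    port: int = 0                  # port value
    port_prefix_len: int = 0       # leading port bits to compare, 0 matches all

    def matches(self, ip, port, direction):
        """Return True if the rule applies to this packet."""
        if direction == "ingress" and not self.ingress:
            return False
        if direction == "egress" and not self.egress:
            return False
        if ip_address(ip) not in ip_network(self.address):
            return False
        # Compare only the leading port_prefix_len bits of the 16-bit port.
        shift = 16 - self.port_prefix_len
        return (port >> shift) == (self.port >> shift)


http_in = FilterRule(action_pass=True, ingress=True, egress=False,
                     address="10.0.0.0/8", port=80, port_prefix_len=16)
print(http_in.matches("10.1.2.3", 80, "ingress"))   # True
print(http_in.matches("10.1.2.3", 80, "egress"))    # False
```

Whether a matching rule passes or drops is left to the caller, which reads `action_pass`.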

I mostly agree with the above but have a few comments.

Why is there a database relation between filter and isolation?

Filter is a single rule that accepts traffic from a given ip/prefix to a given tcp/udp port. Isolation is a set of source IP addresses to accept traffic from. I don't see why there should be a DB relation. I think this is the correct way.

Interface 1 <=> n Filter
Interface n <=> n Isolation

Filter needs some additions
- a protocol field that can hold the values tcp, udp and icmp.
- It needs to be possible to accept traffic on all tcp/udp ports in a single rule. We can use port number 0 for that.
- What about port ranges? Is it possible to create 1 single flow to accept traffic from a tcp/udp port range? It would be a good feature to have that deserves to be investigated.
Isolation should care only about IP sets

You raise the question of whether Isolation manager should only worry about source IP sets or also source interface sets. My opinion is that it should do only source IP sets. As you said, this is cleaner on the lower level and in addition to that, it allows for inclusion of non OpenVNet managed IP addresses.

Keeping track of the IP addresses assigned to interfaces is a different job that should be handled on a higher level. More specifically in the SG service, as you also mention.
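A sketch of what that split could look like: the higher-level service expands group membership into plain IP sets, and the lower level only ever sees addresses. The function `trusted_ip_sets`, its parameters, and the `external_ips` argument for addresses not managed by OpenVNet are all hypothetical names chosen for this example.

```python
def trusted_ip_sets(group_members, interface_ips, external_ips=None):
    """Map each interface to the set of source IPs it should accept.

    group_members: group name -> list of interface names
    interface_ips: interface name -> list of IP addresses
    external_ips:  group name -> list of extra, unmanaged IP addresses
    """
    external_ips = external_ips or {}
    trusted = {}
    for group, members in group_members.items():
        # Every address owned by a member of the group, plus unmanaged extras.
        group_ips = {ip for m in members for ip in interface_ips.get(m, ())}
        group_ips.update(external_ips.get(group, ()))
        for member in members:
            trusted.setdefault(member, set()).update(group_ips)
    return trusted


members = {"sg-A": ["if-xxx", "if-yyy"], "sg-B": ["if-zzz", "if-aaa"], "sg-C": ["if-zzz"]}
ips = {"if-xxx": ["x.x.x.x"], "if-yyy": ["y.y.y.y"],
       "if-zzz": ["z.z.z.z"], "if-aaa": ["a.a.a.a"]}
result = trusted_ip_sets(members, ips, external_ips={"sg-C": ["192.0.2.10"]})
print(sorted(result["if-zzz"]))
```

When an address is added to or removed from an interface, only this expansion has to be rerun. The isolation layer itself never needs to know which interface an address belongs to.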

Isolation should be renamed

The term isolation originated from a security groups feature where they block traffic between interfaces that aren't in the same security group. Security groups are 'isolated' from each other so to say.

Now that we're building this as a lower level feature that can be used separately from security groups, I think a name change is in order. Something like "Trusted IP" or "Whitelist IP".

Why should COOKIE_PREFIX_CONNECTION's packet_in be changed to Isolation?

This packet_in is for connection tracking purposes. What does it have to do with Isolation?

Why is there a database relation between filter and isolation?

I'll explain this later.

Filter needs some additions

a protocol field that can hold the values tcp, udp and icmp.

It needs to be possible to accept traffic on all tcp/udp ports in a single rule. We can use port number 0 for that.

What about port ranges? Is it possible to create 1 single flow to accept traffic from a tcp/udp port range? It would be a good feature to have that deserves to be investigated.

All but the port ranges are straightforward additions.

OF uses bit masks, so a port range may require one or more flows to implement. E.g. port 10 up to and including 32 would need:

0x000A/0xFFFE <10,11>
0x000C/0xFFFC <12,15>
0x0010/0xFFF0 <16,31>
0x0020/0xFFFF <32>
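For reference, a range can be split into value/mask pairs mechanically by repeatedly taking the largest aligned power-of-two block that still fits. This is a generic sketch of that technique, not OpenVNet code, and the name `port_range_to_masks` is made up.

```python
def port_range_to_masks(lo, hi):
    """Split the inclusive port range [lo, hi] into (value, mask) pairs."""
    if not 0 <= lo <= hi <= 0xFFFF:
        raise ValueError("expected 0 <= lo <= hi <= 0xFFFF")
    pairs = []
    while lo <= hi:
        # Largest power-of-two block size that lo is aligned to.
        size = lo & -lo if lo else 0x10000
        # Shrink the block until it no longer runs past hi.
        while lo + size - 1 > hi:
            size >>= 1
        # Wildcard the low bits covered by the block.
        pairs.append((lo, 0xFFFF & ~(size - 1)))
        lo += size
    return pairs


for value, mask in port_range_to_masks(10, 32):
    print(f"0x{value:04X}/0x{mask:04X}")
# 0x000A/0xFFFE
# 0x000C/0xFFFC
# 0x0010/0xFFF0
# 0x0020/0xFFFF
```

A 16-bit range never needs more than 30 pairs this way, and aligned ranges such as <0x0300,0x03FF> collapse into a single pair.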

These rules would have the same cookie ID, however one would need to make sure they don't overlap with other rules that end up with the same OF match as that would replace the other rule. This means that we need to add different ranges to cascading priorities in order to ensure this doesn't happen.

E.g. if you have rules for port ranges <0,0xFFFF>, <0x0300,0x03FF>, <0x0700,0x07FF>, then the 1st rule is lowest priority 30 (to be decided) with 0x0 mask, while the 2nd and 3rd both use the 0xFF00 mask, which matches the top 8 bits and wildcards the low 8, and thus should share a higher priority, e.g. 35.

Since flow priority can be a large number we can allow for various kinds of overlap and prioritization rules, however atm we'll keep it simple. Note that all the flows in the ports 10-32 example above would share the same flow priority.

Isolation should be renamed

The term isolation originated from a security groups feature where they block traffic between interfaces that aren't in the same security group. Security groups are 'isolated' from each other so to say.

Now that we're building this as a lower level feature that can be used separately from security groups, I think a name change is in order. Something like "Trusted IP" or "Whitelist IP".

We can decide on the color of the bikeshed later when we figure out what it gets used for.

Why should COOKIE_PREFIX_CONNECTION's packet_in be changed to Isolation?

This packet_in is for connection tracking purposes. What does it have to do with Isolation?

Prefix is about the manager's cookie ID, and isolation manager would be handling isolation related connections. If we later add a connection manager of some sort then isolation manager would be calling that to set up the flows.

Why is there a database relation between filter and isolation?

Issues with having two or more separate features handling the pre/post-interface actions

Currently a single interface setting is used and only filter manager needs to be touched by the interface manager and its tables. If we have two or more different features enabled on the same interface it will require all pre/post-interface processing to happen in the same table without conflicts. (or across multiple tables with the right goto flows accounting for all active features)

Assuming we do not use a filter isolation mode, this means that interface needs to add an extra setting to enable the goto flow for isolation (which needs to be a different priority from the filter goto flow). Or just one setting that enables pre/post-processing, however that introduces issues with requiring users / front-ends / vnet-services to check both filter and isolation manager (+ any others we might implement) for active items when deciding if they need to enable the goto flow. This complexity increases as we add more features.

Additionally it leaves us with no control of the order of flows for various features, or the priority logic gets spread out amongst otherwise unrelated features.

Filter manager and tables as generic pre/post-interface, routing and translation entry point

Translations probably should work through a filtering mode... this allows us to decide with the filtering mode if we translate everything before filtering, or after filter, when to drop, etc...
Need to pass base priority (and table id's?) for pre/post to translation/isolation manager so that the filter (or pre/post-processing manager?) can order the translation, filtering and isolation as required to get the wanted behavior.
- E.g. translation post-filtering, drop if no translation found. The filter itself has settings for passing all no-match, or dropping.
Most correct implementation would probably have a separate pre/post-interface processing manager, however filter manager seems a good enough place to handle this for now.

axsh / openvnet

Security group refactoring #286
