BCDevOps / developer-experience

This repository is used to track all work for the BCGov Platform Services Team (This includes work for: 1. Platform Experience, 2. Developer Experience 3. Platform Operations/OCP 3)
Apache License 2.0

Migrate from Aporeto NSP to RedHat OpenShift Network Policy (RHNP) #902

Closed jleach closed 3 years ago

jleach commented 3 years ago

Important Dates

Schedule

Cluster Work (Jeff / Justin / Steven)

Tenant Work (Bev / Olena / Jason)

  1. Create comms, schedule a kick-off meeting, and get docs in place.
  2. For each environment (tools / dev, test, and prod), teams will:
     a. Add an Aporeto NSP that overrides all other NSPs and opens the environment wide.
     b. Add either the QuickStart RHNP or a custom (more secure) RHNP to the environment.
     c. Remove all Apo NSPs to test the RHNP.
     d. Monitor and adjust as needed.
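
As a rough illustration of step 2b, the sketch below builds a permissive, QuickStart-style set of NetworkPolicy manifests in Python and prints them as JSON. The actual QuickStart contents are not given in this ticket, so the three policies, their names, and the `abc123-dev` namespace are assumptions made for illustration only.

```python
import json


def quickstart_policies(namespace):
    """Build three NetworkPolicy manifests for one namespace (illustrative only)."""

    def policy(name, spec):
        return {
            "apiVersion": "networking.k8s.io/v1",
            "kind": "NetworkPolicy",
            "metadata": {"name": name, "namespace": namespace},
            "spec": spec,
        }

    return [
        # An empty podSelector selects every pod; with no ingress rules listed,
        # this policy on its own denies all inbound traffic to those pods.
        policy("deny-by-default", {"podSelector": {}, "policyTypes": ["Ingress"]}),
        # Let the OpenShift router reach pods so that routes keep working.
        policy("allow-from-openshift-ingress", {
            "podSelector": {},
            "ingress": [{"from": [{"namespaceSelector": {"matchLabels": {
                "network.openshift.io/policy-group": "ingress"}}}]}],
            "policyTypes": ["Ingress"],
        }),
        # Let pods in the same namespace talk to each other.
        policy("allow-same-namespace", {
            "podSelector": {},
            "ingress": [{"from": [{"podSelector": {}}]}],
            "policyTypes": ["Ingress"],
        }),
    ]


if __name__ == "__main__":
    # "abc123-dev" is a hypothetical namespace name.
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": quickstart_policies("abc123-dev")}
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be reviewed and then applied with `oc apply -f -`. A custom (more secure) RHNP would replace the same-namespace rule with narrower pod selectors.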

Action Items (not comprehensive)

NOTES

jleach commented 3 years ago

@jefkel @j-pye @StevenBarre @mitovskaol @lukegonis Notes from today's meeting above.

jleach commented 3 years ago

AFAIK, when Aporeto is disabled (switched into discovery mode) it has rules in place but chooses not to enforce them. This is unobtrusive and can be done without any impact to teams. However, when Aporeto is uninstalled, the firewall tables are updated / flushed. This is an impactful change, as in-flight connections (matching Apo rules) will be terminated and need to be re-established. We saw this behaviour when we disabled Apo previously on OCP 3.11. @jefkel @j-pye @StevenBarre @mitovskaol

j-pye commented 3 years ago

@jleach That's correct. As part of the Aporeto install, upgrade, and uninstall process there is a rolling drain/evac of all nodes in the cluster (for exactly the issue you've noted from the 3.11 days). So the impact then becomes that of a standard maintenance involving updating each node. As long as teams have resilient / highly available application deployments, this should not be a problem in OCP4.

mitovskaol commented 3 years ago

The TIB for this change was sent out on Feb 5.

Platform
Managed Container Service

Product
Aporeto Software Defined Network Service

Implementation Date(s)
March 29th, 2021

Activity
The Software Defined Network (SDN) Service on the OpenShift 4 Platform will be transitioning from Aporeto, the current SDN solution, to the OpenShift 4 Built-In SDN in the OCP 4 Silver cluster. The transition will allow the product teams working on the Platform to benefit from an enterprise-grade solution and the enterprise support that BC Gov has in place with RedHat.

Description
All namespaces in the OCP 4 Silver cluster will need to replace their Aporeto NetworkSecurityPolicy objects (NSPs) with RedHat NetworkPolicy objects. The Aporeto SDN and the OCP 4 Built-In SDN solution will be run in parallel until March 25th to allow sufficient time for the policy replacement in all projects hosted in the OCP 4 Silver cluster.

Customer Impact
The SDN service migration will occur in stages with respect to the project environments, with the first migration being that of the DEV instances followed by TEST/TOOLS and PROD in one week increments.

  Feb 15th, 2021: OCP 4 SDN samples, templates and migration support will be made available to product teams to start the policy migration
  Feb 22nd, 2021: Support for Aporeto NSPs is disabled in all DEV namespaces in the Silver cluster
  Mar 1st, 2021: Support for Aporeto NSPs is disabled in all TEST namespaces in the Silver cluster
  Mar 8th, 2021: Support for Aporeto NSPs is disabled in all TOOLS namespaces in the Silver cluster
  Mar 25th, 2021: Support for Aporeto NSPs is disabled in all PROD namespaces in the Silver cluster

Aporeto SDN will be disabled in all namespaces of the Silver cluster on March 25, 2021; after this time, Aporeto NSPs will no longer be in effect. Teams can start replacing their Aporeto NSPs with RedHat NetworkPolicy objects in all their namespaces at any time. They will be provided with instructions on how to disable the enforcement of Aporeto NSPs in their namespaces themselves, at any time, so they can start testing the RedHat Network Policies as soon as possible.

Aporeto NSPs must be replaced with RedHat Network Policies before the support for Aporeto NSPs is disabled for a specific namespace type (DEV/TEST/TOOLS/PROD) at the Platform level as per the schedule above. The Platform Services Team will guide the product teams through the SDN service migration, offering the following support:

  1. An SDN Service Migration kick-off meeting that will provide an overview of the changes and will include a demo of Aporeto NSP replacement in a namespace.
  2. Instructions for replacing Aporeto Network Security Policies with RedHat Network Policies, including a video tutorial, will be uploaded to DevHub at http://developer.gov.bc.ca
  3. Several live-help sessions will be held between Feb 15th and March 25th to assist the teams.
  4. The Platform Services Team will provide on-demand one-on-one support to the teams that need it.

Testing Procedures
The SDN service migration will occur in stages with respect to the project environments, starting from non-production environments, which will allow the teams to test the network security enforced by the new SDN solution before enabling it in their production environment. There will be a period of time when both SDN solutions are available, which will allow the teams to migrate from Aporeto SDN to OCP 4 Built-In SDN at their convenience.


SDN 101: Background information

Software Defined Networking (SDN) for the Data Centres is a technology similar to network firewalls, using software and standard servers to allow Clients to specify network security policies. SDN technology differs from classical firewalls in that it is implemented to allow for scalability using commodity hardware, Infrastructure-as-Code (IaC), full self-service, improved tagging and rulesets, and the ability to implement microsegmentation and zero trust network security. As such, an SDN solution gives teams and administrators more flexibility and speed in provisioning network connectivity for their projects (yes, it’s self-served!). It also provides granular definition and control of communications between application components that would be much harder to implement with traditional firewall technology.

The new Openshift 4 Platform is hosted in the new SDN compartment network zone that was built in the Kamloops Data Center in 2020.

The SDN compartment network zone is intended to host systems that require integrations with cloud services and, as such, has less physical security (e.g., firewalls) to protect the systems; instead, it requires each hosted system to implement its own SDN solution to provide network security.

jleach commented 3 years ago

As per MattR, if we want egress rules we need to be using the OVN-Kubernetes SDN. This means either re-installing 4.6 on the cluster(s) or waiting for OCP 4.8, where a live upgrade path will exist.

jefkel commented 3 years ago

When it's available (for upgrade), you'll want to investigate a migration to OVN (as that is where most of the feature development will be for RedHat too.)

WRT egress rules: you are correct that we are not able to make egress rules available to teams for self-service. You can implement an EgressNetworkPolicy object (as cluster admins) for projects if egress controls are required. See: https://docs.openshift.com/container-platform/4.5/networking/openshift_sdn/configuring-egress-firewall.html

You would want to have a process for requesting/approving/applying before this becomes a "generally available" solution for teams; however, if there are specific project requirements, you can work directly with the teams to implement an EgressNetworkPolicy for them.
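
For reference, a minimal sketch of what a cluster admin might apply for one project, written as a small Python helper that emits the manifest as JSON. The helper name, the namespace, the CIDR (a documentation range), and the DNS name are all hypothetical; the allow-list-then-deny shape is one common pattern, not something decided in this ticket.

```python
import json


def egress_network_policy(namespace, allowed_cidrs, allowed_dns_names=()):
    """Build an EgressNetworkPolicy that allows the listed targets and denies the rest."""
    rules = [{"type": "Allow", "to": {"cidrSelector": cidr}} for cidr in allowed_cidrs]
    rules += [{"type": "Allow", "to": {"dnsName": name}} for name in allowed_dns_names]
    # Rules are evaluated in order and the first match wins,
    # so the catch-all deny has to come last.
    rules.append({"type": "Deny", "to": {"cidrSelector": "0.0.0.0/0"}})
    return {
        "apiVersion": "network.openshift.io/v1",
        "kind": "EgressNetworkPolicy",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"egress": rules},
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    policy = egress_network_policy("abc123-prod", ["192.0.2.0/24"], ["example.com"])
    print(json.dumps(policy, indent=2))
```

Generating the object from a short allow-list would keep a request/approve/apply process simple: the team submits the list, and the admin reviews and applies the output.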

mitovskaol commented 3 years ago

@jleach Is it possible to determine the egress rules that a team would need by scanning their Aporeto NSPs? This way we can just implement the egress rules for them in advance and they don't need to submit a request for this.

jleach commented 3 years ago

@mitovskaol It's a bit complicated: most teams will implement some form of egress rules because it's required now. With the move to KNP it's going to be optional. Some teams, like OrgBook/AG/JAG, will require total isolation and want bespoke egress rules. For these teams, I think we can just work with them to build the rules and implement them. I just created #913 to track this particular issue.
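
A first pass at the NSP scan suggested earlier in the thread might look like the sketch below. It assumes each NSP has already been exported as a dict, that `spec.destination` is a list of tag lists, and that external targets carry tags starting with `ext:`. None of that is confirmed in this ticket, so treat the field layout, the prefix, and the sample data as assumptions.

```python
def external_destinations(nsps, tag_prefix="ext:"):
    """Return {NSP name: [external destination tags]} for NSPs that point outside the cluster."""
    found = {}
    for nsp in nsps:
        name = nsp.get("metadata", {}).get("name", "<unnamed>")
        # Assumed layout: spec.destination is a list of tag lists,
        # e.g. [["ext:network=any"]].
        tags = [tag
                for group in nsp.get("spec", {}).get("destination", [])
                for tag in group
                if tag.startswith(tag_prefix)]
        if tags:
            found[name] = tags
    return found


# Hypothetical sample data, not taken from any real namespace.
sample = [
    {"metadata": {"name": "api-to-db"},
     "spec": {"source": [["role=api"]], "destination": [["role=db"]]}},
    {"metadata": {"name": "egress-internet"},
     "spec": {"source": [["$namespace=abc123-dev"]],
              "destination": [["ext:network=any"]]}},
]

if __name__ == "__main__":
    print(external_destinations(sample))
    # prints {'egress-internet': ['ext:network=any']}
```

A scan like this only flags which namespaces declare external destinations. Turning the tags into concrete CIDRs or DNS names would still need a conversation with each team.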

jleach commented 3 years ago

#871 may duplicate some work from this ticket.

j-pye commented 3 years ago

Updated Dates based on conversation in the Registry channel on RocketChat.