Landing Zones management groups design

SenthuranSivananthan commented 2 years ago

We are looking for feedback on this proposed change. ~~We will target this change for December release (v0.6.0).~~ Please share any alternative solutions via comments.

Landing Zones management groups are defined in this reference implementation as DevTest, QA and Prod to create alignment to various environments a customer might have. These are provided as examples and can be modified by each customer based on their preferred structure.

From Azure Portal:

We are considering changing the management groups from environment-focused to data-sensitivity-focused.

The proposed target definition will enable us to customize Azure Policies (used for guardrails) based on the data classification of subscriptions, such as Unclassified, Protected A, Protected B, etc.

All subscriptions will be kept in these management groups and we would not add another level to break it out by environments.

Therefore, the change would be:

From current definition

pubsec
 |- Landing Zones
    |- DevTest
    |- QA
    |- Prod
 |- other management groups removed for simplicity

To proposed target definition

pubsec
 |- Landing Zones
    |- Unclassified
    |- Protected A
    |- Protected B
 |- other management groups removed for simplicity

KingBain commented 2 years ago

IMO I prefer the current structure.

I think creating landing zones by sensitivity will force all landing zones and workloads to the most restrictive sensitivity.

That, and it might compel people to set up a lot of structure in mgmt groups... you'll end up seeing something like:

pubsec
 |- Landing Zones
    |- Unclassified
        |- DevTest  
    |- Protected A
    |- Protected B
        |- Prod
 |- other management groups removed for simplicity

SenthuranSivananthan commented 2 years ago

Thanks for the feedback @KingBain. We don't want to have environments sprawl under each classification since policies/guardrails for all environments will be the same.

hudua commented 2 years ago

I agree with @KingBain; in my experience working mostly with cloud data science projects, most projects correlate Protected B data and join it with unclassified data for feature engineering purposes. In addition, having the top level of the hierarchy start with Dev, Test, Prod makes sense in my opinion, as this is a common pattern in the public sector.

The curse of multiplication as discussed here: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/enterprise-scale/faq#what-about-our-management-group-hierarchy doesn't really apply here, as the current structure simply breaks down the hierarchy starting with dev, test, prod, etc. but does not suggest branching workloads into dev, test, prod.

ptd-tbs commented 2 years ago

I recommend keeping the current approach focused on environments, and using tags to identify data sensitivity as one of the attributes of each workload. An application can be unclassified in dev and test, but become PBMM when it's in the Production environment, where "real" data is stored. We used the current approach for a new application that was launched and did not run into any issues.
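To make the tagging idea above concrete, here is a minimal sketch, not the CanadaPubSecALZ implementation. The subscription names and the `DataSensitivity` tag key are hypothetical, chosen only for illustration. It shows subscriptions staying in environment-based management groups while sensitivity is read from a tag.

```python
from collections import defaultdict

# Hypothetical inventory: names, management groups and the "DataSensitivity"
# tag key are illustrative assumptions, not values from the reference implementation.
SUBSCRIPTIONS = [
    {"name": "app1-dev", "management_group": "DevTest", "tags": {"DataSensitivity": "Unclassified"}},
    {"name": "app1-qa", "management_group": "QA", "tags": {"DataSensitivity": "Unclassified"}},
    {"name": "app1-prod", "management_group": "Prod", "tags": {"DataSensitivity": "Protected B"}},
    {"name": "app2-prod", "management_group": "Prod", "tags": {}},
]


def group_by_sensitivity(subscriptions, tag_key="DataSensitivity"):
    """Group subscription names by their sensitivity tag; untagged ones are flagged."""
    grouped = defaultdict(list)
    for sub in subscriptions:
        sensitivity = sub["tags"].get(tag_key, "Untagged")
        grouped[sensitivity].append(sub["name"])
    return dict(grouped)


if __name__ == "__main__":
    for sensitivity, names in group_by_sensitivity(SUBSCRIPTIONS).items():
        print(f"{sensitivity}: {', '.join(names)}")
```

One trade-off of this approach is that an untagged subscription has no classification at all, so a report like the one above (or a policy that requires the tag) would be needed to catch gaps.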

CalvinRodo commented 2 years ago

I have no strong opinion either way. However, my preferred approach is to treat each environment (dev, test, prod) as identical across the board: if prod is PB, we treat all other environments the same way and they abide by the same policies, so that environment drift doesn't become a problem.

However, we are also unique in that we don't silo off functionality in our organization, so if you work on dev or test you also need to manage the prod environment.

SenthuranSivananthan commented 2 years ago

Thank you all for the feedback. Based on the suggestions and on the conversations I've had with CSAs, it's best to stay the course with the current management group structure until there's another suitable path. I'm going to remove this change from our v0.6.0 release (December) but leave the issue open.

jtracey93 commented 2 years ago

@hudua Have you seen this part of the FAQ article where we talk about how to split subscriptions for workloads? https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/ready/enterprise-scale/faq#how-do-we-handle-devtestproduction-workload-landing-zones-in-enterprise-scale-architecture

nephinj commented 2 years ago

I personally feel that Dev, Test, Prod should all match. Although Dev and Test do not usually have sensitive data, you do not want to wait until production to realize that something you scripted is now blocked by policies. Good development practice is for all your environments to mirror each other. Most departments will not want to assume all of Production is PBMM, so having a classification structure makes more sense to me. Without it you end up having to apply the PBMM policies at the subscription level.

jtracey93 commented 2 years ago

Completely agree @nephinj 👍

From a policy perspective, what in PBMM is so restrictive that you would not want to apply it to all of the environments, and therefore keep them under a single "archetype" management group for simplicity and scale? (Apologies, but PBMM is not something I have a lot of exposure to.)

Maybe @SenthuranSivananthan can also help with my understanding here?

nephinj commented 2 years ago

@jtracey93 in a non-PBMM environment you may not want to restrict locations to Canada only, enforce data encryption at rest, enable various Defender solutions, enable Azure P2, enable Sentinel, have alternate storage sites, boundary protection, advanced data security, etc. Some items may be too restrictive for the use cases, or they may just be cost prohibitive to turn on.

jtracey93 commented 2 years ago

Awesome thanks for the reply.

Would it be fair that most of these landing zone subscriptions could be placed under the sandbox?

nephinj commented 2 years ago

Microsoft's guidance on Management Group Hierarchy talks about why you should not have dev/test/prod.

@jtracey93 looking back at the PBMM policy implementation, it is mainly auditing, so maybe it would not be as big a deal to just apply it to all environments.

jtracey93 commented 2 years ago

Completely agree and I actually wrote that dev/test/prod guidance for Azure Landing Zones 👍😎

Good to hear that, upon review, you think there isn't actually a difference in the policy assignments required between environments. This is what we commonly see across customers of all industries and types 👍

KingBain commented 2 years ago

Just to twist the conversation a bit, I suspect we're looking at this base layer of mgmt groups from the perspective of developers and the development process lifecycle.

A developer tests in dev, finishes and pushes to QA. QA/Test tests, approves the request and notifies change mgmt; the code can then go into Production, and ops pushes it into production.

In the original CAF, this lower level of mgmt groups was grouped by workload. There was Online and Corp, which makes for a network-zoning kind of archetype. That is one way of structuring work, and honestly I don't know why you moved away from it.

That said, the other way of organizing hierarchies is by Business Function. Like in Enterprise architecture that's how you would do the grouping. Technology doesn't exist by itself, it serves the business, so things get structured around the business function, business user and business processes.

So at that lower layer under landing zones, you'd have mgmt groups by business function:

Production Services
Testing Services
Development Services

We'd basically aggregate similar workloads by their behaviour, and not by their lifecycle.

... It all gets muddy when the business is IT :)

ghost commented 2 years ago

I want to add our experience after running Azure for a couple years.

Please note, these are not the real names of the MGs but their function/division

We had started (back before CAF) with almost top level dev/prod split.

Prod was where the full PBMM rules, policies, quotas, etc. were in place.
Dev was where very little was enforced apart from a few cherry-picked things. (The shared sandbox was also in here.)

As we move away from shared subs (CAF), we have gone full hub and spoke. Cloud Ops only maintains the Core of Azure and central services, while the Client fully manages their subscription spokes (typically dev, test/uat/qa, and prod). Here is the MG structure we are moving to and some details about where the workloads go.

FinOps, Hub, Ops and Security are subscriptions under Core and not management groups

AllSubs is just a substitute for the root tenant. This is to allow giving, for example, a global RBAC permission (eg. Reader) but without giving access to the actual tenant root.

Major projects:

In DEV; DEV/MAJOR-PROJECT/MAJOR-DIVISION/solution subscription
In PROD; PROD/MAJOR-PROJECT/MAJOR-DIVISION/solution subscription

Minor projects:

In DEV; DEV/project subscription
In PROD; PROD/project subscription

Core projects:

Core Infra projects; (DEV in DEV/core-dev subscription, PROD in CORE/HUB/location subscription or CORE/Ops subscription)
FinOps; DEV in DEV/finops-dev sub, PROD in CORE/FinOps subscription
Security; DEV in DEV/sec-dev sub, PROD in CORE/Sec subscription

Overview

AllSubs
 |- Core
    |- HUB
 |- Dev
    |- Major Project
       |- Major Division
       |- Major Division
 |- Prod
    |- Major Project
       |- Major Division
       |- Major Division
 |- Quarantine

The Quarantine MG is the default place where new subscriptions land, as it is easy for our users to make the mistake of using their work credentials for their VS Subscriber Azure credits benefit.

To handle this, the Quarantine MG would have almost all ability to deploy resources blocked by policy, with messages and information on how users can correct their state and get their own tenant.

I am not saying this will work for everyone, but it is based on our experience and that of 2-3 other orgs. Hopefully I have explained the 'why' enough to help with this structure discussion!

Edit 1 -- ASCII art was nice but didn't post well :)

nephinj commented 2 years ago

A slight tweak on the first approach would be to base it on the cloud profiles 1 (experimental), 2 (non-sensitive), 3+ (sensitive up to PB). That way you could align policies to the guardrails. https://www.gcpedia.gc.ca/gcwiki/images/8/84/GC_Cloud_Guardrails.pdf

KingBain commented 2 years ago

Is there an opportunity here to try and change how the management groups are configured in CALZ?

My group renamed all of the mgmt groups in CALZ (but we kept the basic structure) and it ultimately broke the subscription pipelines. We were able to fix it with a workaround, but it raises the question: when configuring management groups in CALZ, why are we setting displayName and name to the same value?

I don't know how you could implement it, but I feel separating the two would make it easier to target subscriptions via code and allow humans to have readable mgmt group names.

Using the screenshot above as the example

I understand why you're doing it... or I think I do: avoiding name collisions/reusing mgmt group names.

And I think targeting a GUID would also make it easier deep in the code where you're doing the name concatenation and string splitting.
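As a rough illustration of the point above, the following sketch keeps a stable `id` separate from a human-readable `display_name`. It is only a sketch under assumptions: the ids, names and helper functions are hypothetical and are not taken from the CALZ code. The idea is that automation resolves targets by `id`, so renaming the display name does not break lookups.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ManagementGroup:
    id: str            # stable identifier that pipelines target (hypothetical values below)
    display_name: str  # human-readable label that can change freely


def index_by_id(groups):
    """Build a lookup keyed on the stable id rather than the display name."""
    return {group.id: group for group in groups}


def rename(index, group_id, new_display_name):
    """Return a new index in which only the display name of one group has changed."""
    updated = dict(index)
    updated[group_id] = replace(index[group_id], display_name=new_display_name)
    return updated


def resolve_target(index, group_id):
    """Resolve the management group a subscription should be moved under."""
    if group_id not in index:
        raise KeyError(f"Unknown management group id: {group_id}")
    return index[group_id]


if __name__ == "__main__":
    index = index_by_id([
        ManagementGroup(id="mg-lz-devtest", display_name="DevTest"),
        ManagementGroup(id="mg-lz-prod", display_name="Prod"),
    ])
    index = rename(index, "mg-lz-devtest", "Development and Testing")
    # The pipeline still finds the group, because it never depended on the label.
    print(resolve_target(index, "mg-lz-devtest"))
```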

SenthuranSivananthan commented 2 years ago

Thanks for raising this @KingBain. I've created 2 separate GitHub issues to improve this experience:

Please see #167 and #168

SenthuranSivananthan commented 2 years ago

Thanks to everyone for the feedback. We've learned that the majority of the structure stays consistent, with differences within the child management groups under Landing Zones. This is based on how each organization operates, organizes its subscriptions and prefers to apply Azure Policies.

Therefore, instead of trying to fit all organizations into a single structure, we will be updating our automation so that the management group structure can be defined through configuration. This will allow for:

Flexible management group hierarchy
Flexible management group ids & display names

As a result, you will be able to design your management group hierarchy any way you prefer, as long as you limit the depth to 6. See constraints in Azure Docs. This means the structure can:

Follow the recommendation from Cloud Adoption Framework & Azure Landing Zone Reference Architecture
Follow the design based on data classifications
Follow the design based on environment lifecycles (dev, qa, prod, etc.)
Something else that you prefer

Work will be completed as part of #153.
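A configuration-driven hierarchy of this kind could look roughly like the sketch below. This is an illustration only, not the format delivered in #153: the `id`, `displayName` and `children` keys, the example ids, and the helper functions are all assumptions. It also assumes the top-level group in the configuration sits directly under the tenant root and counts as level 1 toward the depth limit of 6.

```python
from __future__ import annotations

from dataclasses import dataclass, field

MAX_DEPTH = 6  # depth limit mentioned above


@dataclass
class ManagementGroup:
    id: str
    display_name: str
    children: list[ManagementGroup] = field(default_factory=list)


def parse(node: dict) -> ManagementGroup:
    """Turn a nested dict (hypothetical config shape) into a tree of groups."""
    return ManagementGroup(
        id=node["id"],
        display_name=node.get("displayName", node["id"]),
        children=[parse(child) for child in node.get("children", [])],
    )


def depth(group: ManagementGroup) -> int:
    """Count levels from this group down to its deepest descendant (a leaf is 1)."""
    return 1 + max((depth(child) for child in group.children), default=0)


def validate(group: ManagementGroup) -> None:
    """Reject hierarchies deeper than MAX_DEPTH levels."""
    levels = depth(group)
    if levels > MAX_DEPTH:
        raise ValueError(f"Hierarchy has {levels} levels; the limit is {MAX_DEPTH}")


# Example: the data-classification layout proposed earlier in this issue.
CONFIG = {
    "id": "pubsec",
    "displayName": "pubsec",
    "children": [
        {
            "id": "pubsecLandingZones",
            "displayName": "Landing Zones",
            "children": [
                {"id": "pubsecLandingZonesUnclassified", "displayName": "Unclassified"},
                {"id": "pubsecLandingZonesProtectedA", "displayName": "Protected A"},
                {"id": "pubsecLandingZonesProtectedB", "displayName": "Protected B"},
            ],
        }
    ],
}

if __name__ == "__main__":
    root = parse(CONFIG)
    validate(root)
    print(f"{root.display_name}: {depth(root)} levels")
```

Swapping the three leaf entries for DevTest, QA and Prod would express the environment-based layout with the same code, which is the flexibility the comment above describes.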

Azure / CanadaPubSecALZ

Landing Zones management groups design #77