Azure / enterprise-azure-policy-as-code

Enterprise-ready Azure Policy-as-Code (PaC) solution (includes Az DevOps pipeline)
https://azure.github.io/enterprise-azure-policy-as-code/
MIT License

ADO pipeline is constantly resulting with write-error "Assignment JSON file './Definitions/global-settings.jsonc' is not valid." #717

Closed by LarsVidingSE 1 month ago

LarsVidingSE commented 1 month ago

We have followed the instructions at https://azure.github.io/enterprise-azure-policy-as-code/start-implementing/

In short, the design is as follows:

1. The goal is a brownfield installation, because the existing Azure environment already has many policies in use and we do not want to interfere with them to start with. We will use a single tenant and a single pacOwnerId.
2. Forked https://github.com/Azure/enterprise-azure-policy-as-code to an ADO repo (EPAC-fork).
3. Synced SourceDirectory EPAC-Fork to DestinationDirectory EPAC-Prod.
4. Created the global-settings.jsonc file with two pacEnvironmentSelectors (epac-dev and prod) and chose "strategy": "ownedOnly" and "keepDfcSecurityAssignments": true.
5. Created 4 registered applications in Entra ID with federated credentials (spn-epac-dev, spn-epac-plan, spn-epac-tenant-deploy and spn-tenant-roles).
6. Created one ADO service connection with workload identity federation (manually) for each SPN.
7. Created the pipeline files with New-PipelinesFromStarterKit -StarterKitFolder .\StarterKit -PipelinesFolder .\pipelines -PipelineType AzureDevOps -BranchingFlow GitHub -ScriptType module
8. Adjusted the generated pipeline yml files to match the ADO service connections and pacEnvironmentSelectors.

When I try to run the first pipeline, epac-dev-pipeline.yml, it starts up:

• Initialize job
• Checkout EPAC-Prod (which is our repo)
• PowerShell
• Plan: here the pipeline stops with Write-Error "Assignment JSON file './Definitions/global-settings.jsonc' is not valid"

| Start error output from pipeline.

Read global settings from './Definitions/global-settings.jsonc'.
PowerShell Versions: 7.4.3
Write-Error: /home/vsts/.local/share/powershell/Modules/EnterprisePolicyAsCode/10.5.1/internal/functions/Select-PacEnvironment.ps1:12
Line |
  12 |  … lSettings = Get-GlobalSettings -DefinitionsRootFolder $DefinitionsRoo …
     |                ~~~~~~~~~~~~~
     | Assignment JSON file './Definitions/global-settings.jsonc' is not valid.
[error]PowerShell exited with code '1'.

| End error output from pipeline.

While troubleshooting, I tried to follow the code and perform the steps manually, hoping to see a more detailed error than "is not valid". I have checked and double-checked the syntax of global-settings.jsonc and carefully read the JSON schema for the global settings. I haven't been able to find anything indicating that our global-settings.jsonc file is not valid. Then I started to go through it manually, step by step.

The first pipeline, epac-dev-pipeline.yml, refers to templates/plan.yml for its Plan stage. plan.yml does everything in the Plan stage, and at the bottom of that file it runs an inline script with:

Build-DeploymentPlans -PacEnvironmentSelector ${{ parameters.pacEnvironmentSelector }} -DevOpsType "ado" -InformationAction Continue

Build-DeploymentPlans calls the function Select-PacEnvironment, which in turn calls Get-GlobalSettings. Get-GlobalSettings then reaches the section below, which reads and validates the global-settings.jsonc file.

$Json = Get-Content -Path $globalSettingsFile -Raw -ErrorAction Stop
$settings = @{}
try {
    $settings = $Json | ConvertFrom-Json -AsHashTable
}
catch {
    Write-Error "Assignment JSON file '$($globalSettingsFile)' is not valid." -ErrorAction Stop
}

If the command [$settings = $Json | ConvertFrom-Json -AsHashTable] fails, the output is

"Assignment JSON file './Definitions/global-settings.jsonc' is not valid."

However, when I run Select-PacEnvironment manually, it calls Get-GlobalSettings and there is no error at all. The result is the same when I run Build-DeploymentPlans manually, which in turn creates the policy plan.

I have also run the function Get-GlobalSettings manually, with no error. So my conclusion is that our global-settings.jsonc is valid.
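
The catch block quoted above discards the underlying exception, which is why the pipeline log only says "is not valid". As a minimal sketch of the idea of surfacing the parser's own message, and not EPAC's code, the Python function below (the name `diagnose_json` is hypothetical) reports what failed and where. It assumes the file contains plain JSON without comments, since Python's `json` module does not accept JSONC comments.

```python
import json


def diagnose_json(text: str) -> str:
    """Return "ok", or a message that keeps the parser's own error details."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # Report the original message and position instead of hiding them.
        return f"not valid: {err.msg} (line {err.lineno}, column {err.colno})"
    return "ok"


if __name__ == "__main__":
    # A well-formed document, then the same document with an extra closing brace.
    print(diagnose_json('{"pacOwnerId": "x"}'))
    print(diagnose_json('{"pacOwnerId": "x"}}'))
```

Running a check like this in the same pipeline step, against the same checked-out file, would show whether the parser itself is complaining or whether the failure comes from somewhere else.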

Why is our pipeline epac-dev-pipeline.yml failing with Write-error "Assignment JSON file './Definitions/global-settings.jsonc' is not valid." ?

Many thanks in advance. And I really appreciate any suggestions.

Best regards Lars Viding

LarsVidingSE commented 1 month ago

Hi @apybar, do you need any more information? Just let me know and I will get it for you, or tell me if you want me to test something. Looking forward to collaborating on this issue.

apybar commented 1 month ago

Hey @LarsVidingSE - Apologies, I meant to respond immediately after assigning this to myself but got distracted. Thanks for the reminder.

Steps 1-8 look good, especially step 4, where you set "strategy": "ownedOnly". This is important to avoid changing or deleting existing Azure Policy objects.

Unless I missed this somewhere in your summary above, can you please provide the link to the global-settings.jsonc you are using as a template? I'm noticing there are some outdated templates in our documentation and repo, such as here, that need to be updated. I will fix this asap.

In the meantime, please confirm your global-settings.jsonc matches the following:

{
    "$schema": "https://raw.githubusercontent.com/Azure/enterprise-azure-policy-as-code/main/Schemas/global-settings-schema.json",
    "pacOwnerId": "< GUID HERE >",
    "pacEnvironments": [
        {
            "pacSelector": "epac-dev",
            "cloud": "AzureCloud",
            "tenantId": "< AZURE TENANT ID HERE >",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/< DEPLOYMENT ROOT SCOPE ID HERE >",
            "desiredState": {
                "strategy": "ownedOnly",
                "keepDfcSecurityAssignments": true
            },
            "globalNotScopes": [],
            "managedIdentityLocation": "eastus2"
        },
        {
            "pacSelector": "prod",
            "cloud": "AzureCloud",
            "tenantId": "< AZURE TENANT ID HERE >",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/< DEPLOYMENT ROOT SCOPE ID HERE >",
            "desiredState": {
                "strategy": "ownedOnly",
                "keepDfcSecurityAssignments": true
            },
            "globalNotScopes": [],
            "managedIdentityLocation": "eastus2"
        }
    ]
}

LarsVidingSE commented 1 month ago

Thanks, no problem. Here is the global-settings.jsonc we are using:

{
    "$schema": "https://raw.githubusercontent.com/Azure/enterprise-azure-policy-as-code/main/Schemas/global-settings-schema.json",
    "pacOwnerId": "< GUID HERE ",
    "pacEnvironments": [
        {
            "pacSelector": "epac-dev",
            "cloud": "AzureCloud",
            "tenantId": "< AZURE TENANT ID HERE >",
            "managedIdentityLocation": "westeurope",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/epacdev",
            "desiredState": {
                "strategy": "ownedOnly",
                "keepDfcSecurityAssignments": true
            }
        },
        {
            "pacSelector": "Alecta-Prod",
            "cloud": "AzureCloud",
            "tenantId": "< AZURE TENANT ID HERE >",
            "managedIdentityLocation": "westeurope",
            "deploymentRootScope": "/providers/Microsoft.Management/managementGroups/vdcroot",
            "globalNotScopes": [
                "/providers/Microsoft.Management/managementGroups/epac-dev"
            ],
            "desiredState": {
                "strategy": "ownedOnly",
                "keepDfcSecurityAssignments": true
            }
        }
    ]
}

I will use your template as it is, and see how it goes.

LarsVidingSE commented 1 month ago

I used your template and got the same result.

Starting: Plan
==============================================================================
Task         : Azure PowerShell
Description  : Run a PowerShell script within an Azure environment
Version      : 5.242.0
Author       : Microsoft Corporation
Help         : https://aka.ms/azurepowershelltroubleshooting
==============================================================================
Generating script.
/usr/bin/pwsh -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command . '/home/vsts/work/_temp/49e3c026-b56c-4f9a-96c6-632c82f310ad.ps1'
File saved!
Import-Module -Name /usr/share/az_11.3.1/Az.Accounts/3.0.2/Az.Accounts.psd1 -Global
Clear-AzContext -Scope Process
Clear-AzContext -Scope CurrentUser -Force -ErrorAction SilentlyContinue
 Connect-AzAccount -ServicePrincipal -Tenant < AZURE TENANT ID HERE > -ApplicationId *** -FederatedToken ***** -Environment AzureCloud -Scope Process
WARNING: TenantId '< AZURE TENANT ID HERE >' contains more than one active subscription. First one will be selected for further use. To select another subscription, use Set-AzContext.
WARNING: To override which subscription Connect-AzAccount selects by default, use `Update-AzConfig -DefaultSubscriptionForLogin 00000000-0000-0000-0000-000000000000`. Go to https://go.microsoft.com/fwlink/?linkid=2200610 for more information.
WARNING: You're using Az version 11.3.1. The latest version of Az is 12.1.0. Upgrade your Az modules using the following commands:
  Update-PSResource Az -WhatIf    -- Simulate updating your Az modules.
  Update-PSResource Az            -- Update your Az modules.
There will be breaking changes from 11.3.1 to 12.1.0. Open https://go.microsoft.com/fwlink/?linkid=2241373 and check the details.

===================================================================================================
Read global settings from './Definitions/global-settings.jsonc'.
===================================================================================================
PowerShell Versions: 7.4.3
Write-Error: /home/vsts/.local/share/powershell/Modules/EnterprisePolicyAsCode/10.5.1/internal/functions/Select-PacEnvironment.ps1:12
Line |
  12 |  … lSettings = Get-GlobalSettings -DefinitionsRootFolder $DefinitionsRoo …
     |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Assignment JSON file './Definitions/global-settings.jsonc' is not valid.
##[error]PowerShell exited with code '1'.
/usr/bin/pwsh -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command . '/home/vsts/work/_tasks/AzurePowerShell_72a1931b-effb-4d2e-8fd8-f8472a07cb62/5.242.0/RemoveAzContext.ps1'
Disconnect-AzAccount -Scope CurrentUser -ErrorAction Stop
Disconnect-AzAccount -Scope Process -ErrorAction Stop
Clear-AzContext -Scope Process -ErrorAction Stop
Finishing: Plan

apybar commented 1 month ago

I am not sure why you would be getting this error, as you're using a correctly formatted JSON file.

I ran the JSON you posted, both manually and in my ADO pipeline, and it works without error.

Just to confirm, when running "Build-DeploymentPlans" (Which calls 'Select-PacEnvironment', which calls 'Get-GlobalSettings'), are you getting the same error?

The line of code that is failing is simply converting your JSON to a hashtable, which shouldn't be throwing the error if the JSON remains untouched when moving from manual deployment to pipeline deployment.

In the meantime, I will continue to test. (The only way I can currently reproduce this error is by adding an extra "}" to the end of my global-settings.jsonc.)
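
If the same JSON parses locally but not in the pipeline, one thing worth ruling out is that the file on the agent differs from the local copy (a different branch, encoding, or line endings, for example). A minimal Python sketch, with a hypothetical helper name `fingerprint`, that could be run in both places and the output compared:

```python
import hashlib
from pathlib import Path


def fingerprint(data: bytes) -> dict:
    """Summarize raw file bytes so two copies of a file can be compared."""
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        "crlf_lines": data.count(b"\r\n"),
    }


if __name__ == "__main__":
    # Path taken from the error message in this thread.
    path = Path("./Definitions/global-settings.jsonc")
    if path.exists():
        print(fingerprint(path.read_bytes()))
    else:
        print(f"{path} not found in the current directory")
```

Identical hashes would mean the file content is not the variable, which points the investigation at the environment the script runs in.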

anwather commented 1 month ago

@LarsVidingSE have you tried running Build-DeploymentPlans locally?

LarsVidingSE commented 1 month ago

@anwather, yes, I have run Build-DeploymentPlans locally in VS Code while connected to the tenant, and there is no error. The error only occurs when I run the ADO pipeline.

@apybar, I have run the following locally in VS Code.

I have verified that the pipeline checks out the correct repo and branch (main), and that this branch contains the global-settings.jsonc file in use. Everything seems to be correct and in order.

anwather commented 1 month ago

@LarsVidingSE message me on teams at anwather @ microsoft dot com - we can have a call to sort this out

LarsVidingSE commented 1 month ago

@anwather Thanks for reaching out to me. This error is in my test environment (a separate tenant and ADO).

I set everything up very carefully in our production environment. But when I was ready to start the pipeline, I stopped and set everything up in my test environment instead, and that was not done with the same care. Sorry for that. I have now checked everything in the implementation and, sorry to say, I had missed the Azure roles for three of the SPNs.

Now the pipeline runs down to the last step [Plan tenant]. The code in the pipeline epac-dev-pipeline.yml for the stage [tenantPlan] has a hardcoded pacEnvironmentSelector: tenant. Is that a bug?

  - stage: tenantPlan
    displayName: "Plan tenant"
    dependsOn:
    - Deploy
    condition: and(not(failed()), not(canceled()))
    jobs:
      - job: Plan
        steps:
          - template: templates/plan.yml
            parameters:
              serviceConnection: $(planServiceConnection)
              pacEnvironmentSelector: tenant

The error from the last stage is:

Starting: Plan
==============================================================================
Task         : Azure PowerShell
Description  : Run a PowerShell script within an Azure environment
Version      : 5.243.3
Author       : Microsoft Corporation
Help         : https://aka.ms/azurepowershelltroubleshooting
==============================================================================
Generating script.
/usr/bin/pwsh -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command . '/home/vsts/work/_temp/a86d87dc-e087-49ec-9431-e447337ba2be.ps1'
File saved!
Import-Module -Name /usr/share/az_11.3.1/Az.Accounts/3.0.2/Az.Accounts.psd1 -Global
Clear-AzContext -Scope Process
Clear-AzContext -Scope CurrentUser -Force -ErrorAction SilentlyContinue
 Connect-AzAccount -ServicePrincipal -Tenant 4e2c707f-0d20-4c1a-b0c6-790075ed2feb -ApplicationId *** -FederatedToken ***** -Environment AzureCloud -Scope Process
WARNING: TenantId '4e2c707f-0d20-4c1a-b0c6-790075ed2feb' contains more than one active subscription. First one will be selected for further use. To select another subscription, use Set-AzContext.
WARNING: To override which subscription Connect-AzAccount selects by default, use `Update-AzConfig -DefaultSubscriptionForLogin 00000000-0000-0000-0000-000000000000`. Go to https://go.microsoft.com/fwlink/?linkid=2200610 for more information.

===================================================================================================
Read global settings from './Definitions/global-settings.jsonc'.
===================================================================================================
PowerShell Versions: 7.4.4
PAC Environments: epac-dev, prod
PAC Owner Id: 11111111-2222-3333-4444-555555555555
Definitions root folder: ./Definitions
Input folder: ./Output
Output folder: ./Output

Write-Error: /home/vsts/.local/share/powershell/Modules/EnterprisePolicyAsCode/10.5.1/functions/Build-DeploymentPlans.ps1:63
Line |
  63 |  … vironment = Select-PacEnvironment $PacEnvironmentSelector -Definition …
     |                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Policy as Code environment selector tenant is not valid

##[error]PowerShell exited with code '1'.
/usr/bin/pwsh -NoLogo -NoProfile -NonInteractive -ExecutionPolicy Unrestricted -Command . '/home/vsts/work/_tasks/AzurePowerShell_72a1931b-effb-4d2e-8fd8-f8472a07cb62/5.243.3/RemoveAzContext.ps1'
Disconnect-AzAccount -Scope CurrentUser -ErrorAction Stop
WARNING: You're using Az version 11.3.1. The latest version of Az is 12.1.0. Upgrade your Az modules using the following commands:
  Update-PSResource Az -WhatIf    -- Simulate updating your Az modules.
  Update-PSResource Az            -- Update your Az modules.
There will be breaking changes from 11.3.1 to 12.1.0. Open https://go.microsoft.com/fwlink/?linkid=2241373 and check the details.
Disconnect-AzAccount -Scope Process -ErrorAction Stop
Clear-AzContext -Scope Process -ErrorAction Stop

Finishing: Plan
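
The message above says that the selector passed to Build-DeploymentPlans (tenant) is not among the environments listed in the log (epac-dev, prod). A minimal Python sketch of that kind of lookup, using a hypothetical function name and a trimmed-down stand-in for the settings file; it illustrates the check and is not EPAC's implementation:

```python
import json

# Trimmed-down stand-in for global-settings.jsonc (other fields omitted).
SETTINGS_TEXT = """
{
    "pacEnvironments": [
        {"pacSelector": "epac-dev"},
        {"pacSelector": "prod"}
    ]
}
"""


def unknown_selectors(settings: dict, used: list[str]) -> list[str]:
    """Return the selectors used by pipelines that the settings do not define."""
    defined = {env["pacSelector"] for env in settings.get("pacEnvironments", [])}
    return [name for name in used if name not in defined]


if __name__ == "__main__":
    settings = json.loads(SETTINGS_TEXT)
    # "tenant" is the value hardcoded in the generated tenantPlan stage.
    print(unknown_selectors(settings, ["epac-dev", "tenant"]))
```

Any name this returns would need either a matching pacSelector entry in the settings or a corrected value in the pipeline yml.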

LarsVidingSE commented 1 month ago

Everything is now working, thanks for all the help @anwather and @apybar

A short summary of my problem and findings. The main problem was that I had not assigned Azure roles to the SPNs in my test environment.

After this was solved, I ran into three other problems:

1. After I added the missing Azure roles to the SPNs, the pipeline still did not succeed. Only after I deleted the pipeline completely and created a new one did epac-dev-pipeline.yml work, but...
2. At the end of epac-dev-pipeline.yml, the pacEnvironmentSelector was hardcoded to [tenant], so I had to replace it with my production pacEnvironmentSelector [prod]. The pipeline still did not complete. When I deleted the pipeline again and ran it "for the first time", epac-dev-pipeline.yml completed with status success.
3. The last problem came when I created the pipeline epac-remediation-pipeline.yml: the validation complained about the pacEnvironmentSelector [epac-dev], because a dash (-) is not allowed; an underscore (_) is. I changed the name of the pacEnvironmentSelector in all files to epac_dev.

And finally, everything is working

:)
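
The naming problem in point 3 can be caught before a pipeline run with a small check. The sketch below is in Python and assumes the rule is "letters, digits and underscores only, not starting with a digit". The thread only confirms that a dash is rejected and an underscore is accepted, so the exact pattern is an assumption, as is the function name.

```python
import re

# Assumed rule: letters, digits, underscore; must not start with a digit.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_names(selectors: list[str]) -> list[str]:
    """Return the selectors that do not match the assumed naming rule."""
    return [name for name in selectors if not NAME_PATTERN.fullmatch(name)]


if __name__ == "__main__":
    # The original selector, its renamed form, and the production selector.
    print(invalid_names(["epac-dev", "epac_dev", "prod"]))
```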
