Azure / bicep-registry-modules

Bicep registry modules
MIT License

[AVM Module Issue]: Unable to set licenseType to RHEL_BYOS for linux VMs #1676

Closed oramoss closed 5 months ago

oramoss commented 5 months ago

Check for previous/existing GitHub issues

Issue Type?

Feature Request

Module Name

avm/res/compute/virtual-machine

(Optional) Module Version

0.2.3

Description

For Linux VMs that use Bring Your Own Subscription, we can't set licenseType to "RHEL_BYOS" for these images because the input parameter only allows Windows values (Windows_Client / Windows_Server) or an empty string.

Need to add "RHEL_BYOS" as an allowed value.

(Optional) Correlation Id

No response

github-actions[bot] commented 5 months ago

@oramoss, thanks for submitting this issue for the avm/res/compute/virtual-machine module!

A member of the @azure/avm-res-compute-virtualmachine-module-owners-bicep or @azure/avm-res-compute-virtualmachine-module-contributors-bicep team will review it soon!

rahalan commented 5 months ago

@oramoss thanks for bringing that up. I will investigate

oramoss commented 5 months ago

I'm doing a lot with AVM recently and I've read this article https://johnlokerse.dev/2024/04/10/your-first-contribution-to-azure-verified-modules/ from John Lokerse about contributing...so I'd like to do the fix if I can...if it's as simple as just adding the value to the allowed list on the parameters, of course.

rahalan commented 5 months ago

@oramoss That would be awesome. Please read the contribution guide. Basically you need to create your own fork and test environment. After the change you might also need to regenerate the readme file. Feel free to ping me, if you need help.

oramoss commented 5 months ago

I forked it, changed the parameter to allow RHEL_BYOS, and then ran the GitHub workflow (after creating a subscription and resource group in my tenant for deployment testing).

I get this:

[four images]

Am I missing something? I can continue to debug, but I'm guessing you will have seen this before and can point me in the right direction quicker...

oramoss commented 5 months ago

When I try to run the Test locally, with:

Install-Module -Name Pester -Force
$folder = "C:/data/code/bicep-registry-modules"
# Dot-source the AVM helper scripts
. $folder/avm/utilities/tools/Set-AVMModule.ps1
. $folder/avm/utilities/tools/Test-ModuleLocally.ps1
$TestModuleLocallyInput = @{
    TemplateFilePath           = "$folder/avm/res/compute/virtual-machine/main.bicep"
    ModuleTestFilePath         = "$folder/avm/res/compute/virtual-machine/tests/e2e/windows-defaults/main.test.bicep"
    PesterTest                 = $true
    ValidationTest             = $false
    DeploymentTest             = $false
    ValidateOrDeployParameters = @{
        Location         = 'uksouth'
        SubscriptionId   = 'XXXXXXXXXXX'
        RemoveDeployment = $true
    }
    AdditionalTokens           = @{
        namePrefix = 'oramoss'
        TenantId   = 'YYYYYYYYYYYYY'
    }
}
$TestModuleLocallyInput.ModuleTestFilePath = "$folder/avm/res/compute/virtual-machine/tests/e2e/windows-defaults/main.test.bicep"
# Only the Pester tests run; validation and deployment are switched off above
Test-ModuleLocally @TestModuleLocallyInput

...it runs through fine with no failures....

image

rahalan commented 5 months ago

@oramoss please add both missing Linux values: 'RHEL_BYOS' and 'SLES_BYOS'
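For readers following along, the requested change can be sketched as a widened `@allowed` list on the module's `licenseType` parameter. This is a minimal sketch, not the merged code: the parameter name and the Windows, empty-string, and BYOS values come from this thread, while the description text and the empty-string default are assumptions.

```bicep
// Sketch only: the description wording and the '' default are assumptions,
// not copied from the module source.
@description('Optional. License type for VMs whose image or disk was licensed on-premises or through a bring-your-own-subscription offer.')
@allowed([
  'RHEL_BYOS' // newly allowed: Red Hat Enterprise Linux, bring your own subscription
  'SLES_BYOS' // newly allowed: SUSE Linux Enterprise Server, bring your own subscription
  'Windows_Client'
  'Windows_Server'
  ''
])
param licenseType string = ''
```

With a list like this, a Linux deployment could pass `licenseType: 'RHEL_BYOS'` without the parameter validation rejecting it.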

rahalan commented 5 months ago

@oramoss the name prefix needs to be a secret in your GH, see 3.1 https://azure.github.io/Azure-Verified-Modules/contributing/bicep/bicep-contribution-flow/#31-set-up-secrets

oramoss commented 5 months ago

Name prefix in secrets worked fine - it builds stuff in my tenant now. I get several failures, some of which were down to me - my Tenant is laid out using Cloud Adoption Framework and there were lots of policies in the way, so I moved to a Sandbox Subscription and that fixed it.

I now get 6 of the 10 deployments to run successfully.

image

The 4 failures occur because:

linux-max - complains of authorisation but the Service principal has Owner + User Access Administrator and works fine on all others. Error:

Exception: /home/runner/work/_temp/dc66c607-7139-4670-939c-8c68d0c69c33.ps1:56
Line |
  56 |  throw $res.exception
     |  ~~~~
     | 16:25:08 - The deployment 'a-r-c-vm-linux.max-t3-20240417T1604335990Z'
     | failed with error(s). Showing 1 out of 1 error(s). Status Message: The
     | template deployment failed with error: 'Authorization failed for
     | template resource '09b10464-d5db-5376-85c6-d567fca004a2' of type
     | 'Microsoft.Authorization/roleAssignments'. The client
     | '232547db-xxxx' with object id
     | '232547db-xxxx' does not have permission to
     | perform action 'Microsoft.Authorization/roleAssignments/write' at scope
     | '/subscriptions//resourceGroups/dep--compute.virtualMachines-cvmlinmax-rg/providers/Microsoft.Compute/virtualMachines/***cvmlinmax/providers/Microsoft.Authorization/roleAssignments/09b10464-xxxx'.'. (Code:InvalidTemplateDeployment) CorrelationId: c99b6e6b-1062-4dd9-96f5-15ddd117fcb0

waf-aligned - authorisation again...same error as above...

windows-max - authorisation again...same error as above...

windows.nvidia - quotas issue - I have none (and can't increase from zero for whatever reason) for the relevant SKU:

Exception: /home/runner/work/_temp/5e417921-511e-4f42-a279-532a7e0141e7.ps1:56
Line |
  56 |  throw $res.exception
     |  ~~~~
     | 06:44:36 - The deployment
     | 'a-r-c-vm-windows.nvidia-t3-20240417T0604064967Z' failed with error(s).
     | Showing 1 out of 1 error(s). Status Message: The template deployment
     | 'ujcbnkut5nstm-test-cvmwinnvidia-init' is not valid according to the
     | validation procedure. The tracking id is
     | 'a491e768-xxxx'. See inner errors for details.
     | (Code: InvalidTemplateDeployment) - Operation could not be completed
     | as it results in exceeding approved StandardNVADSA10v5Family Cores
     | quota. Additional details - Deployment Model: Resource Manager,
     | Location: eastus, Current Limit: 0, Current Usage: 0, Additional
     | Required: 6, (Minimum) New Limit Required: 6. Submit a request for Quota
     | increase at

Given that the other 6 work fine, I fail to see how the Service Principal doesn't have the right privileges for the first 3.

For the Nvidia one, I simply don't have the capability to run that one to success due to constraints I can't control.

Thoughts?

rahalan commented 5 months ago

@oramoss The issue is that the Azure Backup Service GUID is tenant-specific. You need to find yours and replace it. Remember to change it back after the tests.

oramoss commented 5 months ago

Ok - great - that GUID change fixed the 3 failures at the top but still leaves the nvidia one.

Ultimately, that one requires a quota of nvidia SKU and I don't have it and can't obtain it....so I'm not sure what I can do here...

oramoss commented 5 months ago

Right, so for the Nvidia one I changed the SKU used from Standard_NV6ads_A10_v5 to Standard_NV4as_v4 and it worked:

image

So, are you saying that, on my feature branch, I now need to undo the backup service GUID change and this new NVIDIA SKU change, and then open a PR from my feature branch to your main?

rahalan commented 5 months ago

@oramoss yes, please create a PR and assign it to me

oramoss commented 5 months ago

Created the PR. I'm not normally a GitHub user, and this is my first PR to this repo and to open source more generally. I'm not sure how to assign it to you, @rahalan - it doesn't seem to be an option for me...

rahalan commented 5 months ago

@oramoss thanks for your contribution. PR is merged

oramoss commented 5 months ago

Excellent. I'll go and have a quiet sit down now :-)