Azure / azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.

[BUG] AutoPools Fail to Scale Up and Gives "Application license is blocked as the support for application license is retired as of 02/29/2024" #44857

Open scottbright22 opened 3 months ago

scottbright22 commented 3 months ago

Library name and version

Microsoft.Azure.Batch 16.2.0

Describe the bug

Pools created programmatically from an AutoPool job fail to scale up and give the error: “Application license is blocked as the support for application license is retired as of 02/29/2024”

Pools not created as an autopool work as expected.

No licenses or applications are used by our application.

Both Fixed and AutoScale Pools behave the same.

Expected behavior

Pool Scales up and adds a node.

Actual behavior

Pool fails to add a node. Attempting the resize in the portal gives the error "Application license is blocked as the support for application license is retired as of 02/29/2024"

Reproduction Steps

    Try
        Dim _pi = New PoolInformation
        _pi.AutoPoolSpecification = New AutoPoolSpecification()
        _pi.AutoPoolSpecification.KeepAlive = False
        _pi.AutoPoolSpecification.PoolLifetimeOption = PoolLifetimeOption.Job

        _pi.AutoPoolSpecification.PoolSpecification = New PoolSpecification
        _pi.AutoPoolSpecification.PoolSpecification.VirtualMachineSize = "standard_f16s_v2"
        _pi.AutoPoolSpecification.PoolSpecification.TaskSlotsPerNode = 1
        _pi.AutoPoolSpecification.PoolSpecification.TaskSchedulingPolicy = New TaskSchedulingPolicy(ComputeNodeFillType.Pack)

        Dim _cj As Microsoft.Azure.Batch.CloudJob = CMTest.BatchAccountClient.JobOperations.CreateJob()
        _cj.Id = "fredJob1"
        _cj.PoolInformation = _pi

        Dim _ir As New ImageReference("windowsserver", "microsoftwindowsserver", "2019-datacenter")
        Dim _vmc As New VirtualMachineConfiguration(_ir, "batch.node.windows amd64") 'Do we need to set Batch Node Agent SKU ID?
        _vmc.LicenseType = ""
        _vmc.NodePlacementConfiguration = New NodePlacementConfiguration(NodePlacementPolicyType.Regional)
        _vmc.OSDisk = New OSDisk
        _vmc.OSDisk.ManagedDisk = New ManagedDisk
        _vmc.OSDisk.ManagedDisk.StorageAccountType = StorageAccountType.PremiumLrs

        _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.VirtualMachineConfiguration = _vmc

        Dim _useAutoScale As Boolean = False
        If _useAutoScale = True Then
            _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.AutoScaleEnabled = True
            Dim Interval As New TimeSpan(0, 15, 0)  'h,m,s Setting
            _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.AutoScaleEvaluationInterval = Interval

            _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.AutoScaleFormula = "$TargetDedicatedNodes = 1;" & vbCrLf & "$TargetLowPriorityNodes = 2;" & vbCrLf
        Else
            _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.TargetDedicatedComputeNodes = 1
            _cj.PoolInformation.AutoPoolSpecification.PoolSpecification.TargetLowPriorityComputeNodes = 2
        End If

        _cj.Commit()
        AZB_Status_Message("*** Auto Pool Job Created ************************")

        Dim _testCj = CMTest.BatchAccountClient.JobOperations.GetJob(_cj.Id)

        If _debugRun = True Then
            CMTest.BatchAccountClient.JobOperations.DeleteJob(_cj.Id)
        End If

    Catch ex As Exception
        AZB_Status_Message("Auto Pool Job Creation Failed. [" & " " & "]" & GetExceptionMsg(ex, GetMethodName))

    End Try

Environment

.NET Framework 4.8, Visual Studio 2022 (version 17.10.3)

github-actions[bot] commented 3 months ago

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @dpwatrous @wiboris.

bmcandr commented 2 months ago

I am also affected by this issue, but I am using the Python SDK to manage Batch resources.

I am creating autopools using a modified version of the pending tasks scaling formula from the Azure Batch docs:

startingNumberOfVMs = 1;
maxNumberofVMs = {max_nodes};
pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
$slotsPerVM = {task_slots_per_node};
$TargetLowPriorityNodes=min(maxNumberofVMs, (pendingTaskSamples / $slotsPerVM) + 1);
$NodeDeallocationOption = taskcompletion;

This formula works with regular node pools in the same Batch account.
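
The `{max_nodes}` and `{task_slots_per_node}` placeholders suggest the formula is a template that gets filled in before the pool is created. A minimal sketch of that step, assuming Python's `str.format` is used; the helper name `render_formula` and the sample values are hypothetical and not taken from the original code:

```python
# Autoscale formula template, copied from the comment above.
FORMULA_TEMPLATE = """\
startingNumberOfVMs = 1;
maxNumberofVMs = {max_nodes};
pendingTaskSamplePercent = $PendingTasks.GetSamplePercent(180 * TimeInterval_Second);
pendingTaskSamples = pendingTaskSamplePercent < 70 ? startingNumberOfVMs : avg($PendingTasks.GetSample(180 * TimeInterval_Second));
$slotsPerVM = {task_slots_per_node};
$TargetLowPriorityNodes=min(maxNumberofVMs, (pendingTaskSamples / $slotsPerVM) + 1);
$NodeDeallocationOption = taskcompletion;
"""


def render_formula(max_nodes: int, task_slots_per_node: int) -> str:
    """Fill the two placeholders (hypothetical helper, for illustration only)."""
    if max_nodes < 1 or task_slots_per_node < 1:
        raise ValueError("max_nodes and task_slots_per_node must be positive")
    return FORMULA_TEMPLATE.format(
        max_nodes=max_nodes,
        task_slots_per_node=task_slots_per_node,
    )


# Sample values chosen to match the diagnostic log quoted later in this comment.
formula = render_formula(max_nodes=200, task_slots_per_node=7)
print(formula)
```

The rendered string is what would be passed as the pool's autoscale formula, whether the pool is a regular pool or an autopool.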

Autopools using this formula fail to scale beyond 1 node in response to the accumulation of pending tasks, however. I have attempted to trigger a pool resize event by using Azure Batch Explorer to modify the maxNumberofVMs value in the scaling formula, but that results in the following vague Internal Error message:

Screenshot 2024-07-29 at 15 26 11

Request ID: f68d2884-2543-4cb1-9388-de5c7839d089

Similarly, using Azure Batch Explorer to modify the resize configuration to a fixed size results in the license-related error:

Screenshot 2024-07-29 at 15 19 32

Request ID: 6917061e-0fa5-4595-b2c3-293466ee8387

In Diagnostic Logs I can see that the scaling formula is being evaluated. For example, here is the reported result in a PoolAutoScaleEvent for a test autopool that uses the scaling formula above:

$TargetDedicatedNodes=0;
$TargetLowPriorityNodes=5;
$NodeDeallocationOption=taskcompletion;
$slotsPerVM=7;
maxNumberofVMs=200;
pendingTaskSamplePercent=100;
pendingTaskSamples=34.5;
startingNumberOfVMs=1

The $TargetLowPriorityNodes field is set to 5, but the node pool is not resized to match this value. There are no PoolResizeStartEvents in the Diagnostic Logs related to these PoolAutoScaleEvents.
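
As a sanity check, the reported target can be reproduced by hand from the other values in the event. The sketch below mirrors the last step of the formula in plain Python. It assumes the service truncates the fractional result to an integer, which is inferred from the reported value of 5 and not confirmed anywhere in this thread; the function name is hypothetical.

```python
import math


def expected_low_priority_target(pending_task_samples: float,
                                 slots_per_vm: int,
                                 max_vms: int) -> int:
    """Mirror min(maxNumberofVMs, (pendingTaskSamples / $slotsPerVM) + 1)."""
    raw = min(max_vms, pending_task_samples / slots_per_vm + 1)
    # Assumption: the fractional node count is truncated, not rounded.
    return math.floor(raw)


# Values from the PoolAutoScaleEvent above: 34.5 / 7 + 1 is about 5.93.
print(expected_low_priority_target(34.5, 7, 200))  # prints 5
```

Under that assumption the formula evaluates as intended, which points at the resize step and not at the formula itself.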

bmcandr commented 2 months ago

Is this issue related to this notice somehow?

Batch pools can currently be created using Marketplace VM images containing pre-installed graphics and rendering applications that have pay-for-use application licensing. These VM images and the pay-for-use licensing will not be available for use starting 29 February 2024.

FWIW, I have not encountered the scaling issue in regular node pools that use the same VM configuration:

Screenshot 2024-07-29 at 16 11 19

scottbright22 commented 2 months ago

Non-AutoPool jobs did not have the issue for us either. We opened an Azure issue and our agent tells us a fix is going in any day now...