ecalder6 / azure-gaming

Cloud Gaming Made Easy
MIT License
268 stars 79 forks source link

Integrate the official Azure Nvidia GPU extension into the template #41

Closed AmineI closed 4 years ago

AmineI commented 4 years ago

Integrates the Nvidia Driver extension. Should close #36. Since the Nvidia extension reboots after installing, our setup script should not need to anymore.

I've thus simplified the workflow by merging the two scripts with a reboot in between and at the end, into one single script with a reboot at the end.

I also took that opportunity to make a few enhancements to bring the script up to date with the current Steam Remote Play changes.

- [ ] Clean unused functions from utils.ps1 ?

AmineI commented 4 years ago

A small question about the LowPri template : Does it still work ? I know Azure got rid of LowPri VMs in favor of Azure Spot, and I haven't been able to deploy the LowPri template even once despite multiple tries - always failed on me (azureskunotavailable or something around the lines). Did you succeed to do so ?

ecalder6 commented 4 years ago

Hmm I can't deploy a LowPri either. Seems like the change to Spot broke this. I don't think it's worth fixing it (if it's even possible). Exploring NVv4 might be a more interesting alternative to cost saving and (potentially) better performance. Half of M60 is no where near top of the line performance nowadays :(

andiradulescu commented 4 years ago

Standard_NV6_Promo isn't supported in LowPri. Is this the issue? Did you try with Standard_NV6?

andiradulescu commented 4 years ago

After this PR gets merged, I might explore myself adding NVv4, but I'm thinking of creating a separate template for it. What do you think?

Regarding the half of M60, even in 2020 on my scenario, it suits me fine, but i'm curious about NVv4.

What about NVv3? Do we skip it? I would create a template for NVv3 also, besides NVv4.

AmineI commented 4 years ago

Standard_NV6_Promo isn't supported in LowPri. Is this the issue? Did you try with Standard_NV6?

Yes, I checked the regions and VMs available on my end from the Azure Portal to double check that it was not an issue with the region or VM size, but no luck deploying a Standard_NV6. Did you succeed to do so ?

Hmm I can't deploy a LowPri either. Seems like the change to Spot broke this. I don't think it's worth fixing it (if it's even possible).

Agreeing on this one. I'm not a fan of duplicate code/templates, so if no one is able to deploy a LowPri, it will help with maintainability to only keep a single template.

Exploring NVv4 might be a more interesting alternative to cost saving and (potentially) better performance.

There's an AMD driver extension available, so it may be possible to add a condition to choose which driver extension to add before our setup script, while still keeping a single template.

AmineI commented 4 years ago

What about NVv3? Do we skip it? I would create a template for NVv3 also, besides NVv4.

I know NV6 does not support accelerated networking, NVv3 would bring this advantage if it does. We're going a bit out of scope for this PR though so we could discuss this in a new issue perhaps ?

AmineI commented 4 years ago

So, last confirmations so that I can complete the PR :

andiradulescu commented 4 years ago

@AmineI - I tried deploying a LowPri again (Standard_NV6), on West Europe, and it works. I tried both with my fork and with the main repo. Also, keep in mind that not all Azure Subscriptions support everything - there are limitations from a subscription to another.

AmineI commented 4 years ago

@AmineI - I tried deploying a LowPri again (Standard_NV6), on West Europe, and it works.

I just retried, and it won't work on my end.. But you're correct, it may be my subscription. Thanks for the confirmation ! I just setup a branch to integrate the driver extension on LowPri. If it works for someone, I'll merge these LowPri changes here

The template should be fine, as the issue I'm getting is still azureskunotavailable , but it'd be great if you could test it out and report. If the Nvidia Driver and Steam are installed, it should be enough to confirm the behavior of LowPri correctly follows the Standard one. The extensions status would help, too ! image

ecalder6 commented 4 years ago
  • I didn't delete any functions from the utils module. Is it okay with you ? I can delete the now unused ones but they may prove useful later, for example when adding the AMD VMs, or if changes happen to the Nvidia Extension.

It's fine leaving it as is for now. Once we are happy with the user flows we can refactor utils.

ecalder6 commented 4 years ago

Lmk when you are ready to merge. I'll test it out and merge if everything works.

AmineI commented 4 years ago

Lmk when you are ready to merge. I'll test it out and merge if everything works.

Sure thing. I'm testing the Edit-VisualEffects part in more depth right now. Once I confirm it works as intended I'll mark the PR as ready, although I would have liked to add LowPri too

AmineI commented 4 years ago

I'm back ! That last one sure was though. Basically, when Edit-VisualEffectsRegistry fails, the "Let Windows chose what's best for my computer" option stays enabled. But here, something causes Windows to apply some performance options, but not all of them. Perhaps it's the auto login or the disabled services. Regardless, this new approach works for all users and "adjust for best performance" properly appears enabled.

Ready to merge. I updated LowPri because it should work, and the template is valid, but if @andiradulescu could test it and confirm it'd be awesome

ecalder6 commented 4 years ago

I tested standard and everything looks good. I'm still not able to deploy low-pri so I'd appreciate it if @andiradulescu could give it a test.

andiradulescu commented 4 years ago

@ecalder6 @AmineI - I tried deploying LowPri but this error came up:

{ "code":"DeploymentFailed", "message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.", "details":[ { "code":"BadRequest", "message":"{\r\n \"error\": {\r\n \"details\": [\r\n {\r\n \"message\": \"Could not find member 'apiVersion' on object of type 'VMScaleSetVMExtension'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[0].apiVersion', line 1, position 1446.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[0].apiVersion\"\r\n },\r\n {\r\n \"message\": \"Could not find member 'mode' on object of type 'VMScaleSetVMExtensionProperties'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[0].properties.mode', line 1, position 1521.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[0].properties.mode\"\r\n },\r\n {\r\n \"message\": \"Could not find member 'templateLink' on object of type 'VMScaleSetVMExtensionProperties'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[0].properties.templateLink', line 1, position 1550.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[0].properties.templateLink\"\r\n },\r\n {\r\n \"message\": \"Could not find member 'parameters' on object of type 'VMScaleSetVMExtensionProperties'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[0].properties.parameters', line 1, position 1691.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[0].properties.parameters\"\r\n },\r\n {\r\n \"message\": \"Could not find member 'dependsOn' on object of type 'VMScaleSetVMExtension'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[0].dependsOn', line 1, position 1773.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[0].dependsOn\"\r\n },\r\n {\r\n \"message\": \"Could not find member 'dependsOn' on object of type 'VMScaleSetVMExtension'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[1].dependsOn', line 1, position 2250.\",\r\n \"target\": \"vmss.properties.virtualMachineProfile.extensionProfile.extensions[1].dependsOn\"\r\n }\r\n ],\r\n \"code\": \"BadRequest\",\r\n \"message\": \"The request message is invalid.\"\r\n }\r\n}" } ] }

I deployed it from here: https://portal.azure.com/#create/Microsoft.Template/uri/https%3A%2F%2Fraw.githubusercontent.com%2FAmineI%2Fazure-gaming%2FGPU-extension%2FLowPri.json

And then set the Script Location to: https://raw.githubusercontent.com/AmineI/azure-gaming/GPU-extension

AmineI commented 4 years ago

Thanks a lot for helping test the changes ! I took note of these results and LowPri.json should be good to go now, hopefully.

andiradulescu commented 4 years ago

@AmineI - this time another error:

{
    "error": {
        "code": "BadRequest",
        "message": "Could not find member 'provisionAfterExtensions' on object of type 'VMScaleSetVMExtensionProperties'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[1].properties.provisionAfterExtensions', line 1, position 1646.",
        "target": "vmss.properties.virtualMachineProfile.extensionProfile.extensions[1].properties.provisionAfterExtensions"
    }
}
AmineI commented 4 years ago

@AmineI - this time another error:

{
    "error": {
        "code": "BadRequest",
        "message": "Could not find member 'provisionAfterExtensions' on object of type 'VMScaleSetVMExtensionProperties'. Path 'properties.virtualMachineProfile.extensionProfile.extensions[1].properties.provisionAfterExtensions', line 1, position 1646.",
        "target": "vmss.properties.virtualMachineProfile.extensionProfile.extensions[1].properties.provisionAfterExtensions"
    }
}

Thanks again for your help.

Well, now I'm confused, because the docs clearly mention provisionAfterExtensions and I can't find what kind of dumb mistake I could have done.

EDIT : Of course, API Version. Fixing it as we speak

AmineI commented 4 years ago

Should be good to go. I'm trying alternatives so that we won't have to bother you anymore with such testing. Thanks again !

andiradulescu commented 4 years ago

@AmineI - no worries. this time: "VM scale set cannot use a basic SKU load balancer when singlePlacementGroup property is false (large scale enabled), or when the VMSS spans multiple availability zones. Please use standard SKU load balancer instead. Click here for details"

{"code":"DeploymentFailed","message":"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/DeployOperations for usage details.","details":[{"code":"BadRequest","message":"{\r\n \"error\": {\r\n \"details\": [],\r\n \"code\": \"VMScaleSetCannotReferenceLoadbalancerWhenLargeScaleOrCrossAZ\",\r\n \"message\": \"VM scale set cannot use a basic SKU load balancer when singlePlacementGroup property is false (large scale enabled), or when the VMSS spans multiple availability zones. Please use standard SKU load balancer instead.\"\r\n }\r\n}"}]}

AmineI commented 4 years ago

Okay, let's go another route. Could you try #42 (:P), please ?

AmineI commented 4 years ago

I'm not confortable enough with Vm Scale Sets to update the current LowPri template, especially considering my subscription does not allow me to test my changes myself.

Work on LowPri was moved to #42, with another approach eliminating VM Scale Sets. The PRs are decoupled, but the fact that this one does not add the Nvidia extension to LowPri breaks LowPri deployments until #42 is merged.

I marked my comments related to LowPri as outdated to collapse them by default, as they are no longer in scope for this PR.

Doing the same for andiradulescu's comments would declutter a bit as well. I'll rebase to get rid of the LowPri changes on this PR.


EDIT : Rebased, ready to merge. The first post is up to date regarding the PR descriptions and tasks.