Azure / Mission-Critical-Online

This repository is part of the Azure Mission-Critical open source project, which provides a prescriptive architectural approach to building highly reliable cloud-native applications on Microsoft Azure for mission-critical workloads. This repository contains the online reference implementation, a fully functional, production-grade reference implementation.
MIT License

Spike - IaaS-based Compute #593

Open heoelri opened 1 year ago

heoelri commented 1 year ago

This documents replacing AKS with IaaS as the compute platform used for Azure Mission-Critical. Some specific scenarios might require the use of IaaS VMs instead of PaaS services. Potential reasons are:

Changes required compared to Mission-Critical-Online:

Scenarios to address:

Open questions / findings:

Recommendations:

heoelri commented 1 year ago

Using VMs instead of containers or PaaS services like AppSvc (with or without containers) requires us to develop and implement a new application build, packaging and installation process to bring our application code (I'd stick to the sample catalogservice application we already have for now) onto the frontend and backend virtual machines.

Downloading the source and building the application on demand when starting a VM(SS) instance is, from my POV, not a viable option: it would take too long, is potentially error-prone and could lead to varying results.

My idea is to replace the existing container build/push (to ACR) task with a VM-specific one. This process could build (dotnet publish) the application code, for example as a self-contained single-file executable for a given target platform such as Linux, package it into a tar.gz file (for Linux), and push it to a storage account. This storage account (SA) would act as a (private) repository.

- task: AzureCLI@2
  displayName: 'Build and package ${{ parameters.componentName }}'
  retryCountOnTaskFailure: 1
  inputs:
    workingDirectory: ${{ parameters.workingDirectory }}
    azureSubscription: $(azureServiceConnection)
    scriptType: pscore
    scriptLocation: inlineScript
    inlineScript: |

      dotnet publish ${{ parameters.componentName }} `
        -r ${{ parameters.targetPlatform }} `
        -p:PublishSingleFile=true `
        --self-contained:true `
        -o output

      tar -czf ${{ parameters.componentName }}-$(Build.BuildId)-${{ parameters.targetPlatform }}.tar.gz output

      az storage blob upload -f ${{ parameters.componentName }}-$(Build.BuildId)-${{ parameters.targetPlatform }}.tar.gz `
          --container-name applications `
          --name ${{ parameters.componentName }}-$(Build.BuildId)-${{ parameters.targetPlatform }}.tar.gz `
          --account-name $(global_storage_account_name) `
          --auth-mode login

This builds (dotnet publish) the application code, archives it into a *.tar.gz file and uploads it to a global storage account.

From there we can pull a specific version of it into the VM(SS) instances, for example via a custom script extension or via cloud-init.
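
The pull step could look roughly like the following. This is only a sketch, not what the spike implements: it assumes the Azure CLI is installed on the instance and that the VM has a managed identity with read access to the `applications` container. The function names `artifact_name` and `pull_and_extract`, the positional arguments and the `/opt/<component>` install path are hypothetical. The blob name follows the `<component>-<buildId>-<platform>.tar.gz` pattern used by the pipeline task.

```shell
#!/usr/bin/env bash
# Sketch: pull one packaged build from the storage account onto a VM(SS) instance.
set -euo pipefail

# Build the blob name the pipeline task uploads:
# <component>-<buildId>-<platform>.tar.gz
artifact_name() {
  local component="$1" build_id="$2" platform="$3"
  printf '%s-%s-%s.tar.gz' "$component" "$build_id" "$platform"
}

# Download the archive and unpack it into the install directory.
# Assumption: Azure CLI is present and the VM's managed identity can read the blob.
pull_and_extract() {
  local account="$1" blob="$2" install_dir="$3"
  local tmp
  tmp="$(mktemp -d)"

  az login --identity --output none
  az storage blob download \
    --account-name "$account" \
    --container-name applications \
    --name "$blob" \
    --file "$tmp/$blob" \
    --auth-mode login \
    --output none

  mkdir -p "$install_dir"
  # The pipeline archives the "output" folder, so drop that leading path component.
  tar -xzf "$tmp/$blob" -C "$install_dir" --strip-components=1
  rm -rf "$tmp"
}

# Usage (hypothetical): pull.sh <storage-account> <component> <build-id> <platform>
# Does nothing when called without these four arguments.
if [[ $# -eq 4 ]]; then
  pull_and_extract "$1" "$(artifact_name "$2" "$3" "$4")" "/opt/$2"
fi
```

A custom script extension or a cloud-init `runcmd` entry could call such a script with the build ID to deploy, which keeps the version pinned per deployment instead of pulling "latest".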

CC: @sebader; @msimecek for feedback.

sebader commented 1 year ago

This all looks pretty good to me already! One thing I would like to throw in: using VMSS for horizontal scaling is almost (...) a cloud-native approach already, which is great when you can use it. But in my experience, IaaS workloads often contain some component that does not work with dynamic scale-out. Often enough, customers need to use VMs because the workload requires an actual installation process on one or more VMs, which cannot be scaled in or out dynamically. So how about the following: we use the VMSS-based approach for one of the two tiers, frontend or backend. For the other we use VMs (maybe in an Availability Set?) and try to mimic some kind of installation process during deployment. This way we can show both approaches. Thoughts?

heoelri commented 1 year ago

Yes, I think that's a good idea. And I agree that customers who have to stick to VMs due to legacy workloads probably struggle to use VMSS. Using VMSS for FE and VMs for BE (or vice versa) would allow us to address both scenarios.
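
For the standalone-VM side, the "installation process during deployment" could be mimicked with something like the sketch below. It is purely illustrative and not part of the spike: the `install_release` function, the `releases/<version>` directory layout and the `current` symlink are assumptions, chosen so that re-running a deployment for the same version is harmless.

```shell
#!/usr/bin/env bash
# Sketch: idempotent, versioned install step for a standalone VM (hypothetical layout).
set -euo pipefail

# install_release <archive> <base_dir> <version>
# Unpacks into <base_dir>/releases/<version> and repoints <base_dir>/current.
install_release() {
  local archive="$1" base_dir="$2" version="$3"
  local release_dir="$base_dir/releases/$version"

  mkdir -p "$base_dir/releases"

  # Unpack only if this version is not installed yet, so re-runs are safe.
  if [[ ! -d "$release_dir" ]]; then
    local staging
    # Extract into a staging directory first; a failed extract never
    # leaves a half-populated release directory behind.
    staging="$(mktemp -d "$base_dir/releases/.tmp.XXXXXX")"
    tar -xzf "$archive" -C "$staging" --strip-components=1
    chmod 755 "$staging"
    mv "$staging" "$release_dir"
  fi

  # Point "current" at the requested version (also works for rollbacks).
  ln -sfn "$release_dir" "$base_dir/current"
}

# Usage (hypothetical): install.sh <archive> <base_dir> <version>
if [[ $# -eq 3 ]]; then
  install_release "$1" "$2" "$3"
fi
```

Keeping earlier releases on disk means a rollback is just another call with the previous version, which is closer to how a manually installed legacy workload would be handled than replacing the instance.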
