james-c closed this issue 4 years ago
From the Azure CLI, the command to run remote scripts on a VM is `az vm run-command`
For VNET peering look at this ARM template
For running scripts as part of the deployment of a VM, look at custom script extensions (Windows) or cloud-init
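As a sketch of the remote-execution route, the following Python helper assembles an `az vm run-command invoke` call; the resource group and VM names are hypothetical placeholders, and the actual invocation is left commented out since it needs an authenticated `az` session:

```python
import subprocess  # used for the real invocation, commented out below

def build_run_command(resource_group, vm_name, script_lines):
    """Assemble argv for `az vm run-command invoke` (RunShellScript)."""
    return ["az", "vm", "run-command", "invoke",
            "--resource-group", resource_group,
            "--name", vm_name,
            "--command-id", "RunShellScript",
            "--scripts"] + list(script_lines)

if __name__ == "__main__":
    # Placeholder names; replace with the real DSG resource group / VM.
    argv = build_run_command("RG_DSG_TEST", "DSG-VM1", ["sudo apt-get update"])
    print(" ".join(argv))
    # Requires `az login` and a real VM, so not run here:
    # subprocess.run(argv, check=True)
```

Keeping the argument assembly in a pure function means it can be exercised without any Azure access.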
Where some steps take a long time (e.g. installing software), we should consider splitting the build and deploy stages as we do for the Linux Compute VMs, building an image and storing it in an image Gallery.
Regarding internet access during setup and locking down the environment afterwards, look at what we do when deploying the Linux compute VMs. I think we programmatically rebind the VM from one NSG to the locked-down one.
In general, I think the ideal deployment model for all VMs (including the compute ones) would be:
@james-c I'm thinking the big picture for the end goal of this automation is to make sure that everything used in the DSG deployment relies only on scripts in source control, i.e. have all the scripts that run locally on deployed VMs in source control, then push them to the VMs on deployment and run them remotely with cloud-init / custom script extensions (or SCP + `az vm run-command` if necessary).
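To illustrate the cloud-init route, a minimal cloud-config like the one below could fetch and run setup scripts from source control at first boot; the repository URL and script name are hypothetical placeholders:

```yaml
#cloud-config
# Hypothetical example: install prerequisites, then pull and run the
# VM's setup script from source control at first boot.
package_update: true
packages:
  - git
runcmd:
  - git clone https://github.com/example/safe-haven-scripts /opt/setup
  - bash /opt/setup/bootstrap.sh
```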
@RobC-CTL Is there anything sensitive in the `CreateADPDC.zip` folder in the RG_DSG_ARTIFACTS -> dsgxartifacts -> Blobs -> dsc storage container? I'd like to move it into source control in this repo (which will eventually be public).
@martintoreilly Nothing sensitive
@RobC-CTL Just checking that the DSG DC, RDS and Dataserver zip files in the `Scripts` folder of the RG_DSG_ARTIFACTS -> dsgxartifacts -> configpackages share also don't have anything sensitive and can be added to source control.
@martintoreilly they are just PS scripts, there is mention of the domain name but other than that there isn't anything too sensitive.
@martintoreilly : is there anything left here that hasn't been captured in a dedicated issue?
Happy to close this. Lots of it is done, lots is captured in other issues, some no longer relevant. If any of what's left is important enough we'll think of it again.
@jemrobinson I'm not up to speed with the new label system. Is this part of our transition to a DevOps model? Let me know what I should be updating.
Target for April DSG 2019
Do later
Still to triage
[ ] idempotent scripting
[x] Secret Generation and Preparation: secrets in management tier already exist. Create / reuse secrets when needed in KeyVault. Secrets generated in script (`[System.Web.Security.Membership]::GeneratePassword(20,0)`). Done in PR #174. `[System.Web.Security.Membership]` is not available when using PowerShell 6 with .NET Core on OSX, so we copied and modified the `[System.Web.Security.Membership]::GeneratePassword()` C# code in our own PowerShell function.
[ ] Scripts currently run on VMs can be moved to run from local machine
[ ] Installation of software packages (can click-throughs be automated?)
[ ] Fully automate certificate generation / install (e.g. DNS record response etc). (see issue #203)
[ ] Post-install automated (potentially continuous) sanity check (validation of machines running)
[ ] Start / Stop / Tear down scripting - process tbd for tear down preparation
[ ] Monitoring scripting
[ ] Split management and DSG specific secrets into separate key vaults (query if this is needed)
[ ] Add remote desktop to HackMD and Gitlab boxes?
[x] Have single config files for safe haven parameters and each DSG parameters that all scripts load parameters from
[ ] Consider separating build and deployment configuration as we do for the compute VMs. Pros: pays the build / software installation cost once rather than once per DSG. Severs our dependence on external downloads. Cons: We won't be validating that we can still build our environment from scratch on each deployment.
[ ] If we can do all our deployment via the Azure management plane (i.e. Azure CLI / SDK commands), consider not setting up the VPN gateway until the very end. It's only necessary if we need to log into any of the boxes directly, so is ideally only needed for troubleshooting.
[x] Make Create_New_DSG_User_Service_Accounts script robust to failures in user creation
[x] Standardise KeyVault name and secret names (can we drop the test environment element)?
[x] Update dsgpu user to dsvm or similar
[x] Have a single config file to set key fields that is read by all scripts (see dsg9-test.yml for starter for 10)
[ ] Give the various VMs read access to the "artifacts" storage account via Azure Active Directory authentication over SMB. Ideally give them read access to this repository and store the setup scripts here. The installers for the various apps are too large for GitHub, but we should be able to add download and installation steps to the VM setup script for the RDS server.
[ ] Automate renewal of Let's Encrypt for RDS SSL certs (and other SSL certs as required).
[ ] Consider whether we want a single pool of people with RDP access to the management gateway and the DSG gateways (probably yes - note that we are storing all DSG secrets in a single KeyVault in the management subscription, which makes this a single trust pool from a secrets management perspective)
[ ] Make scripts recoverable so that re-running after failure results in same deployment state as running without error while avoiding deleting and re-creating resources that have successfully deployed (we will probably need to explicitly track successful deployment of each resource and what "in progress" resources we need to delete on a re-run).
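On the secret-generation item above, a cross-platform replacement for `GeneratePassword(20,0)` is straightforward with a CSPRNG. The repo's actual fix is a PowerShell port of the .NET C# source; the Python sketch below just shows the idea, and the symbol set is an approximation of the .NET one:

```python
import secrets
import string

# Approximation of the punctuation set used by .NET's GeneratePassword.
PUNCTUATION = "!@#$%^&*()_-+=[{]};:<>|./?"

def generate_password(length=20):
    """Random password of printable ASCII drawn from a CSPRNG."""
    alphabet = string.ascii_letters + string.digits + PUNCTUATION
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

Unlike `GeneratePassword(20,0)`, this does not enforce a minimum count of non-alphanumeric characters; with the `0` argument the .NET call doesn't either.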
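The recoverable-scripts item could be sketched as a small state-tracking wrapper: record each successfully deployed resource, so a re-run skips completed steps instead of deleting and re-creating them. The resource names and `deploy_fn` callables below are hypothetical placeholders for the real Azure CLI / SDK calls:

```python
import json
from pathlib import Path

STATE_FILE = Path("deployment_state.json")

def load_state():
    """Read previously recorded deployment state, if any."""
    return json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}

def save_state(state):
    STATE_FILE.write_text(json.dumps(state, indent=2))

def ensure_deployed(name, deploy_fn, state):
    """Run deploy_fn only if `name` has not already succeeded.

    Resources left "in_progress" after a crash are the ones a re-run
    would need to clean up before retrying.
    """
    if state.get(name) == "succeeded":
        return
    state[name] = "in_progress"
    save_state(state)
    deploy_fn()  # placeholder for an Azure CLI / SDK call
    state[name] = "succeeded"
    save_state(state)

if __name__ == "__main__":
    state = load_state()
    ensure_deployed("vnet", lambda: None, state)
    ensure_deployed("nsg", lambda: None, state)
```

Marking a resource "in_progress" before the call gives the re-run enough information to know which partially created resources to delete.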
From Validate Feb 2019 Azure runbook #174
[x] Update the compute VM deployment scripts to use the convention `ldap-dsg<X>-<environment>-<resource-type>` (e.g. `ldap-dsg9-test-dsgpu`) for LDAP secret names (should be done as part of #176)
[ ] Add scripting for password protecting .pfx certificate when downloading (see: https://coombes.nz/blog/azure-keyvault-export-certificate/)
[x] Add a page to the runbook describing how to set up the subscription and what quotas to request for a subscription being used for test or production. See issue #125
[x] We had some issues with HackMD loading slowly due to trying to use a content distribution network (CDN) for pulling down javascript, style sheets etc. See issue #59 for fix for this. I'm not sure if this fix has been back ported to the HackMD deployment scripts / installer package. Verified in manual instructions in PR #174. Validated in automated setup in PR #239 (merged into PR #249)
[x] Ensure we install LaTeX + editor on report writing windows server (done in PR #174).
[ ] The P2S RootCert Public/Private Key pair should be different for each DSG (and the management segment). Add a section covering how to make a self-signed cert, upload this to the KeyVault and make new Client Certs.
[ ] Consider whether we should have different LDAP username and passwords for each compute VM instance (GitLab and HackMD have their own)? Pro: no shared secrets across VMs; Con: need to access management DC as admin to add new LDAP user, rather than just needing permission to deploy a new VM.
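On the password-protected `.pfx` item above, one possible approach with the `cryptography` package is to re-serialize the key and certificate as an encrypted PKCS#12 bundle. This is a sketch under assumed names: the throwaway self-signed certificate stands in for whatever is fetched from the KeyVault, and the friendly name and password are placeholders:

```python
from datetime import datetime, timedelta, timezone

from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.primitives.serialization import (
    BestAvailableEncryption,
    pkcs12,
)

# Throwaway key + self-signed cert standing in for the KeyVault certificate.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "dsg.example.test")])
now = datetime.now(timezone.utc)
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + timedelta(days=1))
    .sign(key, hashes.SHA256())
)

# Export as a password-protected .pfx (PKCS#12) bundle.
pfx_bytes = pkcs12.serialize_key_and_certificates(
    b"dsg-cert", key, cert, None, BestAvailableEncryption(b"S3cret!")
)
```

Writing `pfx_bytes` to disk gives a `.pfx` that cannot be imported without the password, which is the property the task above is after.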
To-dos for future versions
Ensure each of these is captured as an issue.
[ ] Make `Create_New_DSG_User_Service_Accounts` script robust to failures in user creation
[x] Standardise KeyVault name and secret names (can we drop the `test` environment element)?
[x] Update `dsgpu` user to `dsvm` or similar
[x] Have a single config file to set key fields that is read by all scripts (see `dsg9-test.yml` for starter for 10)
[ ] Give the various VMs read access to the "artifacts" storage account via Azure Active Directory authentication over SMB. Ideally give them read access to this repository and store the setup scripts here. The installers for the various apps are too large for GitHub, but we should be able to add download and installation steps to the VM setup script for the RDS server.
[ ] Automate renewal of Let's Encrypt for RDS SSL certs (and other SSL certs as required).
[ ] Consider whether we want a single pool of people with RDP access to the management gateway and the DSG gateways (probably yes - note that we are storing all DSG secrets in a single KeyVault in the management subscription, which makes this a single trust pool from a secrets management perspective)
[ ] Make scripts recoverable so that re-running after failure results in same deployment state as running without error while avoiding deleting and re-creating resources that have successfully deployed (we will probably need to explicitly track successful deployment of each resource and what "in progress" resources we need to delete on a re-run).