adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Vagrant: Explore adding a Windows 10 Vagrantfile #1883

Open Willsparker opened 3 years ago

Willsparker commented 3 years ago

Spotted by @sxa , Microsoft offers Vagrant boxes at https://developer.microsoft.com/en-us/microsoft-edge/tools/vms/

Some preliminary testing has shown that it's fairly easy to retrieve the box and boot it by doing the following:

wget https://az792536.vo.msecnd.net/vms/VMBuild_20190311/Vagrant/MSEdge/MSEdge.Win10.Vagrant.zip
unzip MSEdge.Win10.Vagrant.zip
vagrant box add --name windows10 'MSEdge - Win10.box'

# Change in the Vagrantfile:
# adoptopenjdkW2012.vm.box = "mwrock/Windows2012R2"  --> adoptopenjdkW2012.vm.box = "windows10"

vagrant up

Some small changes will need to be made to VPC to let this run, as the box doesn't have a vagrant user and its password is Passw0rd!, not vagrant.

Willsparker commented 3 years ago

My mistake: simply putting windows10 into the adoptopenjdkW2012.vm.box field doesn't work out of the box.

    adoptopenjdkW2012: WinRM address: 127.0.0.1:55985
    adoptopenjdkW2012: WinRM username: IEUser
    adoptopenjdkW2012: WinRM execution_time_limit: PT2H
    adoptopenjdkW2012: WinRM transport: negotiate
Timed out while waiting for the machine to boot. This means that
Vagrant was unable to communicate with the guest machine within
the configured ("config.vm.boot_timeout" value) time period.

If you look above, you should be able to see the error(s) that
Vagrant had when attempting to connect to the machine. These errors
are usually good hints as to what may be wrong.

Willsparker commented 3 years ago

However, once vagrant up has timed out, I am still able to run vagrant winrm -c commands, meaning the machine keeps booting and the WinRM connection is working as intended. It is, however, taking longer than I have seen vagrant winrm commands take before, e.g.:

$ time vagrant winrm -c "ipconfig"

Windows IP Configuration

...

real    0m16.959s
user    0m1.856s
sys     0m0.255s

On the off-chance that vagrant up is just timing out, I'm going to try bumping the timeout up to 10 minutes (the default config.vm.boot_timeout is 300 seconds, i.e. 5 minutes).

Willsparker commented 3 years ago

Oh..... yeah, increasing the timeout worked! That's surprising. This is the Vagrantfile I managed to get it working with:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Runs PowerShell as an administrator and gets/executes an Ansible-provided script that configures WinRM to allow Ansible to communicate over it. Then places a file in the shared folder of the VM that contains the IP address of the VM.
$script = <<SCRIPT
Start-Process powershell -Verb runAs
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
wget https://raw.githubusercontent.com/ansible/ansible/devel/examples/scripts/ConfigureRemotingForAnsible.ps1 -OutFile .\\ConfigureRemotingForAnsible.ps1
.\\ConfigureRemotingForAnsible.ps1 -CertValidityDays 9999
.\\ConfigureRemotingForAnsible.ps1 -EnableCredSSP
.\\ConfigureRemotingForAnsible.ps1 -ForceNewSSLCert
.\\ConfigureRemotingForAnsible.ps1 -SkipNetworkProfileCheck
# Retrieving disk's current size
$currentDiskSize =(Get-Partition -DriveLetter c | select Size)
$currentDiskSize =($currentDiskSize -replace "[^0-9]" , "")
# The size the disk should be, in bytes (95GB)
$diskSizeBoundary = 102005473280
# Changing the disksize to max supported size (~100GB)
if ([long]$currentDiskSize -lt $diskSizeBoundary) {
        echo "Resizing disk to max size"
        $size = (Get-PartitionSupportedSize -DriveLetter c); Resize-Partition -DriveLetter c -Size $size.SizeMax
}else {
        echo "Disk is already at max size"
}
Start-Process cmd -Verb runAs
winrm set winrm/config/service '@{AllowUnencrypted="true"}'
SCRIPT

# 2 = version of configuration file for Vagrant 1.1+ leading up to 2.0.x
Vagrant.configure("2") do |config|

  config.vm.define :adoptopenjdkW2012 do |adoptopenjdkW2012|
    adoptopenjdkW2012.vm.box = "windows10"
    adoptopenjdkW2012.vm.hostname = "adoptopenjdkW2012"
    adoptopenjdkW2012.vm.communicator = "winrm"
    #adoptopenjdkW2012.vm.synced_folder ".", "/vagrant"
    adoptopenjdkW2012.vm.network :private_network, type: "dhcp"
    adoptopenjdkW2012.vm.provision "shell", inline: $script, privileged: false
    #adoptopenjdkW2012.disksize.size = '100GB'
  end
  config.vm.provider "virtualbox" do |v|
    v.gui = true
    v.memory = 2560
    v.customize ["modifyvm", :id, "--cpuexecutioncap", "50"]
  end
  config.vm.boot_timeout = 600
  config.winrm.username = "IEUser"
  config.winrm.password = "Passw0rd!"
end

The important bits are the three config lines at the bottom: config.vm.boot_timeout, config.winrm.username and config.winrm.password.

Whilst I was waiting for that, I searched a bit more and found a suggestion that a firewall on the VM can cause timeouts; according to Windows Defender, the VM does have its firewalls up. I'll experiment with turning those off to see if booting works without the incredibly long timeout. If that is the case, I'll have to repackage the Vagrant box like I did with the (now unused) OpenSUSE 12.3 box.

Willsparker commented 3 years ago

I've made a branch that adds the Vagrantfile, along with some changes to the various scripts in VPC. These are the changes I can see are needed initially, but I imagine there are going to be others.

I will need to add the Vagrant box to the machine first, as the Vagrantfile assumes it is already there (though maybe this could be added to VagrantPlaybookCheck.sh ... hmm ...).
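One way the "add the box if it is missing" step could look is sketched below. It is written in Ruby to match the Vagrantfile, though a few lines of shell inside VagrantPlaybookCheck.sh would do the same job. The helper names box_installed? and ensure_box are hypothetical, and the sketch assumes each line of `vagrant box list` output starts with the box name.

```ruby
# Hypothetical helper: true when `vagrant box list` output names the given box.
# Assumes each output line starts with the box name, e.g. "windows10 (virtualbox, 0)".
def box_installed?(box_list_output, name)
  box_list_output.each_line.any? { |line| line.split.first == name }
end

# Hypothetical wrapper: adds the box from a local .box file only when it is missing.
# Shells out to the `vagrant box list` and `vagrant box add --name` commands
# used earlier in this thread.
def ensure_box(name, box_file)
  return true if box_installed?(`vagrant box list`, name)

  system("vagrant", "box", "add", "--name", name, box_file)
end

# Example call (not executed here):
#   ensure_box("windows10", "MSEdge - Win10.box")
```

Matching on the first whitespace-separated token, rather than a substring, keeps a box such as windows10lite from being mistaken for windows10.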

I'll run VPC once I've put that in :-)

Willsparker commented 3 years ago

https://ci.adoptopenjdk.net/job/VagrantPlaybookCheck/OS=Windows10,label=vagrant/1016/ This is the first instance of the Vagrant machine booting properly, after adding the Vagrant box via VPC. I had to do it this way because each user has their own copy of the Vagrant box, and it was more efficient to put it in VPC than to set them up manually (it also helps if anyone wants to run Windows 10 on VPC locally).

Willsparker commented 3 years ago

After multiple VPC runs, I've come to the conclusion that if we want Windows 10, we shouldn't be using the Microsoft box, as it's just too slow to boot and connect. I'm not sure why that is, honestly, but I have found a few others at https://app.vagrantup.com/boxes/search?order=desc&page=1&provider=&q=windows&sort=updated&utf8=%E2%9C%93 , so I'll test those.

Willsparker commented 3 years ago

https://ci.adoptopenjdk.net/job/VagrantPlaybookCheck/1041/OS=Windows10,label=vagrant/console It works, using https://app.vagrantup.com/StefanScherer/boxes/windows_10 ! I'll let the playbook run through to determine whether there are any issues.

Willsparker commented 3 years ago

I have managed to get the WinPB running up to MSVS_2019 without any issues, but it's painfully slow on my laptop (possibly because I had to reduce the amount of memory it uses). I'll have to do a few VPC runs to double-check it works okay.

Willsparker commented 3 years ago

Okay - I couldn't get a single VPC run to complete without timing out, so I'll search for a different box.

I've found https://app.vagrantup.com/peru/boxes/windows-10-enterprise-x64-eval ; however, on booting, I get the following error message:

#<Vagrant::Errors::VBoxManageError: There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["modifyvm", "f32d833e-959f-4491-b172-dc254b2c83f3", "--clipboard-mode", "bidirectional"]
...
VBoxManage: error: Unknown option: --clipboard-mode

This is because my laptop has VirtualBox 6.0 on it, not 6.1. I'd rather not bump the VPC requirement up to 6.1 if I can avoid it (we know all our other Vagrantfiles work with 6.0), but it is an option.
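A version check along these lines could let VPC fail early with a clear message instead of failing inside VBoxManage. This is only a sketch: the helper name clipboard_mode_supported? is hypothetical, the 6.1 threshold is taken from the observation above rather than from the VirtualBox changelog, and the input is assumed to look like the output of `VBoxManage --version` (for example 6.0.24r139119).

```ruby
require "rubygems" # provides Gem::Version; usually loaded by default

# Hypothetical helper: true when the given `VBoxManage --version` output
# reports VirtualBox 6.1 or newer, the version this thread found necessary
# for the --clipboard-mode option.
def clipboard_mode_supported?(vbox_version_output)
  # Keep only the leading dotted number, dropping suffixes such as "r139119".
  numeric = vbox_version_output[/\A\d+(?:\.\d+)*/]
  return false if numeric.nil?

  Gem::Version.new(numeric) >= Gem::Version.new("6.1")
end

# Example call (not executed here):
#   abort "This box needs VirtualBox 6.1+" unless clipboard_mode_supported?(`VBoxManage --version`)
```

Stripping the revision suffix before comparing avoids Gem::Version treating a string with letters in it as a pre-release.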

Willsparker commented 3 years ago

Okay, I found an alternative, more lightweight box (https://app.vagrantup.com/xnohat/boxes/windows10lite). I've managed to make it boot most of the time, but in roughly 1 out of 4 runs the following error occurs:

==> adoptopenjdkW2012: Waiting for machine to reboot...
An error occurred executing a remote WinRM command.

Shell: Powershell
Command: # Function to check whether machine is currently shutting down
function ShuttingDown {
    [string]$sourceCode = @"
using System;
using System.Runtime.InteropServices;

namespace Vagrant {
    public static class RemoteManager {
        private const int SM_SHUTTINGDOWN = 0x2000;

... (more code)

Message: [WSMAN ERROR CODE: 995]: <f:WSManFault Code='995' Machine='127.0.0.1' xmlns:f='http://schemas.microsoft.com/wbem/wsman/1/wsmanfault'><f:Message>The I/O operation has been aborted because of either a thread exit or an application request. </f:Message></f:WSManFault>
==> adoptopenjdkW2012: Forcing shutdown of VM...

I have also been able to get the playbook to start running without much fiddling (I don't have a full run-through yet, but this box looks faster than the last).

Willsparker commented 3 years ago

An issue I've noticed with this box is that the port it wants to connect to is hardcoded - Ref: https://ci.adoptopenjdk.net/job/VagrantPlaybookCheck/OS=Windows10,label=vagrant/1102/console

07:37:11 ==> adoptopenjdkWin10: Setting the name of the VM: ansible_adoptopenjdkWin10_1615361756922_34014
07:37:12 Vagrant cannot forward the specified ports on this VM, since they
07:37:12 would collide with some other application that is already listening
07:37:12 on these ports. The forwarded port to 33899 is already in use
07:37:12 on the host machine.
07:37:12 
07:37:12 To fix this, modify your current project's Vagrantfile to use another
07:37:12 port. Example, where '1234' would be replaced by a unique host port:
07:37:12 
07:37:12   config.vm.network :forwarded_port, guest: 3389, host: 1234
07:37:12 
07:37:12 Sometimes, Vagrant will attempt to auto-correct this for you. In this
07:37:12 case, Vagrant was unable to. This is usually because the guest machine
07:37:12 is in a state which doesn't allow modifying port forwarding. You could
07:37:12 try 'vagrant reload' (equivalent of running a halt followed by an up)
07:37:12 so vagrant can attempt to auto-correct this upon booting. Be warned
07:37:12 that any unsaved work might be lost.

I believe this is because another VM from the same box is already running under a different user on the machine; once that one is shut down, it works fine. I'm not sure of the root cause or a workaround, but I'll look at some other boxes.
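One possible workaround, untested against this box, is to redeclare the RDP forward in the project's Vagrantfile with auto_correct enabled, so that Vagrant may pick a free host port when 33899 is taken. The guest and host ports come from the log above. The id "rdp" is a guess at what the box's embedded Vagrantfile uses: if the ids do not match, this would add a second forward rather than override the box's one. The RDP_FORWARD constant is just an illustrative name.

```ruby
# Illustrative options for the RDP forward. Ports are taken from the CI log;
# the id "rdp" is an assumption about the box's embedded Vagrantfile.
RDP_FORWARD = {
  guest: 3389,        # RDP port inside the Windows guest
  host: 33899,        # host port the box asks for
  id: "rdp",          # assumed id; must match the box's own entry to override it
  auto_correct: true  # let Vagrant choose another host port on collision
}.freeze

# The guard only lets this snippet be loaded outside Vagrant;
# in a real Vagrantfile the configure block would stand on its own.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.network :forwarded_port, **RDP_FORWARD
  end
end
```

With auto_correct set, Vagrant reports the port it actually chose during vagrant up, so anything that connects over RDP would need to read that port rather than assume 33899.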

Willsparker commented 3 years ago

The overarching issue I've found with Windows 10 boxes is that they all seem to be very large and difficult for Vagrant to boot, and those that aren't are configured in a quirky way, as described in the comments above. For now, I'm going to leave this while I figure out an alternative to a ready-made Vagrant box (i.e. build my own box ...).

sxa commented 3 years ago

Based on Will's experience this may be a difficult task, but I'm adding the good first issue label in case someone wants to take it on. It should be possible for anyone with a machine able to run Vagrant to at least take a look at it.

sxa commented 2 years ago

We probably want this to be Windows 11 now