dockur / windows

Windows inside a Docker container.
MIT License

Windows could not apply unattended setting during pass [OfflineServicing] - exception during installation #299

Closed vetl12 closed 5 months ago

vetl12 commented 8 months ago

Hello, I'm trying to install the win11 Docker image, but I'm getting the above-mentioned error. I tried disabling a few things, but it still gives me the same error. Could you please help out? This happens with the win10 image and the other ones as well.

docker run -d --name='WindowsinDocker' --net='bridge' \
  -e TZ="America/New_York" -e HOST_OS="Unraid" -e HOST_HOSTNAME="MATRIX" -e HOST_CONTAINERNAME="WindowsinDocker" \
  -e 'VERSION'='win11' -e 'CPU_CORES'='2' -e 'RAM_SIZE'='12G' -e 'DISK_SIZE'='64G' \
  -e 'DHCP'='N' -e 'MANUAL'='Y' -e 'TPM'='N' -e 'HV'='N' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:8006]' \
  -l net.unraid.docker.icon='https://github.com/dockur/windows/raw/master/.github/logo.png' \
  -p '8007:8006/tcp' -p '3389:3389/tcp' \
  -v '/mnt/cache/appdata/WindowsinDocker/':'/storage':'rw' \
  --device='/dev/kvm' --cap-add=NET_ADMIN --stop-timeout=120 \
  --device-cgroup-rule='c *:* rwm' \
  'dockurr/windows'

BdsDxe: failed to load Boot0002 "UEFI QEMU QEMU HARDDISK " from PciRoot(0x0)/Pci(0xA,0x0)/Scsi(0x0,0x0): Not Found
BdsDxe: loading Boot0001 "UEFI QEMU QEMU CD-ROM " from PciRoot(0x0)/Pci(0x5,0x0)/Scsi(0x0,0x0)
BdsDxe: starting Boot0001 "UEFI QEMU QEMU CD-ROM " from PciRoot(0x0)/Pci(0x5,0x0)/Scsi(0x0,0x0)

image

kroese commented 8 months ago

This error is about an entry in the XML for automatic installation. But as you set MANUAL=Y, it should not even use that XML. So it seems that the flags you set are not taking effect.

Can you totally remove the folder /mnt/cache/appdata/WindowsinDocker so that it downloads and processes the ISO again in manual mode?

vetl12 commented 8 months ago

Ha, it did pick up the flag now (before, I deleted only the .iso file, not the whole folder). However, now it throws this error :/ Nothing new in the Docker logs.

image

kroese commented 7 months ago

From the message it looks like the downloaded ISO file is corrupted somehow? Never seen it before!

vetl12 commented 7 months ago

> From the message it looks like the downloaded ISO file is corrupted somehow? Never seen it before!

Sorry for the late reply. Yeah, the message is weird, and it happened a few times. Does the filesystem that Docker is on matter? My Docker is running on an Unraid server, on a disk partition formatted as BTRFS. I'm not sure whether the container uses a different filesystem internally.

kroese commented 7 months ago

No, it should not matter. Did you manage to solve it already?

vetl12 commented 7 months ago

> No, it should not matter. Did you manage to solve it already?

nope =( tried deleting everything and re-running, with same results.

kroese commented 7 months ago

Maybe the formatting of the parameters causes them to be ignored. Instead of -e 'MANUAL'='Y', try -e MANUAL=Y, for example.

But there is an unRaid Community App called WindowsInDocker, which works fine. So maybe just use this app with the default settings instead of trying to create your own run command.
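
On the quoting question, one way to check is to print the arguments exactly as the shell would hand them to a command. The following is only a sketch for a POSIX shell; `show_args` is a hypothetical helper, not part of Docker or of this container.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line, in brackets,
# so it is visible what a command such as docker would actually receive.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted key and value, as in the run command from the first post.
quoted=$(show_args -e 'MANUAL'='Y')

# Plain form without quotes.
plain=$(show_args -e MANUAL=Y)

if [ "$quoted" = "$plain" ]; then
    echo "identical arguments"
else
    echo "arguments differ"
fi
```

In a POSIX shell the quotes are removed before the command runs, so both spellings should arrive as the single argument MANUAL=Y. If the flag is still ignored, the cause is more likely elsewhere, for example files left over in the storage folder.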

passtpro commented 6 months ago

Good afternoon. I am getting the same error shown in the first post of this thread. I am using the UnRaid Community App and have not made any changes to the default settings. I have tried both the Windows 11 and Windows 10 images. On initial run I get the original error, then after a reboot I get a "Recovery" screen (see attachment).

image

kroese commented 6 months ago

@passtpro When you add a container variable with the key MANUAL and the value Y does it allow you to complete the installation manually? Because that may help to diagnose why the automatic one fails.

passtpro commented 6 months ago

> @passtpro When you add a container variable with the key MANUAL and the value Y does it allow you to complete the installation manually? Because that may help to diagnose why the automatic one fails.

Sorry for the late response. I am trying this today and will report the results.

passtpro commented 6 months ago

I did the manual install. After selecting the storage driver to use, I get the following error message: "Windows could not set the offline locale information. error code 0x807C0000". I will completely start over, wipe out everything from Unraid, and try again.

passtpro commented 6 months ago

Completely wiped it all out, followed manual install selecting the correct storage driver, and now I get an error about a missing, required driver.

Sociopathssive commented 6 months ago

Having the same issue in Unraid with WindowsinDocker from the CA store. Tried with default settings on WIN10, Server2022 and Server2019, starting fresh each time, including wiping appdata and the image for each version. Each run ends with the "... unattend during pass ..." error and the BSOD recovery screen.

kroese commented 6 months ago

I noticed that everyone who has this problem is a user of the unRaid / WindowsInDocker CA. So my guess is that it may have something to do with the filesystem, for example. Maybe it only happens on ZFS disks for storage, and that's why it does not happen on other Linux distros?

So my suggestion would be to check whether using another filesystem makes a difference. (On unRaid you can switch between FUSE and native by using /mnt/cache/ instead of /mnt/user, for example. Or, if you have an EXT4 partition, use that one.)

Another suggestion would be to experiment with the disk parameters via container variables, for example:

Set DISK_FMT to qcow2 instead of raw. Or set ALLOCATE to Y instead of N. Or set DISK_IO to threads and DISK_CACHE to writeback.

The difficulty for me is that when everybody just says "I am having the same issue" while nobody lists their machine specs (filesystem, CPU model, etc.), it is impossible to detect a pattern in what you all have in common. If everyone who reports it is using the BTRFS filesystem, or the same CPU, it would be much easier to figure out that it was a problem with BTRFS, for example.
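
To make reports easier to compare, something like the following could collect the basics in one go. It is only a sketch and assumes a Linux host with GNU coreutils; `fs_type` and `STORAGE_PATH` are hypothetical names, not something the container provides.

```shell
#!/bin/sh
# Hypothetical helper: print the filesystem type backing a given path.
# Assumes GNU coreutils stat (-f = filesystem info, %T = type name).
fs_type() {
    stat -f -c %T "$1"
}

# Assumption: defaults to the current directory. Set STORAGE_PATH to the host
# folder that is mounted at /storage, e.g. /mnt/cache/appdata/WindowsinDocker.
STORAGE_PATH="${STORAGE_PATH:-.}"

echo "Storage path : $STORAGE_PATH"
echo "Filesystem   : $(fs_type "$STORAGE_PATH")"
echo "Kernel       : $(uname -r)"
# CPU model as reported by the Linux kernel (first match only).
echo "CPU          : $(grep -m 1 'model name' /proc/cpuinfo | cut -d: -f2-)"
```

Pasting that output next to the Windows version being installed and the image tag would already cover most of what is needed to spot a pattern.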

passtpro commented 6 months ago

You're right. And I should know better. I'm dropping a screenshot of my system information. Currently my cache disk where I am installing is a PNY SSD using the BTRFS file system.

Screenshot_20240427-071112

Sociopathssive commented 6 months ago

> I noticed that everyone who has this problem is a user of the unRaid / WindowsInDocker CA. So my guess is that it may have something to do with the filesystem, for example. Maybe it only happens on ZFS disks for storage, and that's why it does not happen on other Linux distros?
>
> So my suggestion would be to check whether using another filesystem makes a difference. (On unRaid you can switch between FUSE and native by using /mnt/cache/ instead of /mnt/user, for example. Or, if you have an EXT4 partition, use that one.)
>
> Another suggestion would be to experiment with the disk parameters via container variables, for example:
>
> Set DISK_FMT to qcow2 instead of raw. Or set ALLOCATE to Y instead of N. Or set DISK_IO to threads and DISK_CACHE to writeback.
>
> The difficulty for me is that when everybody just says "I am having the same issue" while nobody lists their machine specs (filesystem, CPU model, etc.), it is impossible to detect a pattern in what you all have in common. If everyone who reports it is using the BTRFS filesystem, or the same CPU, it would be much easier to figure out that it was a problem with BTRFS, for example.

Specs:

Model: R720XD
M/B: Dell Inc. 0C4Y3R Version A02
BIOS: Dell Inc. Version 2.9.0 Dated 12/06/2019
CPU: Intel® Xeon® CPU E5-2667 v2 @ 3.30GHz
HVM: Enabled
IOMMU: Enabled
Cache: Not Specified: 256 KiB, Not Specified: 2 MiB, Not Specified: 25 MiB, Not Specified: 256 KiB, Not Specified: 2 MiB, Not Specified: 25 MiB
Memory: 192 GiB DDR3 Multi-bit ECC (max. installable capacity 1536 GiB)
Network: bond0: adaptive load balancing, mtu 1500
Kernel: Linux 6.1.79-Unraid x86_64

Appdata FS : BTRFS on PCIe Gen3 NVMe

It seems to be set to cache by default already, no?

image

kroese commented 6 months ago

Does one of you happen to have any partitions that are not BTRFS in your unRaid? To see if it makes a difference when the storage path is on a different filesystem?

Or otherwise try changing it from /mnt/cache to /mnt/user so that, even though it's still on BTRFS, it goes through the FUSE layer.

Sociopathssive commented 6 months ago

> Does one of you happen to have any partitions that are not BTRFS in your unRaid? To see if it makes a difference when the storage path is on a different filesystem?
>
> Or otherwise try changing it from /mnt/cache to /mnt/user so that, even though it's still on BTRFS, it goes through the FUSE layer.

Testing some different filesystems now. So far it boots on ZFS off a USB SSD, mounted in Unassigned Devices (quick and dirty). Looking to test more when I have time.

Container: image

ZFS "pool": image

Server2022: image

kroese commented 6 months ago

Very interesting.. So it seems related to BTRFS. The only thing I do differently for BTRFS is that since recently I disable CoW (Copy-on-Write) for the disk image file (data.img) as that should improve performance. I read everywhere that virtual machine images are not a good match with BTRFS, and that by disabling CoW it should mitigate that problem.

So the script calls:

chattr +C /storage/data.img

to set the C flag to disable CoW.

It would be very interesting to know if this has something to do with the problem.

So if someone has time to test: please shut down the container, delete the data.img file, and replace it with an empty file of around the same size (so that it does not have the C attribute set). Then start the container again and see if you still get the OfflineServicing error.
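
A rough sketch of that test, assuming a Linux host with GNU coreutils and, for the attribute check, lsattr from e2fsprogs. The helper names `cow_state` and `make_empty_image`, and the `IMG_DIR` and `SIZE` variables, are hypothetical. By default the demo runs in a scratch directory with a small size so it is safe to try.

```shell
#!/bin/sh
# Hypothetical helper: report whether a file carries the No_COW ('C') attribute.
# Prints "nocow", "cow", or "unknown" (lsattr missing or filesystem unsupported).
cow_state() {
    command -v lsattr >/dev/null 2>&1 || { echo unknown; return; }
    flags=$(lsattr -d "$1" 2>/dev/null | cut -d' ' -f1)
    case "$flags" in
        "")  echo unknown ;;
        *C*) echo nocow ;;
        *)   echo cow ;;
    esac
}

# Hypothetical helper: create a new, empty (sparse) file of the given size.
# Refuses to touch an existing file, so the old data.img must be deleted first.
make_empty_image() {
    if [ -e "$1" ]; then
        echo "refusing: $1 already exists, delete it first" >&2
        return 1
    fi
    truncate -s "$2" "$1"
}

# Assumption: for the real test, stop the container and set IMG_DIR to the host
# folder mapped to /storage, and SIZE to roughly the old image size (e.g. 64G).
IMG_DIR="${IMG_DIR:-$(mktemp -d)}"
SIZE="${SIZE:-64M}"
IMG="$IMG_DIR/data.img"

make_empty_image "$IMG" "$SIZE"
echo "size : $(stat -c %s "$IMG") bytes"
echo "CoW  : $(cow_state "$IMG")"
```

One caveat: on BTRFS a new file inherits the C attribute if the directory itself has it set, so it is worth checking the reported state before starting the container again.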

GiuffreLab commented 6 months ago

Building containers based on the image tag 3.08 seems to cause this issue now on a system that I previously did not see this on.

Rolling images back to 3.07 resolved the error message. Not sure what changed, but 3.08 has some problems with this.

kroese commented 6 months ago

@GiuffreLab Can you provide some more info, like the Windows version you are installing, on which filesystem, etc?

passtpro commented 6 months ago

@kroese Been a while since I responded, but wanted to follow up. I was able to get everything installed once I changed my cache disk to ZFS. Looks like the standard install does not play well with BTRFS.

GiuffreLab commented 6 months ago

> @GiuffreLab Can you provide some more info, like the Windows version you are installing, on which filesystem, etc?

The Docker node is a bare-metal Debian server on the latest release, the filesystem is ext4, and it was tested against the win11 and 2022 Windows versions.

kroese commented 6 months ago

@GiuffreLab Thanks! The error seems to indicate something is wrong with the answer file (XML), but since I cannot reproduce it with the same XML, most likely it's caused by something else.

It would be very nice if you could try some things out! First download the XML from v3.07:

https://raw.githubusercontent.com/dockur/windows/v3.07/assets/win11x64.xml

and use that file while running the container v3.08:

  volumes:
    - /home/user/v307.xml:/custom.xml

This way, we can rule out if it has anything to do with the changes in the XML file or not.

If it gives the same error even with the old XML file, please press "Shift+F10" to open a command prompt during the setup. And type in "notepad X:\Windows\panther\setupact.log" to view the contents of the setup logfile and inspect if it contains any clues at the bottom.

kroese commented 6 months ago

@GiuffreLab I really would like to solve this issue, and checked the differences between v3.07 and v3.08 multiple times, but I cannot see any obvious explanation.

Can you please provide the compose file and info about the ISO you used?

GiuffreLab commented 5 months ago

@kroese Sorry, been tied up all day. Here are two examples of how I was deploying. These work when I change the image to dockurr/windows:3.07. In this state, when I last tried a few days ago (the day of the update), they failed. This is exactly how they were deployed, aside from the credentials and the full volume path changing.

Apologies for the over commenting, I keep archives of working docker compose files, and it helps me not have to remember why I did things the way they are.

Single Node Deployment

version: "3.8"
services:
  windows:
    image: dockurr/windows
    container_name: windows
    environment:
      VERSION: win11 # win11e, win11, win10e, win10, win8, win8e, win7, vista, winxp, 2022, 2019, 2016, 2012, 2008
      DISK_SIZE: 128G # 64G is the minimum - This is how much space the windows system drive can use as needed on the host
      CPU_CORES: 4 # 2 cores is the minimum
      RAM_SIZE: 8G # 4G is the minimum
      USERNAME: <CHANGEME>
      PASSWORD: <CHANGEME>
    volumes:
      - ./windows-test/shared:/shared # shared folder between host and container
      - ./windows-test/var/win:/storage # windows system drive
    devices:
      - /dev/kvm # KVM device, needed for hardware-accelerated virtualization
    cap_add:
      - NET_ADMIN
    ports:
      - 8006:8006 # this is the port for VNC web interface 
      - 3389:3389/tcp # this is the port for RDP
      - 3389:3389/udp # this is the port for RDP
    stop_grace_period: 2m
    restart: unless-stopped

Multi-Node Deployment

version: "3.8"
services:
  windows:
    image: dockurr/windows:3.07
    container_name: windows
    environment:
      VERSION: win11 # win11e, win11, win10e, win10, win8, win8e, win7, vista, winxp, 2022, 2019, 2016, 2012, 2008
      DISK_SIZE: 128G # 64G is the minimum - This is how much space the windows system drive can use as needed on the host
      CPU_CORES: 4 # 2 cores is the minimum
      RAM_SIZE: 8G # 4G is the minimum
      USERNAME: <CHANGEME>
      PASSWORD: <CHANGEME>
    volumes:
      - ./windows-test/shared:/shared # shared folder between host and container
      - ./windows-test/var/win:/storage # windows system drive
    devices:
      - /dev/kvm # KVM device, needed for hardware-accelerated virtualization
      - /dev/vhost-net
    device_cgroup_rules:
      - c *:* rmw
    cap_add:
      - NET_ADMIN
    networks:
      vlan:
        ipv4_address: 192.168.10.56 # this is the IP address of the container on the network - must be in the range of the subnet below
    ports:
      - 8006:8006 # this is the port for VNC web interface
      - 3389:3389/tcp # this is the port for RDP
      - 3389:3389/udp # this is the port for RDP
    stop_grace_period: 2m
    restart: unless-stopped
  windows-svr:
    image: dockurr/windows:3.07
    container_name: windows-svr
    environment:
      VERSION: 2022 # win11e, win11, win10e, win10, win8, win8e, win7, vista, winxp, 2022, 2019, 2016, 2012, 2008
      DISK_SIZE: 128G # 64G is the minimum - This is how much space the windows system drive can use as needed on the host
      CPU_CORES: 4 # 2 cores is the minimum
      RAM_SIZE: 8G # 4G is the minimum
      USERNAME: <CHANGEME>
      PASSWORD: <CHANGEME>
    volumes:
      - ./windows-test/shared:/shared # shared folder between host and container
      - ./windows-test/var/win:/storage # windows system drive
    devices:
      - /dev/kvm # KVM device, needed for hardware-accelerated virtualization
      - /dev/vhost-net
    device_cgroup_rules:
      - c *:* rmw
    cap_add:
      - NET_ADMIN
    networks:
      vlan:
        ipv4_address: 192.168.10.57 # this is the IP address of the container on the network - must be in the range of the subnet below
    ports:
      - 8006:8006 # this is the port for VNC web interface
      - 3389:3389/tcp # this is the port for RDP
      - 3389:3389/udp # this is the port for RDP
    stop_grace_period: 2m
    restart: unless-stopped
networks:
  vlan:
    driver: macvlan
    driver_opts:
      parent: enp1s0 # this is the physical network interface on the host docker is running on
    ipam:
      config:
        - subnet: 192.168.10.0/24
          ip_range: 192.168.10.56/29 # this is the usable range of IP addresses that can be assigned to containers - 192.168.10.56-62
          gateway: 192.168.10.1

kroese commented 5 months ago

@GiuffreLab Okay, this helps a lot!! Because now I see you are using the PASSWORD variable (which very few people do since it was introduced so recently). And I remember making a change in v3.08 so that I put the password now in BASE64 encoding inside the XML, instead of in plaintext like in v3.07.

So I guess this is where it fails now. Do you have any strange characters in your password? I tested this code only with purely alphabetical passwords, so maybe the conversion to base64 fails when the password includes certain characters, and Windows cannot read the XML anymore.

GiuffreLab commented 5 months ago

The password is all letters and numbers with a single exclamation point as a "strange" character.

kroese commented 5 months ago

@GiuffreLab Okay, I found the issue! It was related to the length of the password (and I only tested with very short ones like test).

Ironically the whole reason I made this switch to base64 encoding was to prevent issues with certain special characters like <, but that created an issue with certain lengths.
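
One way a length-dependent failure like this can arise (this is an assumption for illustration, the thread does not state the actual cause): GNU base64 wraps its output at 76 characters by default, so a short input encodes to a single line while a longer one picks up an embedded newline. The sketch below only demonstrates that behaviour. `encode_wrapped` and `encode_single_line` are hypothetical helper names, and the code does not reproduce the project's real encoding step.

```shell
#!/bin/sh
# Assumes GNU coreutils base64 (default wrap width 76, -w 0 disables wrapping).

# Hypothetical helper: encode with the default line wrapping.
encode_wrapped() {
    printf '%s' "$1" | base64
}

# Hypothetical helper: encode as one unbroken line (no trailing newline).
encode_single_line() {
    printf '%s' "$1" | base64 -w 0
}

short="test"
# 80 characters: long enough that the encoded form exceeds 76 characters.
long=$(head -c 80 /dev/zero | tr '\0' 'a')

for value in "$short" "$long"; do
    lines=$(encode_wrapped "$value" | wc -l)
    echo "length ${#value}: default output spans $lines line(s)"
done

# With wrapping disabled, the long value stays on a single line.
echo "unwrapped length: $(encode_single_line "$long" | wc -c) characters"
```

A test that only uses a value like test would never hit the wrapped case, which fits the pattern described here, even if the real bug turns out to be something different.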

I will build a new version in a few hours (v3.10), please try it out and let me know if it is fixed now.

GiuffreLab commented 5 months ago

@kroese I tested a few builds with the 3.10 update. Looks to have resolved the issues seen with the PASSWORD variable. I have not seen any failures.