Closed sxa closed 1 month ago
Please assign this task to me. Thank you.
Of the three options listed on the Microsoft website:
jshell process: OK. First phase done ...
docker run -p 5986:5986 -v c:\Users\sxa:c:\sxa mcr.microsoft.com/windows/servercore:ltsc2022
(Note that MyPassword is not what I've used on the live system!):
net user ansible MyPassword /ADD
net localgroup "Administrators" ansible /ADD
net localgroup "Remote Management Users" ansible /ADD
This allows the machine to be accessible via ansible running on a remote machine :-)
(Also, for my own notes: to debug PowerShell scripts use Set-PSDebug -Trace 2)
Playbook execution notes:
- The installers need to be available in /Vendor_Files/windows, otherwise MSVS_2013 needs to be skipped.
- NTP_TIME needs to be skipped as that has issues that are presumably related to running in a container: FAILED! => {"changed": false, "msg": "Unhandled exception while executing module: Service 'Windows Time (W32Time)' cannot be started due to the following error: Cannot start service W32Time on computer '.'."}
- adoptopenjdk needs to be skipped to allow them to complete successfully.
- ansible can be run on the host to point at the container if you install cygwin, which has ansible as one of its installable options (you probably want to include git too if it's a clean install on the host system). Note that if you use localhost/127.0.0.1 in your hosts file you should specify -e git_sha=12345 or something appropriate, otherwise the execution will trip up over https://github.com/adoptium/infrastructure/blob/4aa7788325c224484f99aa1ae000f117e9b081d7/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/logs/tasks/main.yml#L14
- WSL could probably be used too, but that requires a system with virtualization extension instructions to be available, which is not the case on all systems.
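As a concrete sketch of pointing ansible at the container from the host, the inventory and invocation might look like the following (the user, password, port, and skip-tags are illustrative values based on the commands earlier in this issue, not the live configuration):

```shell
# Write a hypothetical WinRM inventory for the container; all values here are
# examples modelled on earlier commands in this issue, not the live settings.
cat > hosts.ini <<'EOF'
[windows]
127.0.0.1

[windows:vars]
ansible_user=ansible
ansible_password=MyPassword
ansible_connection=winrm
ansible_port=5986
ansible_winrm_server_cert_validation=ignore
EOF

# git_sha must be supplied when targeting localhost/127.0.0.1 (see the link
# above); this just prints the command rather than requiring ansible to run.
echo ansible-playbook -i hosts.ini AdoptOpenJDK_Windows_Playbook/main.yml \
  -e git_sha=12345 --skip-tags adoptopenjdk,reboot,MSVS_2013,MSVS_2017,NTP_TIME
```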
Latest attempt is with:
--skip-tags adoptopenjdk,reboot,MSVS_2013,MSVS_2017,NTP_TIME
(Note: MSVS_2013 is skipped because I didn't have the installer on the machine, and 2017 did not work. Dragonwell could also be added to skip that install, which is not required for Temurin.)
Playbook changes to make it complete:
- ansible.cfg
- group_vars/all/adoptopenjdk_variables.yml
- win_reboot: from Common/roles/main.yml line 60
- win_reboot: from MSVS_2013 role line 50
- win_reboot: from MSVS_2017 role line 37
- checksum parameters in the MSVS_2022 role line 103, as it's been updated
- win_reboot from WMF_5.1 role line 29
- win_reboot from the cygwin role line 45 (although it's already covered with the reboot tag)

After the ansible run is complete, run the commands shown in this article:

docker ps
docker stop <image>
docker commit <image> win2022_build_image
After which it can be started again and used.

docker commit didn't work on my image:

Error response from daemon: re-exec error: exit status 1: output: mkdir \\?\C:\Windows\SystemTemp\hcs376450290\Files: Access is denied

This is specific to the new image which has had the playbook run on it and does not occur when attempting to commit an image with only basic changes applied.
EDIT: This seems to be the temporary location where it is storing the entire image before it is committed and the machine ran out of space.
Noting that outside that directory most of the docker data is stored in C:\ProgramData\docker
EDIT 2: The docker commit command on the second machine, which had adequate space, used around 95GB of space in C:\windows\SystemTemp to perform the commit (excluding VS2013 and 2017) and took about 40 minutes at 40-50Mb/sec showing on resource monitor, followed by about 10 minutes of using another 15GB on C:, then moving data back to the docker directory at a faster rate (maybe ~100Mb/sec).
It did, however, hit an error: Error response from daemon: re-exec error: exit status 1: output: hcsshim::ImportLayer failed in Win32: Access is denied. (0x5)
(Probably hit a zero disk space condition on C: since DOCKER_TMPDIR apparently isn't working to relocate that since docker 25.)
This is unfortunate. The builds aren't working because the automatic shortname generation (fsutil behavior set disable8dot3 0) does not appear to be working within the container, but it is mandatory for the openjdk build process. Directories can have a shortname created manually with fsutil file setshortname "Long name" shortname, but that is not ideal to do for each possible path.
EDIT: Noting that https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/shortNames/tasks/main.yml already has some explicit short name creation.
Manually created a few of the shortnames that the configure step was objecting to and I have a JDK21u build complete in a container, so this seems feasible 👍🏻
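For reference, the shortname convention the build relies on can be sketched roughly as below. This is a deliberate simplification (my own helper, not what fsutil actually does): it ignores collision numbering beyond ~1, extension handling, and invalid-character substitution.

```shell
# Approximate sketch of 8.3 shortname derivation: drop spaces and periods,
# uppercase, keep the first six characters, append ~1.  Real generation also
# numbers duplicates (PROGRA~2, PROGRA~3, ...), which this omits.
shortname() {
  local base="${1// /}"      # drop spaces
  base="${base//./}"         # drop periods (extension handling omitted)
  base="$(printf '%s' "$base" | tr '[:lower:]' '[:upper:]')"
  printf '%s~1\n' "${base:0:6}"
}

shortname "Program Files"             # PROGRA~1
shortname "Microsoft Visual Studio"   # MICROS~1
```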
Noting that we should look at doing this with the MS build tools installer which is suitable for use by Open Source projects. The jdk21u builds currently use:
10:04:20 * C Compiler: Version 19.37.32822 (at /cygdrive/c/progra~1/micros~3/2022/commun~1/vc/tools/msvc/1437~1.328/bin/hostx64/x64/cl.exe)
10:04:20 * C++ Compiler: Version 19.37.32822 (at /cygdrive/c/progra~1/micros~3/2022/commun~1/vc/tools/msvc/1437~1.328/bin/hostx64/x64/cl.exe)
Other references (this numbering is more confusing than I realised - I thought we only had the '2022' vs '19.xx' versioning differences to worry about before today...)
Struggling with the GPG role at the moment, which is called during the ANT role (I'm getting gnupg as a requirement, which supplies gpg2 instead of gpg). Also Wix has to be skipped as I don't have ansible.builtin.runs available.
Other than that a two-phase dockerfile is looking quite promising. The first sets up WinRM (will only be invoked locally) and installs cygwin with git and ansible, then triggers a reboot to ensure the cygwin path takes effect.
The second runs the playbooks as normal, although for now I have it running in multiple layers, for performance while testing, to allow the caching of each layer to take effect independently:
--skip-tags adoptopenjdk,reboot,ANT,NTP_TIME,Wix,MSVS_2013,MSVS_2017,MSVS_2019,MSVS_2022
-t ANT
-t MSVS_2019
-t MSVS_2022
This is currently using the playbook branch at https://github.com/sxa/infrastructure/tree/sxa_allhosts which makes a few changes to support this execution.
The above approach seemed to work yesterday now that the machine is rebooted after adding cygwin to the PATH, and I had a system which was able to successfully build jdk21u using two dockerfiles (the first to configure WinRM, the second to run the playbooks using the individual layers from the previous comment). Next steps as follows:
Noting that the image without VS2013 or 2017 is 99GB in size.
Now fixed the path setting so that it only requires one dockerfile, giving us something consistent with what we have on Linux now 👍🏻
It still currently requires a username/password for the authentication, but the password can be passed into the dockerfile with --build-arg PW=SomeAcceptablePassword on the docker build command.
I haven't got it picking up the git_sha properly yet so that is currently hard-coded. Everything else is good enough to be able to run a jdk21u build on, but it's missing the compilers for some earlier versions (we will need those on the host and mapped in via Vendor_Files, similar to what we do with AWX). Also we'll want the jenkins_user role (currently skipped via adoptopenjdk) unless we're happy with the processes running as an administrator within the container (need to check how well user mapping works in these containers).
Otherwise, here is the dockerfile Dockerfile.win2022v2.txt which uses the playbook changes from https://github.com/sxa/infrastructure/tree/windows_docker_fixes
VS2013 install appears to complete OK (based on the logs in C:\Windows\SystemTemp - more detailed logs are in C:\Temp) but the playbook doesn't terminate that role so it never continues.
Sizes:

Version | Path | Total file size on file system
---|---|---
VS2022 | C:\Program Files\Microsoft Visual Studio\2022 | 19.7G
VS2019 | C:\Program Files\Microsoft Visual Studio\2019 | 12.5G
VS2017? | C:\Program Files\Microsoft Visual Studio 14.0 | 2.3G
n/a | C:\Program Files (x86)\Windows Kits | 14G (+7GB with VS2017)
n/a | C:\Program Files (x86)\Microsoft SDKs | 5.8G
NOTE: The playbooks set up with the dockerfile excluding all the visual studio installations produces a docker image which is 15.4G in size
NOTE 2: If the machine runs out of disk space on C: during a commit phase, there will be hcs* directories left under C:\Windows\SystemTemp which should be removed manually.
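A sketch of the manual cleanup from note 2 (the helper name and the directory argument are my own additions so the logic can be exercised anywhere; on the real host the root would be C:\Windows\SystemTemp):

```shell
# Hypothetical helper: remove leftover hcs* layer-staging directories after a
# failed commit.  Taking the root as an argument allows testing off-host.
cleanup_hcs() {
  local root="$1"
  local d
  for d in "$root"/hcs*; do
    [ -e "$d" ] || continue   # no matches: bash leaves the literal glob
    echo "removing $d"
    rm -rf "$d"
  done
}

# On the docker host this would be:
#   cleanup_hcs "C:/Windows/SystemTemp"
```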
Steps to set up:
- Install cygwin with git and ansible support added.
- mklink /J C:\ProgramData\docker F:\ (It's not immediately obvious how to set the data dir to a different location, so this works in the meantime.)
- fsutil 8dot3name set 0 (Otherwise shortnames can't be set within the containers, which cygwin will need to run our automation.)
- Invoke-WebRequest -UseBasicParsing "https://raw.githubusercontent.com/microsoft/Windows-Containers/Main/helpful_tools/Install-DockerCE/install-docker-ce.ps1" -o install-docker-ce.ps1 then .\install-docker-ce.ps1, or use the manual download steps at https://docs.docker.com/engine/install/binaries/#install-server-and-client-binaries-on-windows
- docker run mcr.microsoft.com/windows/servercore:ltsc2022 cmd /c dir /x to verify that there is a visible shortname for the Program Files and Program Files (x86) directories.
- docker build --build-arg PW=Some-Pa55wd -t win2022_build_image -f Dockerfile.win2022 . 2>&1 | \cygwin64\bin\tee ansible.log (The PW parameter doesn't matter as long as it's valid for a windows user, as it only exists during the ansible run.)

From there you can run this to start the container:
mkdir %HOMEPATH%\workspace
docker run -it -v %HOMEPATH%\workspace:C:\workspace win2022_build_image
Then go through the normal build process:
cd \workspace
git clone https://github.com/adoptium/temurin-build
cd temurin-build/build-farm
set CONFIGURE_ARGS=--with-toolchain-version=2022
bash ./make-adopt-build-farm.sh jdk21u
Based on https://github.com/adoptium/temurin-build/issues/2922#issuecomment-2269480488 we may be able to switch to using Visual Studio 2022 for everything which would significantly reduce the windows installation requirements. The dockerfile is currently set up to only install VS2022 and not the other versions.
Next bullet on the list is to: Integrate this into the build pipelines
Initial attempts using a jenkins workspace directory with a drive on F: failed because the jenkins docker setup failed to map it into F: in the container, as there was only a C: drive. Switched the workspace directory to C:\jenkins-workspace and we hit path limits:
10:03:48 configure: error: Your base path is too long. It is 112 characters long, but only 100 is supported
Now moved to using C:\ws for the directory and it seems to be progressing well:
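The configure limit above can be checked up front; a minimal sketch (the 100-character limit is quoted from the configure error, the helper name is mine):

```shell
# Check a candidate workspace path against the 100-character limit that
# configure reported above.
check_ws_path() {
  local p="$1" limit=100
  if [ "${#p}" -gt "$limit" ]; then
    echo "too long: ${#p} characters (limit $limit)"
    return 1
  fi
  echo "ok: ${#p} characters"
}

check_ws_path "C:/ws"   # well under the limit
```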
Machine: dockerhost-azure-win2022-x64-1 (temporary, called sxa-win2022-3 in the Azure console)
Build job: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-docker/
Noting that I have had errors like this, and while I have not identified the exact cause, clearing out the build-scripts directory in the workspace resolves it:
10:24:33 Checked out HEAD commit SHA:
[Pipeline] sh
10:24:34 sh: c:/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker@tmp/durable-34ace7f2/script.sh.copy: No such file or directory
The build (both jdk8u and jdk21u) then failed later on with another path length issue. I have therefore shortened the name of the job to windbld (Windows Docker Build) and the build has run through to completion. This will need further investigation, but it's a good position at which to end the week :-) I've had to make some changes in the build repository to make this work (most specifically using git config --global safe.directory /cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/windbld in openjdk_build_pipeline.groovy) to avoid errors such as the one in https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/35/console:
10:14:10 + git clean -fdx
10:14:10 fatal: detected dubious ownership in repository at '/cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker'
10:14:10 To add an exception for this directory, call:
10:14:10
10:14:10 git config --global --add safe.directory /cygdrive/c/jw/workspace/build-scripts/jobs/jdk21u/jdk21u-windows-x64-docker
Successful builds in jenkins with the windbld job name:
- script.sh.copy error today

Job https://ci.adoptium.net/job/win2022_docker_image_updater/label=dockerhost-azure-win2022-x64-1/ is being prototyped to create the docker image. It is a stripped-down copy of the rhel7/s390x one and will save to win2022_notrhel_image on the host for now, and as per earlier comments it does not include the infrastructure SHA.
With the initial feasibility done, I'm going to leave this closed and create follow-on items for the subsequent tasks and the outstanding items on the list:
- The job uses windbld instead of jdk21u-windows-x64-temurin in order to avoid exceeding path length limits (see comment linked a few bullets up for details). Testing at https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-temurin/138/console
- script.sh.copy: No such file or directory - seems to happen sometimes if the C:\jw\workspace\build-scripts directory isn't cleaned. Sometimes I've found it needs to be cleaned twice. Maybe it's coincidence. Needs to be understood...
- detected dubious ownership - should be resolved by the .gitconfig file and ensuring that the HOME environment variable in the agent definition is set to C:\jw, which is where the gitconfig is, but I'm suspicious that the ownership of that file may make a difference. We should also understand the security implications of using a persistent git config file like this that the pipelines could potentially modify.
- I have some pipeline changes, mostly to cover one of the dubious ownership scenarios where the .gitconfig seemingly isn't taking effect before git clean -fdx, that need to be integrated somehow ...
- Put the setup (C:\jw / C:\workspace as the HOME directory, which can be defined in the startjenkins script on the machine, and the .gitconfig in there) into an ansible playbook similar to the dockerhost.yml one.
- The .gitconfig is in a shared location that could be modified by the build process. Perhaps it should use one that's hard coded with the option we need at image creation time?
- Jenkins job refs:
Note: The HOME environment variable set when the jenkins agent is started is significant, as it affects where git picks up the .gitconfig from during the pipeline checkout on the host. On the current machine I'm using for testing, this is set in the startjenkins.sh script before the agent is started.
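The mechanism can be demonstrated in isolation (the directory value below is illustrative, not the live machine's): git resolves ~/.gitconfig via HOME, which is why the agent launcher must export it before starting.

```shell
# Demonstrate that git's --global config follows HOME: point HOME at a fresh
# directory, write a .gitconfig with a safe.directory entry, and read it back.
export HOME="$(mktemp -d)"
cat > "$HOME/.gitconfig" <<'EOF'
[safe]
	directory = /cygdrive/c/workspace/build-scripts
EOF
git config --global --get safe.directory
```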
Above PR should fix the issue with long file names - I'm doing some extra tests to verify with my current job and have also initiated https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-temurin/138/console to test with the full job name. It should be good with the PR in place as it's using the same logic for overriding the default workspace location as we use in the non-docker situation on Windows.
Note that as part of this I have switched from using the C:\jw directory for the top level jenkins home on the docker host machine to C:\workspace for consistency with the non-docker case.
First build using the main pipelines on the dockerhost machine: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/jdk21u-windows-x64-temurin/151/
"NODE_LABEL": "dockerhost-azure-win2022-x64-1",
"DOCKER_IMAGE": "notrhel_build_image",
USER_REMOTE_CONFIGS:
{
"branch": "docker_windows_shortpath",
"remotes": {
"url": "https://github.com/sxa/ci-jenkins-pipelines.git"
}
}
DEFAULTS_JSON:
"pipeline_branch": "docker_windows_shortpath",
"pipeline_url": "https://github.com/sxa/ci-jenkins-pipelines.git",`
It's been quite a lot of work but the sign_Verification job now has a working run after a refactor of the code that does the signing and assembly within the pipelines. Ref: https://github.com/adoptium/infrastructure/issues/3709#issuecomment-2373386390 A bit of cleaning up, and then verifying that it can create reproducible builds, will mean this can go in as a PR.
--create-sbom wasn't working as ant is not in the PATH on the machine. For now I've added that to the path of the environment variables in the jenkins machine definition, but that's probably something we want to cover in the container image setup.
I need to request a new machine:
Please explain what this machine is needed for: Running builds in an isolated way where we can achieve SLSA build level 3 compliance on Windows along with the other primary platforms. Ideally we'll be able to create windows-on-windows container images which we share and then download and run the builds in.
As background info:
So the tasks required would be:
- Map directories into the container (as with -v on linux) which are read+write in the container.
- Map the workspace in with -v and use that to build Temurin in the container on the mapped volume so that the output is visible on the host system.

Once this level of analysis and expertise is gained it will likely make windows installer testing, or any other such activities, simpler and give us more options moving forward.
Related for historic reference: