sxa opened 2 months ago
Noting that ls -l on the host shows the owner of the files under build-scripts\job\jdk21u\windbld@tmp\durable* (including script.sh.copy) as the user that jenkins is running under on the host. When the same ls is run in a container the owner shows as Unknown+User:Unknown+Group. Files created within the container (such as the workspace directory under windbld) show as ContainerUser:ContainerUser when viewed from inside the container. Confusingly, those also show as the same user that jenkins is running under when looked at on the host.
The attempt to use .gitconfig in C:\jw isn't working. If I issue a git config --global -l from within the workflow I get a failure:
12:22:29 + git config --global -l
12:22:29 fatal: unable to read config file '/cygdrive/c/jw/.gitconfig': No such file or directory
If I issue that command immediately after adding a safe.directory parameter with git config then it shows the correct value, so it is picking up a git configuration from somewhere else at that point.
If I move the .gitconfig out of the way then it fails earlier in the pipeline:
12:47:36 [CHECKOUT] Checking out User Pipelines https://github.com/sxa/ci-jenkins-pipelines.git : windows_docker_support
[Pipeline] checkout
12:47:36 The recommended git tool is: git
12:47:36 No credentials specified
12:47:36 Warning: JENKINS-30600: special launcher org.jenkinsci.plugins.docker.workflow.WithContainerStep$Decorator$1@1f132a55; decorates hudson.plugins.cygpath.CygpathLauncherDecorator$1@cef8bfd will be ignored (a typical symptom is the Git executable not being run inside a designated container)
12:47:36 Cloning the remote Git repository
12:47:36 ERROR: Error cloning remote repo 'origin'
12:47:36 hudson.plugins.git.GitException: Command "git fetch --tags --force --progress -- https://github.com/sxa/ci-jenkins-pipelines.git +refs/heads/*:refs/remotes/origin/*" returned status code 128:
So for now it seems that this gitconfig file, and whatever configuration it's using when I explicitly add in the safe.directory setting, are both required, so I'll leave both in place. Note that before I set the safe.directory options the HOME variable had been configured to point at the jw directory (checked via a sh -c set command).
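For reference, the shape of that workaround is roughly the following fragment, intended to run from within the container step. It is a sketch only: the safe.directory value is illustrative, and the exact HOME handling in the real pipeline may differ.
// Sketch of the HOME / safe.directory handling referred to above.
// HOME=C:/jw should make Cygwin git look for /cygdrive/c/jw/.gitconfig;
// the safe.directory value below is illustrative.
withEnv(['HOME=C:/jw']) {
    sh(script: '''
        # Mark the checked-out directory as safe so git will operate on it
        git config --global --add safe.directory "C:/jw/workspace/build-scripts"
        # Immediately afterwards this lists the value, even though reading
        # /cygdrive/c/jw/.gitconfig directly was failing earlier
        git config --global -l
    ''')
}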
Ref: https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk21u/job/windbld/
Note that after a successful (ish) jdk8u build (169) I had two consecutive failures trying to kick off jdk21u (170,171), but then the third one (172) passed that step without requiring the workspace to be moved out of the way.
Based on some investigations in https://github.com/adoptium/infrastructure/issues/3723 I tried changing the ownership of the @tmp directory so that it was definitely owned by ContainerUser, but that didn't make a difference. The first time I ran after explicitly removing the @tmp directory the job started to run through successfully. We will see if that is repeatable.
Answer: No.
After jdk21u completed (subject to https://github.com/adoptium/infrastructure/issues/3709) in windbld run 242, jobs 244 and 245 failed, but the following run 246 passed - all were run after clearing out the @tmp and cyclonedx-lib directories.
Run 247 afterwards then went straight through without problems (again after removing those two directories).
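For the record, the clean-up between those runs amounted to removing the two directories before triggering the next job, along these lines - the on-disk paths are an assumption about the workspace layout on this machine rather than confirmed values.
// Manual clean-up done between windbld runs (paths assumed; run on the host node)
node('windows&&docker') {
    sh(script: '''
        rm -rf "c:/jw/workspace/build-scripts/job/jdk21u/windbld@tmp"
        rm -rf "c:/jw/workspace/build-scripts/job/jdk21u/windbld/cyclonedx-lib"
    ''')
}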
So we still have inconsistencies. I'm thinking it would be nice to get a simple pipeline which starts a container and is able to demonstrate this, since our multi-thousand line monolith isn't ideal for problem reproduction/raising upstream.
I've just tested this using a standalone Jenkins pipeline:
pipeline {
    agent any
    stages {
        stage('Test Docker on Windows') {
            agent { docker { image 'notrhel_build_image' } }
            steps {
                println('Attempting to run commands in docker container')
                sh(script: 'cmd /c echo Hello')
                sh(script: 'hostname')
                sh(script: 'ls -l c:/')
                sh(script: 'ls -l c:/workspace')
                sh(script: 'ls -l c:/workspace/workspace')
                sh(script: 'ls -l c:/workspace/workspace/windtest')
            }
        }
    }
}
Running a sequence of these jobs, the error appeared after a varying number of the sh steps had passed: 5,1,6,0,0,1,1,0,0,0,0 (the run with 6 passed all of them!)
Running the same jobs with bat() instead of sh() appears to pass reliably. Intriguing ...
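For comparison, the bat() variant that passes reliably looks roughly like this - dir is substituted for ls -l since bat() goes through cmd.exe rather than Cygwin, so the exact commands are an adjusted assumption rather than a copy of the real test.
// Same test pipeline driven through cmd.exe via bat() instead of Cygwin sh()
pipeline {
    agent any
    stages {
        stage('Test Docker on Windows') {
            agent { docker { image 'notrhel_build_image' } }
            steps {
                println('Attempting to run commands in docker container')
                bat(script: 'echo Hello')
                bat(script: 'hostname')
                bat(script: 'dir c:\\')
                bat(script: 'dir c:\\workspace')
                bat(script: 'dir c:\\workspace\\workspace')
                bat(script: 'dir c:\\workspace\\workspace\\windtest')
            }
        }
    }
}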
Noting also that having Git Bash first in the PATH (before the Cygwin one) makes no difference - the error still occurs.
Memo to self: We have some functions executed in Windows pipelines that are run on either Windows or UNIX systems depending on the pipeline - specifically the writeMetadata function https://github.com/adoptium/ci-jenkins-pipelines/blob/4bfdbb67722dd7e96b256511ac6586e749650524/pipelines/build/common/openjdk_build_pipeline.groovy#L1280
ENABLE_SIGNER=false - temperamental with sh (windbld#484); the "Batch scripts can only be run on Windows nodes" eyecatcher appears in listArchives at the same place in windbld#482 when allowBat=true in listArchives (after the OUT OF DOCKER NODE point, when it switches back to jenkins-worker).
I'm going to leave this with sh in these cases for now, and switch attention to another PR.
Memo to self: We have some functions executed in Windows pipelines that are run on either Windows or UNIX systems depending on the pipeline
Now sorted that case - using an isUnix() test, which actually tests for "not Windows" on the machine it's running on, which is much better than my previous check that tested whether we were doing a windows build in a docker pipeline. (I hadn't spotted that built-in function previously.)
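The resulting pattern is essentially the following - a sketch of the selection logic only, not the exact code in openjdk_build_pipeline.groovy, and the helper name is illustrative.
// isUnix() reports the OS of the node the step runs on, so on a Windows agent
// it is false even when the step is executing inside a (Windows) docker
// container - a cleaner switch than "is this a windows docker build?"
def runCommand(String command) {
    if (isUnix()) {
        sh(script: command)
    } else {
        bat(script: command)
    }
}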
This is seen periodically in the windbld jobs - maybe just after error conditions on previous runs, but that is not certain. It is often resolved by removing the C:\jw\workspace\build-scripts directory, although I have seen situations where I've done that, run another build which has failed, cleared it again and it works, so it's unclear whether we're experiencing some delay somewhere in the clean-up taking effect. The root cause of this error is currently unknown.
Noting that to run a test multiple times without taking up a full build cycle you can set "JAVA_TO_BUILD": "jdkXXu", in the job, which will start the job but abort with an error about the java version after the point at which this failure occurs.
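In other words, a deliberately invalid value is enough to exercise the early docker/checkout steps and then bail out cheaply; conceptually it is just an override like the following (the surrounding map and its name are illustrative, only the JAVA_TO_BUILD value matters).
// Quick-retest trick: an intentionally invalid JAVA_TO_BUILD lets the job run
// through the docker/checkout steps where the failure occurs, then abort on the
// java version check instead of doing a full build.
// (buildParams is an illustrative name, not the real configuration map.)
def buildParams = [
    "JAVA_TO_BUILD": "jdkXXu",   // intentionally non-existent version
    // ...the rest of the usual windbld configuration unchanged
]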