Closed sxa closed 3 years ago
-1 is back as I've come to it, I'll look at -2 :-)
Rather interestingly, neither of the machines has the Jenkins agent installed as a service. They appeared to be running the agent in a Cygwin terminal window. I'll install it on both
I can't install them on both, due to the lack of IcedTea-Web. I can install it, but it seems that the machines have a stripped down version of the playbook running on them, and I'm unsure of the reason for that. (asked about it here: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1610356948467700 ) In the meanwhile, I'll get the Jenkins agent running in a Cygwin terminal again, so they're at least usable.
@Haroon-Khel said he's installing the missing packages on the machines (ref: https://adoptopenjdk.slack.com/archives/C53GHCXL4/p1610356948467700 ).
Missing packages have been installed on both alibaba machines, except for OpenSSL packages. Both experienced the error
TASK [Install OpenSSL-1.1.1i 64-bit (VS2013)] ******************************************************************************************************************************************
task path: /Users/hkhel/AdoptOpenJDK/openjdk-infrastructure/ansible/playbooks/AdoptOpenJDK_Windows_Playbook/roles/OpenSSL/tasks/main.yml:73
fatal: [8.208.87.18]: FAILED! => {"changed": true, "cmd": "set PATH=C:\\Strawberry\\perl\\bin;C:\\openjdk\\nasm-2.13.03;%PATH% && .\\vcvarsall.bat AMD64 && cd C:\\temp\\OpenSSL-1.1.1i && perl C:\\temp\\OpenSSL-1.1.1i\\Configure VC-WIN64A --prefix=C:\\openjdk\\OpenSSL-1.1.1i-x86_64-VS2013 && nmake install > C:\\temp\\openssl64-VS2013.log &&
nmake -f makefile clean", "delta": "0:00:03.448210", "end": "2021-01-11 12:34:31.186987", "msg": "non-zero return code", "rc": 1, "start": "2021-01-11 12:34:27.738777", "stderr": "'nmake' is not recognized as an internal or external command,\r\noperable program or batch file.\r\n", "stderr_lines": ["'nmake' is not recognized as an internal or external command,", "operable program or batch file."], "stdout": "The specified configuration type is missing. The tools for the\r\nconfiguration might not be installed.\r\nConfiguring OpenSSL version 1.1.1i (0x1010109fL) for VC-WIN64A\r\nUsing os-specific seed configuration\r\nCreating configdata.pm\r\n
Looking into it
We're having some issues on these machines after (a) running the rest of the playbooks and (b) switching the Jenkins agent to run as the jenkins user. While most of them have now been resolved, I'm still getting the following issue (even after a reboot) on -1, which I haven't yet been able to fully diagnose ... Still working on it but any crazy ideas welcome :-)
17:00:23 Running gradle with /cygdrive/c/openjdk/jdk-11 at /cygdrive/c/workspace/openjdk-build/workspace/.gradle
17:00:23 Exception in thread "main" java.io.FileNotFoundException: \cygdrive\c\workspace\openjdk-build\workspace\.gradle\wrapper\dists\gradle-6.5-bin\6nifqtx7604sqp1q6g8wikw7p\gradle-6.5-bin.zip.lck (Access is denied)
OpenSSL 64 bit VS2013 also isn't installed on either -1 or -2 due to vcvarsall.bat not being available in the VS2013 folders. Reinstalling VS2013 didn't seem to solve this
@sxa Have you tried running it with a different JDK (or reinstalled JDK11) ? Presuming you've already looked at all the permissions of the folders and everything.
Latest failure https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/38/console Still the same error, but running the same build command in a Cygwin shell, as the jenkins user, on build-alibaba-win2012r2-x64-1 in an RDP session doesn't seem to hit this error
I changed the variable CYGWIN_WORKSPACE to C:\Users\Jenkins\workspace (it was C:\Jenkins\workspace before). This may have done the trick https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-hotspot/884/console (the hotspot builds were failing for the same reason too)
https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/40/console A dragonwell build on alibaba -1 passed, but failed at the installer stage. I think the variable change helped to circumvent the gradle error
Running the dragonwell jdk8 job on alibaba-1, https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/46/console, jenkins seems to have a problem with clearing the C:\Users\Jenkins\workspace workspace
Re-ran the jdk11 dragonwell job on alibaba-1, same error https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-dragonwell/41/console. Oddly this wasn't a problem yesterday when I ran both jdk11 hotspot and dragonwell jobs on the same machine one after the other
I deleted the C:\Users\Jenkins\workspace directory. I re-ran the jdk11 hotspot, jdk11 dragonwell and jdk8 dragonwell jobs one after the other. Jenkins didn't seem to complain about not being able to delete workspaces. The CYGWIN_WORKSPACE variable is still C:\Users\Jenkins\workspace for alibaba-1
Regarding the 2013 compiler on alibaba-2, jdk 8 hotspot can build fine. jdk 8 dragonwell exits with this error
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: Some input files use unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
27 errors
make[2]: *** [CompileJavaClasses.gmk:336: /cygdrive/c/cygwin/home/jenkins/openjdk-build/workspace/build/src/build/windows-x86_64-normal-server-release/jdk/classes/_the.BUILD_JDK_batch] Error 1
make[1]: *** [BuildJdk.gmk:64: classes-only] Error 2
make: *** [/home/jenkins/openjdk-build/workspace/build/src//make/Main.gmk:117: jdk-only] Error 2
Looking at https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/53/consoleFull I think that might be the same error occurring on one of the other build machines, so it could well be a problem in the codebase at the moment as opposed to a problem with that machine, so at least for now I wouldn't worry too much about that error.
Just an update:
It was identified that the alibaba machines are having the same problem as https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1662, in which the leftover _the.. file prevents Jenkins from deleting the workspace before running its job. This has affected other Windows boxes, hence the PR https://github.com/AdoptOpenJDK/openjdk-build/pull/2204, so I have put in a similar PR https://github.com/AdoptOpenJDK/openjdk-build/pull/2400.
Related issue https://github.com/AdoptOpenJDK/openjdk-build/issues/2205
I have also changed the CYGWIN_WORKSPACE variable on both alibaba machines to C:\Jenkins\temp since C:\Jenkins\workspace results in the gradle error
17:00:23 Running gradle with /cygdrive/c/openjdk/jdk-11 at /cygdrive/c/workspace/openjdk-build/workspace/.gradle
17:00:23 Exception in thread "main" java.io.FileNotFoundException: \cygdrive\c\workspace\openjdk-build\workspace\.gradle\wrapper\dists\gradle-6.5-bin\6nifqtx7604sqp1q6g8wikw7p\gradle-6.5-bin.zip.lck (Access is denied)
Gut feel at this point is that it's a path length issue, so I suspect any directory with 9 characters like "workspace" would have the issue. We could switch to C:\workspace, which would be more consistent with what we have on the other machines
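A rough way to put numbers on the path-length hunch is to compare how much headroom each candidate workspace root leaves for the deepest path seen so far. The sketch below assumes the classic Windows `MAX_PATH` limit of 260 characters (including the terminating NUL) and reuses the gradle lock-file suffix from the log above; the `headroom` helper is made up for illustration. Whether a length limit is what actually triggers the `Access is denied` error is unconfirmed.

```python
from pathlib import PureWindowsPath

MAX_PATH = 260  # classic Windows limit; the count includes the terminating NUL

# Path below the workspace root, taken from the gradle error in the log above.
GRADLE_LOCK_SUFFIX = (
    r"workspace\.gradle\wrapper\dists\gradle-6.5-bin"
    r"\6nifqtx7604sqp1q6g8wikw7p\gradle-6.5-bin.zip.lck"
)


def headroom(workspace_root: str, suffix: str = GRADLE_LOCK_SUFFIX) -> int:
    """Characters left before root + suffix reaches the MAX_PATH limit."""
    full = str(PureWindowsPath(workspace_root) / suffix)
    return MAX_PATH - 1 - len(full)  # -1 accounts for the NUL terminator


for root in (r"C:\Jenkins\workspace", r"C:\Jenkins\temp", r"C:\workspace"):
    print(f"{root:<22} headroom: {headroom(root)}")
```

Passing a deeper suffix from a real build tree (for example one of the `_the.*` batch files under `build\src\build\...`) gives a more realistic worst case than the gradle lock file.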
If you change the workspace variable to C:\workspace, I think the build PR will be unnecessary, as it should be covered by https://github.com/AdoptOpenJDK/openjdk-build/blob/102237341c7f0737f0dd4dc57fcc7e9e3ffe3bd5/pipelines/build/common/openjdk_build_pipeline.groovy#L915 .
To reiterate my comments in the build PR, changing the workspace to C:\workspace\openjdk-build (the rm command removes workspaces in C:\workspace\openjdk-build, not C:\workspace\) caused the gradle error to appear again. It's possible that your gut feeling is right @sxa, since C:\Jenkins\temp as a workspace seemed to work fine (I assume it's fine if the total path exceeds 9 characters, so long as each directory doesn't?)
Yep it's the total length that'll make a difference, each individual component shouldn't matter too much (there may be a limit but I haven't hit it before in any normal scenario)
@Haroon-Khel If you set the workspace to just C:\workspace does it work ok? I think openjdk-build will get created as a subdirectory of the workspace dir, so setting the workspace to c:\workspace\openjdk-build definitely wouldn't have the desired effect of taking advantage of the line suggested by Will
Setting it to C:\workspace didn't work. It hit the same 'could not delete workspace' error (https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/79/console). I even tried C:/workspace and it hit the same error.
Also, I don't think the openjdk-build directory is created on its own as a subdirectory (but I may be wrong); by setting the workspace to C:\workspace, the subdirectories that are created are C:\workspace\workspace\build ... which therefore won't be deleted by the existing rm command, which runs rm -rf C:/workspace/openjdk-build/workspace/build/src/build/*/jdk/gensrc. It's only if I set the workspace to be C:\workspace\openjdk-build\ that I get C:\workspace\openjdk-build\workspace\build ... (unless I am terribly mistaken).
(EDIT: I might be slightly mistaken, but I am certain the openjdk-build subdirectory does not get created automatically since I cant find it in any of the workspaces used aside from when I used C:\workspace\openjdk-build as the workspace)
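To make the mismatch concrete, here is a small sketch that checks which workspace roots produce a gensrc directory the existing rm glob would catch. It assumes the layout described above (root, then `workspace/build/src/build/<config>/jdk/gensrc`) and uses the configuration name from the jdk8 logs; `gensrc_dir` and `glob_matches` are illustrative helpers, not part of the pipeline.

```python
from fnmatch import fnmatchcase

# Glob used by the existing rm command quoted above.
RM_GLOB = "C:/workspace/openjdk-build/workspace/build/src/build/*/jdk/gensrc"

# Build configuration directory name as it appears in the jdk8 logs.
CONFIG = "windows-x86_64-normal-server-release"


def gensrc_dir(cygwin_workspace: str) -> str:
    """Where gensrc would land for a given workspace root (observed layout)."""
    root = cygwin_workspace.replace("\\", "/").rstrip("/")
    return f"{root}/workspace/build/src/build/{CONFIG}/jdk/gensrc"


def glob_matches(path: str, pattern: str = RM_GLOB) -> bool:
    """Match component by component so '*' never spans a '/'."""
    parts, pats = path.split("/"), pattern.split("/")
    return len(parts) == len(pats) and all(
        fnmatchcase(part, pat) for part, pat in zip(parts, pats)
    )


for root in (r"C:\workspace", r"C:\workspace\openjdk-build", r"C:\Jenkins\temp"):
    print(root, "->", glob_matches(gensrc_dir(root)))
```

Under these assumptions only `C:\workspace\openjdk-build` is covered by the existing command, which is consistent with needing an extra rm for any other workspace root.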
Another note: I can't find a single Windows machine in Jenkins with the CYGWIN_WORKSPACE variable set to C:\workspace, so I am not sure why https://github.com/AdoptOpenJDK/openjdk-build/blob/102237341c7f0737f0dd4dc57fcc7e9e3ffe3bd5/pipelines/build/common/openjdk_build_pipeline.groovy#L915 was put there in the first place, unless those machines have since been decommissioned
https://github.com/AdoptOpenJDK/openjdk-build/issues/1855#issue-636830242 Ahh, Softlayer machines which we don't have anymore
They were replaced with the -ibmcloud- ones.
Setting it to C:\workspace didn't work. It hit the same 'could not delete workspace' error (https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/79/console). I even tried C:/workspace and it hit the same error.
We need to understand why - looks like you've changed it back to be the openjdk-build subdirectory. I'm going to reset it and try again.
(EDIT: I might be slightly mistaken, but I am certain the openjdk-build subdirectory does not get created automatically since I cant find it in any of the workspaces used aside from when I used C:\workspace\openjdk-build as the workspace)
Worth remembering that setting it to something that long causes other problems, so we know that won't work.
Unclear why the default directory used for deletion included the openjdk-build if that's the case though. Suspect we may want to adjust the default so it excludes the openjdk-build, but it might be safer for now to just add a new delete as you've done in the PR.
I'm running a test just now, and I've queued up another one using c:\Jenkins\temp as per your PR, set to use your branch with the extra rm. You have access to run that job if needed
The last run mentioned in the previous comment looks ok, as does a subsequent one on the machine from the same c:/Jenkins/temp directory using your PR.
The build-ibmcloud machines use E:/jenkins/tmp as their CYGWIN_WORKSPACE (hence the need for https://github.com/AdoptOpenJDK/openjdk-build/blob/102237341c7f0737f0dd4dc57fcc7e9e3ffe3bd5/pipelines/build/common/openjdk_build_pipeline.groovy#L919) so I don't think that rm command is of any use anymore
Slightly confused by the last comment - you don't think which rm command is of use?
Sorry, the command which deletes the C:\workspace\openjdk-build\ ... directory, since I think this was specific to the Softlayer machines
Yes agreed your new line could probably replace that one
Can this issue be closed now? Or do we want to use this issue to discuss the mysterious _the.. file?
We haven't yet enabled all the tags to build etc. overnight so it is still "unavailable". I've run a couple of test jobs though (https://ci.adoptopenjdk.net/view/Test_system/job/Test_openjdk15_j9_sanity.system_x86-64_windows/ 152 and 153 - one on each machine) and that looks ok, so I think I'll re-enable all the tags and let it run tonight on the JDK16 builds and see what happens. I'm also running https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-hotspot/907 on alibaba-1 to test VS2013 but it looks like it doesn't have a valid JDK7 boot dir configured
I'm not sure why this is. Just yesterday I ran a jdk8 dragonwell job which made it past that stage on alibaba-1 https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk8u/job/jdk8u-windows-x64-dragonwell/78/consoleFull
Dragonwell 8 does not use a JDK7 boot JDK. Hotspot does. We need this machine to be able to build all variants including HotSpot
Ok, that would explain it
I've changed the JDK7_BOOT_DIR on alibaba-1 to /cygdrive/c/openjdk/jdk-7 (the - was missing). Done the same on alibaba-2
There was another problem showing up whereby the builds would fail if the Jenkins agent was running as a service as opposed to being started from a Cygwin shell. The difference is that the default PATH on the system had the Windows Git client first, whereas the Cygwin shell had its own one first. Adjusting the system PATH to have C:\cygwin\bin first resolved that problem, so the machines now have the agent running correctly as a service.
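The effect of PATH ordering can be sketched as a first-match lookup: whichever directory comes first and contains the tool wins. The directory listings below are hypothetical, and the Git for Windows location in particular is an assumption; only `C:\cygwin\bin` comes from the comment above.

```python
from typing import Dict, List, Optional, Set


def first_provider(path_entries: List[str],
                   contents: Dict[str, Set[str]],
                   tool: str) -> Optional[str]:
    """Return the first PATH entry whose directory contains `tool`."""
    for entry in path_entries:
        if tool in contents.get(entry, set()):
            return entry
    return None


# Hypothetical directory listings, for illustration only.
contents = {
    r"C:\Program Files\Git\cmd": {"git.exe"},
    r"C:\cygwin\bin": {"git.exe", "bash.exe", "make.exe"},
}

service_path = [r"C:\Program Files\Git\cmd", r"C:\cygwin\bin"]  # before the fix
fixed_path = [r"C:\cygwin\bin", r"C:\Program Files\Git\cmd"]    # after the fix

print(first_provider(service_path, contents, "git.exe"))
print(first_provider(fixed_path, contents, "git.exe"))
```

On a real machine, running `which -a git` in the Cygwin shell and `where git` in cmd shows the same ordering for the two environments.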
@Haroon-Khel Ref the first of the job links above, it looks like alibaba-1 doesn't have cmake on it - I thought you'd run most of the playbooks on it - is that not the case? https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9-windowsXL/635/console
JDK17/HS didn't run properly either :-(
23:22:58 Building targets 'product-images legacy-jre-image test-image' in configuration 'windows-x86_64-server-release'
23:22:58 Compiling 8 files for BUILD_TOOLS_LANGTOOLS
23:22:58 error: file not found: \cygdrive\c\jenkins\temp\workspace\build\src\build\windows-x86_64-server-release\buildtools\langtools_tools_classes\_the.BUILD_TOOLS_LANGTOOLS_batch.filelist
23:22:58 make[3]: *** [ToolsLangtools.gmk:37: /cygdrive/c/jenkins/temp/workspace/build/src/build/windows-x86_64-server-release/buildtools/langtools_tools_classes/_the.BUILD_TOOLS_LANGTOOLS_batch] Error 3
23:22:58 make[2]: *** [make/Main.gmk:74: buildtools-langtools] Error 2
23:22:58 make[2]: *** Waiting for unfinished jobs....
23:23:01
@Haroon-Khel Ref the first of the job links above, it looks like alibaba-1 doesn't have cmake on it - I thought you'd run most of the playbooks on it - is that not the case? https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9-windowsXL/635/console
I'm certain I did. Will look into it
I ran the cmake role on -1. Though ansible said it installed cmake, I couldn't find it on the machine, nor did it update the path. So I manually installed cmake in Program Files\CMake and added it to the path. I was unable to install it manually in cygwin/bin due to the installer not having sufficient privileges (even though I was running it as the Administrator user)
It should be noted that ansible checks for an already installed cmake in cygwin64\bin, while these machines have only a cygwin\bin directory. Would it suffice simply to rename the directory to cygwin64? Or must cygwin be reinstalled completely for this?
Rerunning a jdk11 openj9 job on -1. If it succeeds, I'll install cmake this way onto -2 https://ci.adoptopenjdk.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-windows-x64-openj9/907/console
The job made it past the cmake check, but failed with this
12:25:48 Compiling 13 properties into resource bundles for jdk.javadoc
12:25:49 /usr/bin/bash: /cygdrive/c/Program: No such file or directory
12:25:49 make[3]: *** [/cygdrive/c/Jenkins/temp/workspace/build/src/closed/OpenJ9.gmk:414: /cygdrive/c/Jenkins/temp/workspace/build/src/build/windows-x86_64-normal-server-release/vm/cmake.stamp] Error 127
12:25:49 make[2]: *** [/cygdrive/c/Jenkins/temp/workspace/build/src/closed/custom/Main.gmk:51: j9vm-build] Error 2
12:25:49 make[2]: *** Waiting for unfinished jobs....
12:25:49 Compiling 19 properties into resource bundles for jdk.compiler
12:25:49 Compiling 12 properties into resource bundles for jdk.jdeps
12:25:57
12:25:57 ERROR: Build failed for targets 'product-images legacy-jre-image test-image debug-image' in configuration 'windows-x86_64-normal-server-release' (exit code 2)
12:25:57
12:25:57 No indication of failed target found.
12:25:57 Hint: Try searching the build log for '] Error'.
12:25:57 Hint: See doc/building.html#troubleshooting for assistance.
12:25:57
12:25:57 make[1]: *** [/cygdrive/c/Jenkins/temp/workspace/build/src/make/Init.gmk:305: main] Error 2
12:25:57 make: *** [/cygdrive/c/Jenkins/temp/workspace/build/src/make/Init.gmk:186: product-images] Error 2
The 12:25:49 /usr/bin/bash: /cygdrive/c/Program: No such file or directory line suggests that it's having trouble with the space in Windows directory names
Program Files (or the x86 equivalent) doesn't have a Short Name, would be my guess. See #1250 #1598 #1672 (and their referenced issues)
02/02/2021 08:07 PM <DIR> PROGRA~1 Program Files
01/14/2021 06:44 PM <DIR> PROGRA~2 Program Files (x86)
They do seem to have Shortnames enabled
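The `/cygdrive/c/Program: No such file or directory` message is what a POSIX shell reports when an unquoted path containing a space is split into words. The sketch below uses Python's `shlex` to mimic that word splitting; the `bin/cmake` part of the path is an assumption (the thread only says cmake went into `Program Files\CMake`), and `first_word` is an illustrative helper.

```python
import shlex

# Assumed install location; only "Program Files\CMake" is stated in the thread.
cmake = "/cygdrive/c/Program Files/CMake/bin/cmake"


def first_word(command_line: str) -> str:
    """The word a POSIX shell would try to execute."""
    return shlex.split(command_line)[0]


unquoted = f"{cmake} --version"
quoted = f"{shlex.quote(cmake)} --version"
short_name = "/cygdrive/c/PROGRA~1/CMake/bin/cmake --version"

print(first_word(unquoted))    # /cygdrive/c/Program
print(first_word(quoted))      # full path, space preserved
print(first_word(short_name))  # 8.3 short name, no space to split on
```

Since short names are enabled on these machines, the remaining question is whether the build is handed the long `Program Files` form of the path without quoting, rather than the `PROGRA~1` form.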
This will prevent alibaba windows builds working as they are currently tied to these machines.