Open llxia opened 1 month ago
These tests were added around one and a half years ago. As they are dev level, they may not run frequently, so I didn't notice this issue. A recent jdk21 run does not seem to have this issue.
https://ci.adoptium.net/view/Test_openjdk/job/Test_openjdk21_hs_dev.openjdk_x86-64_linux/
We should mark the node offline automatically when there is an error such as `Cannot delete workspace: Unable to delete ...`
Is there any other way to clean up the crash files in the test code itself instead of marking the node offline, @llxia?
llxia is on vacation
Related: https://stackoverflow.com/questions/42423999/cant-delete-file-created-via-docker
I think we also need to know why this happens. Does it only happen when impl=openj9|ibm? No issue has been reported with jdk_container running against impl=hotspot.
Normally this permission issue happens if you run things as root inside the container while using a volume mapped from the host. The jdk_container tests' volume mapping options look like `--volume /home/jenkins/workspace/jenkinsjobname/aqa-tests/TKG/output***/jdk_container_0/work/classes/2/....`, which is not the workdir, so they shouldn't have this issue. Is there something specific to openj9|ibm that causes this?
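To check whether root-owned files really are what blocks the workspace deletion, something like the following could be run on the agent after a failing job. It is only a sketch: `find_foreign_files` is a hypothetical helper name, and it assumes a POSIX host where the job runs as the user invoking the script.

```python
import os


def find_foreign_files(root, uid=None):
    """Return sorted paths under root whose owner differs from uid.

    uid defaults to the current user, i.e. the account the Jenkins
    job is assumed to run as (an assumption, not taken from the thread).
    """
    if uid is None:
        uid = os.getuid()
    foreign = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat reports the owner of a symlink itself, not of its target
            if os.lstat(path).st_uid != uid:
                foreign.append(path)
    return sorted(foreign)
```

Calling `find_foreign_files(workspace_dir)` on the job workspace would list every entry the job user does not own. A non-empty result located under the mapped volume path would point to the container writing as root.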
Are there any updates on this issue? Is @llxia back from vacation?
If I had to guess, it happens when a testcase fails and doesn't clean up after itself, so the workspace cannot be deleted. So @AswathySK, perhaps check if that is the case and exclude the failing testcases.
Lan is not back from vacation and no one is pursuing this issue further at this time. I suggest you dig in to answer some of the questions in this issue if you are interested in a different approach than taking the machine offline.
Just a note that the PR that made the node offline has also been reverted, which might help your investigation, @AswathySK.
@smlambert, when a test case fails it is not able to clean up after itself, since the files created when it crashes are owned by the root user. And yes, I will do some more investigation into which test cases show this issue.
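One possible workaround for root-owned leftovers, along the lines of the Stack Overflow question linked earlier, is to hand ownership back to the job user from a throwaway container before Jenkins wipes the workspace. The sketch below only builds the command. The `busybox` image, the `/cleanup` mount point and the helper name `build_chown_command` are all assumptions for illustration, and it presumes the agent user is allowed to run `docker`.

```python
import os


def build_chown_command(host_dir, uid=None, gid=None, image="busybox"):
    """Build a docker command that chowns host_dir back to uid:gid.

    The container runs as root (the default), so it can change the owner
    of files the job user cannot touch. Image and mount point are
    illustrative choices, not something the thread prescribes.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    host_dir = os.path.abspath(host_dir)
    return [
        "docker", "run", "--rm",
        "--volume", f"{host_dir}:/cleanup",
        image,
        "chown", "-R", f"{uid}:{gid}", "/cleanup",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` in a post-build step. This only treats the symptom; the crashing testcase still needs to be triaged.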
So my point is, the reason we do not have a cleanup problem for Temurin is that there is not a failing/crashing testcase.
So your first task would be to see which testcase is crashing/failing, triage it by gathering any extra data you can, report the issue in the openj9 repo if it doesn't already exist, and exclude the test in the ProblemList files while the issue is being investigated and fixed by the openj9 team.
jdk_container left files on the host machine that are owned by root. These files cannot be cleaned up by the Jenkins job, which causes the job to fail. @sophia-guo @smlambert, do you also see a similar issue on Adoptium Jenkins? Is there a better way to resolve this?