adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

test-gdams-debian10-riscv64-1: tests hang after receiving Read-only file system message #2080

Open lumpfish opened 3 years ago

lumpfish commented 3 years ago

The systemtests hang on test-gdams-debian10-riscv64-1 after receiving a message:

Cannot contact test-gdams-debian10-riscv64-1: java.nio.file.FileSystemException: /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8/jenkins-result.txt: Read-only file system

Full output at time of the hang:

19:50:50  STF 19:50:50.022 - =========================   S T F   =========================
19:50:50  systemtest-prereqs has been processed, and set to: /home/jenkins/workspace/Grinder_TKG/jvmtest/system/systemtest_prereqsRetrieving amount of free space on drive containing /home/jenkins/workspace/Grinder_TKG/openjdk-tests/TKG/../TKG/output_1616615032575/MathLoadTest_all_CS_5m_0
19:50:50  There is 14087 Mb free
19:50:50  STF 19:50:50.060 - ==================   G E N E R A T I O N   ==================
19:50:50  STF 19:50:50.067 - Checking JVM: /home/jenkins/workspace/Grinder_TKG/openjdkbinary/j2sdk-image
19:50:50  STF 19:50:50.068 - Starting process to generate scripts: /home/jenkins/workspace/Grinder_TKG/openjdkbinary/j2sdk-image/bin/java  -Dlog4j.skipJansi=true -Djava.system.class.loader=net.adoptopenjdk.stf.runner.StfClassLoader -classpath /home/jenkins/workspace/Grinder_TKG/openjdk-tests/TKG/../../jvmtest/system/mathLoadTest/..//systemtest_prereqs/log4j/log4j-api.jar:/home/jenkins/workspace/Grinder_TKG/openjdk-tests/TKG/../../jvmtest/system/mathLoadTest/..//systemtest_prereqs/log4j/log4j-core.jar:/home/jenkins/workspace/Grinder_TKG/jvmtest/system/stf/stf.core/scripts/../bin net.adoptopenjdk.stf.runner.StfRunner -properties "/home/jenkins/workspace/Grinder_TKG/openjdk-tests/TKG/../TKG/output_1616615032575/MathLoadTest_all_CS_5m_0/20210324-195050-MathLoadTest/stf_parameters.properties, , /home/jenkins/workspace/Grinder_TKG/jvmtest/system/stf/stf.core/config/stf.properties" -testDir "/home/jenkins/workspace/Grinder_TKG/openjdk-tests/TKG/../TKG/output_1616615032575/MathLoadTest_all_CS_5m_0/20210324-195050-MathLoadTest"
19:58:16  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
19:58:16  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
19:58:16  Cannot contact test-gdams-debian10-riscv64-1: java.nio.file.FileSystemException: /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8/jenkins-result.txt: Read-only file system
20:03:18  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:03:18  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:08:20  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:08:20  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:13:22  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:13:22  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:18:24  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:18:24  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:23:26  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:23:26  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:28:28  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
20:28:28  (JENKINS-48300: if on an extremely laggy filesystem, consider -Dorg.jenkinsci.plugins.durabletask.BourneShellScript.HEARTBEAT_CHECK_INTERVAL=86400)
20:33:30  wrapper script does not seem to be touching the log file in /home/jenkins/workspace/Grinder_TKG@tmp/durable-5028f3b8
...........
Cancelling nested steps due to timeout
03:47:57  Could not connect to test-gdams-debian10-riscv64-1 to send interrupt signal to process
[Pipeline] sh
03:47:57  test-gdams-debian10-riscv64-1 was marked offline: This agent is offline because Jenkins failed to launch the agent process on it.

Sometimes the hang occurs while trying to run the first test target; in other runs a few tests complete successfully before the hang. The specific test target does not appear to be significant (a test that passes in one run hangs in another).
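For anyone triaging a similar hang, here is a minimal Python sketch of one way to check whether the filesystem holding the workspace is currently mounted read-only. It assumes a Linux agent where the mount table can be read from `/proc/mounts`. The helper names `mount_options_for` and `is_read_only` are hypothetical and not part of any Adoptium or Jenkins tooling. On Linux, a filesystem is commonly remounted read-only after I/O errors, but nothing in this issue confirms that was the cause here.

```python
def mount_options_for(path, mounts_text):
    """Return (mount_point, options) for the most specific mount containing path.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip malformed or empty lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match only on whole path components, so /home does not match /homework.
        contains = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        # ">=" lets a later entry for the same mount point override an earlier one.
        if contains and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, options)
    return best


def is_read_only(path, mounts_text):
    """True if the mount containing path lists the 'ro' option."""
    entry = mount_options_for(path, mounts_text)
    return entry is not None and "ro" in entry[1]
```

On the agent itself this could be called as `is_read_only("/home/jenkins/workspace", open("/proc/mounts").read())`. Passing the mount table in as text keeps the function easy to test against saved output from a machine that is no longer reachable.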

sxa commented 3 years ago

Looks like the machines are both offline at the moment again :-( FYI @gdams

Haroon-Khel commented 3 years ago

Both machines are unpingable

karianna commented 3 years ago

@sxa - Does this need to be renamed as well? I think they're in your home rack?

sxa commented 3 years ago

> I think they're in your home rack?

@karianna No, they are the two that George still has. I have the third Unleashed board he sent to me as well as a Beagle-V.

sxa commented 2 years ago

@gdams What is your intention with these systems? They have been offline for around a year now.

sxa commented 1 year ago

@gdams You've indicated outside this issue on more than one occasion that you will bring these boards back online, but that does not appear to be happening. Can you please clarify their current state and whether they will be brought back? If not, I would suggest that they be relinquished so they can be brought online somewhere else in the project to increase testing capacity, particularly in light of the possibility of having more JDK releases that can be built on this platform.

sxa commented 1 year ago

Raised in Workgroup meeting - George has agreed to relinquish the two remaining boards for use elsewhere in the project.

sxa commented 1 year ago

Both boards are now with me and appear to be working. Will run some tests via Jenkins to validate. The machine this issue is about is now live at https://ci.adoptium.net/manage/computer/test%2Dsxa%2Ddebian11%2Driscv64%2D1/ and running a build at https://ci.adoptium.net/job/build-scripts/job/jobs/job/evaluation/job/jobs/job/jdk20u/job/jdk20u-evaluation-linux-riscv64-temurin/18/console