Open steelhead31 opened 1 year ago
I am looking at all the systems - and as they are all recently cloned it appears there have been (undocumented?) changes to the system configurations.
When there are issues, these should not be hacked at on the fly. At the minimum, what was done needs to be reported in the issue - and perhaps the playbooks need an update.
As an example: the size of 4G for /tmp was chosen because the test used to be smaller - and 4G was sufficient, with nearly 2G to spare. If the test is now writing 2x 2+G, obviously 4G is not going to work.
YET: when I look at the systems /tmp has not been increased, but /var has been increased on two systems.
I cannot second guess what needs to be done when changes are made on the fly.
So, no (known) action taken to resolve this issue. And it looks like it is just waiting to happen again - on different systems.
Seems to be affecting some, but not all systems: (note 100% used below).
root@osunim:[/root]dsh-adopt "/usr/bin/df -g /tmp"
adopt01:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 3.99 1% 47 1% /tmp
==============
adopt02:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 4.00 1% 44 1% /tmp
==============
adopt03:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 2.97 26% 1665 1% /tmp
==============
adopt04:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 3.40 16% 252 1% /tmp
==============
adopt05:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 3.99 1% 535 1% /tmp
==============
adopt06:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 0.00 100% 406 7% /tmp
==============
adopt07:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 0.00 100% 469 9% /tmp
==============
adopt08:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 3.99 1% 503 1% /tmp
==============
adopt10:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 4.00 3.99 1% 116 1% /tmp
==============
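For picking the full filesystems out of a run like the one above without reading every host by eye, a small filter along these lines might help. This is only a sketch: `full_filesystems` is a made-up helper name, and it assumes the POSIX `df -P` column layout (Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on) rather than the `df -g` layout shown in this thread.

```shell
# Print mount points whose usage is at or above a percentage threshold.
# Reads `df -P` style output on stdin: one header line, then rows of
#   Filesystem 1024-blocks Used Available Capacity Mounted-on
# Assumption: mount points contain no spaces.
full_filesystems() {
    threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            pct = $5
            sub(/%/, "", pct)         # "100%" -> "100"
            if (pct + 0 >= limit + 0) print $6, pct "%"
        }'
}
```

It could be used as `df -Pk /tmp | full_filesystems 90`, run on each host through whatever wrapper is already used for the checks above.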
root@osunim:[/root]ssh adopt07 /usr/bin/du -sg /tmp/*.dat
2.00 /tmp/dst2848659901056789357.dat
1.99 /tmp/dst665158920248314980.dat
0.00 /tmp/src2464967498881328204.dat
root@osunim:[/root]ssh adopt06 /usr/bin/du -sg /tmp/*.dat
1.99 /tmp/dst16616608096706097137.dat
2.00 /tmp/dst6039054501113654116.dat
0.00 /tmp/src805118232336050098.dat
root@osunim:[/root]ssh adopt06 /usr/bin/ls -ls /tmp/*.dat
2087736 -rw-r--r-- 1 jenkins staff 2137837568 Aug 13 16:51 /tmp/dst16616608096706097137.dat
2097156 -rw------- 1 jenkins staff 2147484671 Aug 13 16:51 /tmp/dst6039054501113654116.dat
8 -rw------- 1 jenkins staff 2147484671 Aug 13 16:51 /tmp/src805118232336050098.dat
root@osunim:[/root]ssh adopt07 /usr/bin/ls -ls /tmp/*.dat
2097160 -rw------- 1 jenkins staff 2147484671 Aug 13 16:08 /tmp/dst2848659901056789357.dat
2089732 -rw-r--r-- 1 jenkins staff 2139873280 Aug 13 16:10 /tmp/dst665158920248314980.dat
8 -rw------- 1 jenkins staff 2147484671 Aug 13 16:08 /tmp/src2464967498881328204.dat
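The byte counts in the listings above can be checked against the filesystem size directly. A quick sketch of the arithmetic, assuming a "GB block" in `df -g` means 2^30 bytes and ignoring filesystem metadata overhead:

```shell
# Size in bytes of one complete dst*.dat file, taken from the ls output above.
dst_full=2147484671

fs_4g=$((4 * 1024 * 1024 * 1024))   # 4294967296
fs_5g=$((5 * 1024 * 1024 * 1024))   # 5368709120
need=$((2 * dst_full))              # 4294969342

echo "two dst files need: $need bytes"
echo "4G /tmp is short by: $((need - fs_4g)) bytes"   # 2046
echo "5G /tmp has spare:   $((fs_5g - need)) bytes"   # 1073739778
```

So two complete dst files overshoot a 4G /tmp by about 2 KB even when it is otherwise empty, which would be consistent with the second dst file on each host stopping slightly short of 2147484671 bytes. The src file reports the same length but only 8 blocks allocated, so it looks like a sparse file and barely counts against the space.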
root@osunim:[/root]dsh-adopt "/usr/bin/df -g /tmp"
adopt01:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 47 1% /tmp
==============
adopt02:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 39 1% /tmp
==============
adopt03:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 3.97 21% 1388 1% /tmp
==============
adopt04:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.40 13% 235 1% /tmp
==============
adopt05:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 242 1% /tmp
==============
adopt06:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 304 1% /tmp
==============
adopt07:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 433 1% /tmp
==============
adopt08:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 281 1% /tmp
==============
adopt10:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 4.99 1% 111 1% /tmp
==============
Looks like there may still be an artifact:
adopt07:
Filesystem GB blocks Free %Used Iused %Iused Mounted on
/dev/hd3 5.00 0.99 81% 632 1% /tmp
It looks like the test is not cleaning up properly?
root@adopt07:[/root]ls -ltr /tmp/*.dat
-rw------- 1 jenkins staff 2147484671 Aug 20 17:01 /tmp/src4056949266194807527.dat
-rw------- 1 jenkins staff 2147484671 Aug 20 17:02 /tmp/dst9520860984591320509.dat
-rw-r--r-- 1 jenkins staff 2147484671 Aug 20 17:03 /tmp/dst6282469117645280818.dat
root@adopt07:[/root]date
Wed Aug 23 12:06:51 UTC 2023
Removed the files, and clearing jenkins.
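To spot leftovers like these before they fill /tmp again, something like the following could be run periodically. It is a sketch only: `stale_tmp_files` is a hypothetical helper, and the `*.dat` pattern is simply what showed up in this issue.

```shell
# List *.dat files under a directory that have not been modified for more
# than the given number of whole days. Note that `find -mtime +N` matches
# files whose age, rounded down to whole days, is greater than N.
stale_tmp_files() {
    dir=$1
    days=$2
    find "$dir" -type f -name '*.dat' -mtime +"$days" -print
}
```

For example, `stale_tmp_files /tmp 1` lists files at least two full days old, which would have caught the Aug 20 files above by Aug 23. Adding `-user jenkins` to the `find` call would restrict it to files left by the test jobs. It only lists; deleting is left as a deliberate manual step.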
Is the last console output still available?
Just wondering if this is a problem with the test.
4 -rw-r--r-- 1 jenkins staff 40 Aug 27 16:22 blah4255219114647392657.tmp
4 -rw-r--r-- 1 jenkins staff 151 Aug 26 17:33 unsigned.jar1450541346237646654jar
4 -rw-r--r-- 1 jenkins staff 305 Aug 26 15:27 test1723908656910621468.test
4 -rw-r--r-- 1 jenkins staff 383 Aug 27 17:39 test10517561616964218431.test
4 -rw-r--r-- 1 jenkins staff 403 Aug 27 17:39 test15750855502312122436.test
4 -rw-r--r-- 1 jenkins staff 403 Aug 27 17:39 test16807323090330678638.test
4 -rw-r--r-- 1 jenkins staff 1862 Aug 26 17:33 signed.jar8571074910892324627jar
4 -rw-r--r-- 1 jenkins staff 1974 Aug 26 17:33 signed2.jar1180279166009667648jar
4 -rw-r--r-- 1 jenkins staff 32007 Aug 27 16:22 source245824410068849651.tmp
4 -rw-r--r-- 1 jenkins staff 6442450960 Aug 27 16:23 source1323321727058409565.tmp
4 -rw-r--r-- 1 root system 6 Jun 20 10:57 rc.net.out
4 -rw-r--r-- 1 root system 24 Jun 20 11:08 NIM_instp_updt_list
4 -rw-r--r-- 1 root system 77 Jun 20 10:57 KrsctPHA.saved
4 -rw-r--r-- 1 root system 2124 Jun 20 10:57 ctrmc_MDdr.dbg
4 -rw-rw-r-- 1 root system 53 Jun 20 10:54 uncfgct.dbg
4 -rw-rw-r-- 1 root system 676 Jun 20 10:55 rsct_cfgct_history.log
8 -rw------- 1 jenkins staff 2147484671 Aug 27 16:22 src6628553441366702378.dat
136 -rw-r--r-- 1 jenkins staff 138481 Aug 27 13:50 hs_err_pid19726826.log
200 -rw-rw-r-- 1 root system 204800 Aug 29 15:00 lvmt.log
1024 -rw-r--r-- 1 jenkins staff 1048576 Aug 27 16:22 blah17126132827369752914.tmp
2097156 -rw------- 1 jenkins staff 2147484671 Aug 27 16:22 dst1075097809748986483.dat
2097160 -rw-r--r-- 1 jenkins staff 2147484671 Aug 27 16:23 dst2207767882312241704.dat
This needs to be examined further in a clean environment, ideally to narrow down which tests in the external suites are causing the problem.
OpenJDK have discussed test cases not always cleaning up after themselves.
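Until the tests clean up after themselves, one possible workaround on the machine side is to give each job its own scratch directory and remove it when the job ends. A minimal sketch, assuming `mktemp -d` is available on the hosts; `run_with_private_tmp` is a made-up name and nothing here is taken from the existing playbooks or job configuration:

```shell
# Run a command with a private scratch directory exported as TMPDIR,
# then remove that directory whatever the command's exit status was.
run_with_private_tmp() {
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
    # Subshell, so TMPDIR only changes for this one command.
    ( TMPDIR=$scratch; export TMPDIR; "$@" )
    status=$?
    rm -rf "$scratch"
    return "$status"
}
```

Two caveats. As far as I know the JVM does not read TMPDIR on Unix-like systems, so a Java test run would also need `-Djava.io.tmpdir="$TMPDIR"` passed through to have any effect. And the scratch directory still lives under /tmp by default, so this only addresses leftovers between runs, not the space needed during a run.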
The JDK17 extended test suites failed when running on test-osuosl-aix72-ppc64-5 because /tmp filled up.
The error can be seen here (as well as in Nagios): https://ci.adoptium.net/job/Test_openjdk17_hs_extended.openjdk_ppc64_aix_testList_2/
The test job appeared to create 2 x 2.1 GB temporary files in /tmp, filling the entire file system and causing tests to fail.