adoptium / aqa-tests

Home of test infrastructure for Adoptium builds
https://adoptium.net/aqavit
Apache License 2.0

Enable taking node temporarily offline due to specific machine issue in Adoptium #5730

Open sophia-guo opened 6 days ago

sophia-guo commented 6 days ago

Adding the parameter SLACK_CHANNEL to the configuration of https://ci.adoptium.net/view/Test_grinder/job/Test_Job_Auto_Gen/ enables taking a node offline due to specific machine issues.

This issue was opened to track any problems that occur with this enabled.

16:36:43  Test_openjdk21_hs_sanity.external_x86-64_linux #36 result is FAILURE. Checking console log for specific errors...
Scripts not permitted to use new java.util.ArrayList. Administrators can decide whether to approve or reject this signature.

sophia-guo commented 2 days ago

test-azure-ubuntu2404-x64-1 was hit twice by the "No space left on device" error. It was not marked offline, as "No space left on device" was on the error lists https://github.com/adoptium/aqa-tests/pull/5731.

https://ci.adoptium.net/job/Test_openjdk21_hs_sanity.openjdk_x86-64_linux_testList_1/19/console

15:44:39  Exception: hudson.AbortException: Failed to run ssh-agent: mkdtemp: private socket dir: No space left on device
15:44:39  
[Pipeline] timeout

https://ci.adoptium.net/job/Test_openjdk21_hs_special.system_x86-64_linux/28/console

[Pipeline] echo
15:37:04  Exception: hudson.AbortException: Failed to run ssh-agent: mkdtemp: private socket dir: No space left on device
15:37:04  
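
For illustration only, here is a minimal Python sketch of the kind of check discussed here: scanning a console log for known machine-specific errors to decide whether a node is a candidate for being taken offline. This is not the actual aqa-tests pipeline code. The names `MACHINE_ERROR_PATTERNS`, `find_machine_errors`, and `should_take_offline` are hypothetical, and the pattern list is an assumption based only on the error shown in the logs above.

```python
import re

# Hypothetical list of machine-specific error patterns (an assumption for
# this sketch; the real list in the aqa-tests pipeline may differ).
MACHINE_ERROR_PATTERNS = [
    r"No space left on device",
]

# Jenkins console lines in this thread start with an HH:MM:SS timestamp.
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\s+")


def find_machine_errors(console_log, patterns=MACHINE_ERROR_PATTERNS):
    """Return (pattern, line) pairs for log lines matching a known machine error."""
    hits = []
    for raw in console_log.splitlines():
        # Drop the leading timestamp, if any, before matching.
        line = TIMESTAMP.sub("", raw).strip()
        for pattern in patterns:
            if re.search(pattern, line):
                hits.append((pattern, line))
                break  # record each line at most once
    return hits


def should_take_offline(console_log, patterns=MACHINE_ERROR_PATTERNS):
    """True if the log contains at least one known machine-specific error."""
    return bool(find_machine_errors(console_log, patterns))
```

With the two excerpts above as input, `should_take_offline` would return `True` for each, since both contain `No space left on device`. A failure that matches none of the listed patterns would leave the node untouched.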

Currently test-azure-ubuntu2404-x64-1 is marked offline. I believe it was marked offline by the Jenkins auto-offline for machines that are low on space? @sxa, was it marked offline by infra's scheduled task? How would infra handle this case?