msaju closed this issue 3 years ago
This issue is actually fixed in https://github.com/gluster/project-infrastructure/issues/93
Root cause: centos8-regression has started taking more than 450 minutes over the past few days. We have no idea what introduced this. After 450 minutes the build usually gets aborted, so cleanup never runs, and hence we see the leftovers again.
I have cleaned up the workspace and triggered another build. We need to investigate why it is taking so long; I do not see it stuck anywhere, though.
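The manual cleanup above can be sketched roughly as below. This is a hypothetical illustration, not the actual job configuration: the workspace path and the simulated leftover brick directory are assumptions (in a real Jenkins job the path would come from `$WORKSPACE`).

```shell
#!/bin/sh
# Hypothetical sketch: wipe leftover test artifacts from a Jenkins workspace
# after an aborted build skipped its normal cleanup step.
# In Jenkins this path would be "$WORKSPACE"; /tmp/demo-workspace is a stand-in.
WS="/tmp/demo-workspace"

# Simulate a stale brick directory left behind by an aborted run.
mkdir -p "$WS/leftover-bricks"
touch "$WS/leftover-bricks/brick0"

# Remove everything inside the workspace (including dotfiles) but keep the
# directory itself, mirroring what a "wipe workspace" step would do.
rm -rf "$WS"/* "$WS"/.[!.]*

ls -A "$WS"   # prints nothing once the workspace is empty
```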
See also https://github.com/gluster/project-infrastructure/issues/102, which involves another long-running job. Not sure what the cause is.
I submitted a PR for the cleanup.
A new failure happened: https://build.gluster.org/job/centos8-regression/119/console
The root cause is insufficient space on bricks. The test creates a 4 GiB file that is then migrated by rebalance.
The logs from rebalance show these errors:
[2020-11-09 17:51:06.502608 +0000] I [dht-rebalance.c:1537:dht_migrate_file] 0-patchy-dht: /dir1/bar: attempting to move from patchy-client-1 to patchy-client-0
[2020-11-09 17:51:06.504356 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:1719:client4_0_fallocate_cbk] 0-patchy-client-0: remote operation failed. [{errno=28}, {error=No space left on device}]
[2020-11-09 17:51:06.504388 +0000] E [MSGID: 109023] [dht-rebalance.c:732:__dht_rebalance_create_dst_file] 0-patchy-dht: fallocate failed for /dir1/bar on patchy-client-0 [No space left on device]
[2020-11-09 17:51:06.504652 +0000] E [MSGID: 0] [dht-rebalance.c:1693:dht_migrate_file] 0-patchy-dht: Create dst failed on - patchy-client-0 for file - /dir1/bar
[2020-11-09 17:51:06.505302 +0000] E [MSGID: 109023] [dht-rebalance.c:2862:gf_defrag_migrate_single_file] 0-patchy-dht: migrate-data failed for /dir1/bar [No space left on device]
[2020-11-09 17:51:06.507397 +0000] I [MSGID: 109028] [dht-rebalance.c:4690:gf_defrag_status_get] 0-patchy-dht: Rebalance is completed. Time taken is 0.00 secs
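The failure pattern above (fallocate hitting ENOSPC during migration) can be reproduced as a simple pre-check: compare the brick filesystem's free space against the size of the file the test migrates. This is a hedged sketch for illustration; the brick path, the 4 GiB figure from the test, and the comparison logic are assumptions, not part of Gluster's rebalance code.

```shell
#!/bin/sh
# Hypothetical sketch: check whether a brick filesystem has room for the
# 4 GiB file the test migrates. /tmp is a stand-in for the real brick path
# (the actual bricks live on the builder's /d filesystem).
BRICK="${1:-/tmp}"
NEED_KB=$((4 * 1024 * 1024))   # 4 GiB expressed in KiB

# df -Pk prints available space in 1024-byte blocks (POSIX portable format);
# field 4 of the second line is the "Available" column.
AVAIL_KB=$(df -Pk "$BRICK" | awk 'NR==2 {print $4}')

if [ "$AVAIL_KB" -lt "$NEED_KB" ]; then
    echo "insufficient space on $BRICK: ${AVAIL_KB} KiB < ${NEED_KB} KiB"
else
    echo "ok: $BRICK has ${AVAIL_KB} KiB free"
fi
```

On a 5 GB /d this check would fail as soon as anything else occupies the filesystem, which matches the ENOSPC seen in the rebalance log.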
This error has happened in the past, when it was caused by small bricks, but I thought the brick size had been increased since then.
Any idea what's happening?
@xhernandez This happened because the job ran on a builder that has only 5 GB of space on /d. The builder is builder212.int.aws.gluster.org and has the label 'centos8-testing'. The job config used this label; it is fixed by this commit: https://github.com/gluster/build-jobs/commit/329b97b43a732fcf53a186331ea981676ae8b609. The new centos8 builders with the label 'centos8' have sufficient space, and the job will now pick one of those.
We might need to manually increase the size of the brick on 212. We will take care of that.
Thanks Deepshikha. Is it possible to manually trigger the job? Let's see if anything else fails.
Thanks a lot Xavi and Deepshikha for the support. The CentOS 8 regression is passing now.
So while we had some issues (likely related to resolv.conf and AWS on reboot), the test suite works. We are still looking at the DNS issue (I guess that's just "new NM + new cloud-init"), but in the meantime I am closing this one.
CentOS-8 regression is failing with the below error. Also: java.nio.file.FileSystemException: /home/jenkins/root/workspace/centos8-regression/tests/utils/__pycache__/libcxattr.cpython-36.pyc: Operation not permitted.
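A FileSystemException like this typically means the workspace wipe cannot delete stale Python bytecode caches (for example because a test run created them as a different user). A cleanup along the lines below might help; this is a hedged sketch with an assumed demo path, not the actual job step, and on a real builder it may need to run with elevated privileges if the files are root-owned.

```shell
#!/bin/sh
# Hypothetical sketch: remove stale __pycache__ directories left under the
# Jenkins workspace so the next workspace wipe does not fail.
# /tmp/demo-ws stands in for the real workspace path.
WS="/tmp/demo-ws"

# Simulate the leftover cache from the error message for the demo.
mkdir -p "$WS/tests/utils/__pycache__"
touch "$WS/tests/utils/__pycache__/libcxattr.cpython-36.pyc"

# Delete every __pycache__ directory below the workspace.
# -prune stops find from descending into a directory we are about to remove.
find "$WS" -type d -name '__pycache__' -prune -exec rm -rf {} +

# Count remaining .pyc files (0 after cleanup).
find "$WS" -name '*.pyc' | wc -l
```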