mohit84 opened 2 years ago
Ok, it seems 3 builders have been reinstalled (not sure when) or changed their SSH keys, and our ansible deployment was blocked.
Since the rest was working, I do not understand what happened exactly. I am fixing and running again, and will report.
ok so that was quick:
fatal: [builder-c7-2.aws.gluster.org]: FAILED! => {"changed": false, "msg": "No package matching 'dbench' found available, installed or updated", "rc": 126, "results": ["git-1.8.3.1-23.el7_8.x86_64 providing git is already installed", "sudo-1.8.23-3.el7.x86_64 providing sudo is already installed", "No package matching 'dbench' found available, installed or updated"]}
I guess we need to run another playbook first.
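The "No package matching 'dbench'" result usually means the repository providing the package is not enabled yet, so the repo-setup play has to run before the package play. A minimal triage sketch, assuming CentOS 7 builders where dbench comes from EPEL (the helper name is made up for illustration):

```shell
# needs_repo_playbook: hypothetical helper that recognizes the failure
# mode above from an ansible yum result message -- "No package matching"
# means no enabled repo provides it, as opposed to an install error.
needs_repo_playbook() {
    case "$1" in
        *"No package matching"*) return 0 ;;
        *) return 1 ;;
    esac
}

# On the builder itself, the manual fix would be roughly (assumption:
# dbench ships in EPEL on CentOS 7):
#   sudo yum install -y epel-release
#   sudo yum install -y dbench
```

So the failed run above would classify as "run the repo playbook first", while the "git is already installed" lines would not.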
So, we have 8 builders on AWS instead of 4, and they were all started by ansible, one day after the other.
This kinda messed up the automation, so I will clean it up (e.g., remove all instances and re-run the playbook).
ok should be good now
Thanks Michael!!
So now, it fails with:
08:27:44 not ok 21 [ 126/ 120124] < 83> '0 check_common_secret_file' -> 'Got "1" instead of "0"'
08:27:44 cat: /var/lib/glusterd/geo-replication/primary_secondary_common_secret.pem.pub: No such file or directory
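The missing common_secret.pem.pub is normally produced by the geo-replication key setup step (e.g. "gluster system:: execute gsec_create" on the primary; an assumption about what the test's setup does), so the check failing points at that earlier step not completing. A small sketch for verifying it on a node (function name is illustrative):

```shell
# Default path taken from the failing log line above.
GEOREP_DIR=/var/lib/glusterd/geo-replication

# check_secret_file: returns 0 iff the given file exists and is non-empty.
check_secret_file() {
    [ -s "$1" ]
}

# A missing/empty file here means the geo-rep key setup never completed --
# which would also fit the "No space left on device" failures seen in the
# other test case, since key generation writes under /var/lib.
check_secret_file "$GEOREP_DIR/primary_secondary_common_secret.pem.pub" \
    || echo "secret file missing: geo-rep setup did not complete"
```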
I wonder why it suddenly fails.
Another test case was also failing with "No space left on device".
In the meantime, I think I found why the servers got reinstalled. Our automation detected that the AMI changed (it did change, but the email about it went to another inbox than mine) and decided to reinstall the servers, but without adjusting ansible (or not adjusting it properly), which in turn left them half deployed.
It seems the test case (./tests/bugs/distribute/bug-882278.t) is continuously failing because it cannot resolve a hostname. I think we need to update an infra VM to run it successfully.
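Before touching the infra VM, it is worth confirming which lookup actually fails on a builder. A quick triage sketch (the failing hostname below is a placeholder; the real name is in the test and the console log):

```shell
# resolves: returns 0 iff the name resolves through the system's NSS
# configuration (/etc/hosts, DNS, ...), same path the test would use.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

resolves localhost && echo "resolver itself works"
resolves some-infra-hostname.example.org \
    || echo "name not resolvable: fix DNS/hosts on the infra VM"
```

Running this on an affected builder separates "the resolver is broken" from "this one record is missing", which tells us whether the fix belongs on the builder or on the infra VM.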
For more details, please refer to https://build.gluster.org/job/gh_centos7-regression/2729/consoleFull