bcl update on `rally-leg-crystal` did not remove old nodes once the new nodes were syncd/healthy

bloq / cloud-cli

bcl update on `rally-leg-crystal` did not remove old nodes once the new nodes were syncd/healthy #169

Closed daniel-bloq closed 3 years ago

daniel-bloq commented 3 years ago

I ran bcl cluster update on the algo cluster in af-south-1. The new nodes show as healthy in the LB, and have for some time now, but the old nodes are still in place and have not been removed by the process as I would expect.

Need you to take a look to see what the problem is.

Algorand Foundation - Algorand Mainnet Relay mainnet rally-leg-crystal cluster-54b377fb-27ab-45b7-b3d2-38ff4deed196

bcl clusters update -c 3 -o 3 -i cluster-7a5660a5-2561-48d2-b924-c4aecca68d3b -s service-5aaa0024-561b-5871-ab21-21316f6c5e7b

ℹ Updating cluster  15:18:38
? You will update the cluster's service to service-5aaa0024-561b-5871-ab21-21316f6c5e7b and capacity to 3 total and 3 on-demand. Do you want to continue? Yes
✔ Cluster cluster-7a5660a5-2561-48d2-b924-c4aecca68d3b updated successfully

gabmontes commented 3 years ago

The update process failed with this error:

2021-03-22T20:19:53.244Z - error: Failed updating cluster cluster-7a5660a5-2561-48d2-b924-c4aecca68d3b: Target groups 'arn:aws:elasticloadbalancing:af-south-1:477041904397:targetgroup/rally-leg-crystal-4160/52cc17fd24e464da' not found
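
For anyone reading the error later, here is a minimal sketch that splits the target group ARN from that message into its parts, so the region and target group name are easy to see. It relies only on the general ARN layout (`arn:partition:service:region:account-id:resource`); the `TargetGroupArn` type and `parse_target_group_arn` helper are hypothetical names for illustration and are not part of bcl or cloud-nodes.

```python
from typing import NamedTuple


class TargetGroupArn(NamedTuple):
    """Hypothetical container for the pieces of a target group ARN."""
    partition: str
    service: str
    region: str
    account_id: str
    name: str
    group_id: str


def parse_target_group_arn(arn: str) -> TargetGroupArn:
    """Split an ELB target group ARN into its components."""
    # General layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    _, partition, service, region, account_id, resource = parts

    # For target groups the resource looks like: targetgroup/<name>/<id>
    resource_parts = resource.split("/")
    if len(resource_parts) != 3 or resource_parts[0] != "targetgroup":
        raise ValueError(f"not a target group ARN: {arn!r}")
    _, name, group_id = resource_parts

    return TargetGroupArn(partition, service, region, account_id, name, group_id)


# The ARN quoted in the error message above.
arn = (
    "arn:aws:elasticloadbalancing:af-south-1:477041904397:"
    "targetgroup/rally-leg-crystal-4160/52cc17fd24e464da"
)
parsed = parse_target_group_arn(arn)
print(parsed.region, parsed.name)  # prints "af-south-1 rally-leg-crystal-4160"
```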

gabmontes commented 3 years ago

Since the error originated from an inconsistency between the database record and the actual cluster setup, it is not an issue with the CLI itself. Closing.
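
To illustrate the kind of inconsistency meant here, the following is a rough sketch of a check that compares the target group ARNs stored in a record against the ones that actually exist. It is not how cloud-nodes does it: `find_stale_target_groups` is a hypothetical helper, and both lists are hard-coded here, whereas in practice the second one would presumably come from the AWS API. The second ARN below is made up for the example.

```python
from typing import Iterable, List


def find_stale_target_groups(
    recorded_arns: Iterable[str], existing_arns: Iterable[str]
) -> List[str]:
    """Return recorded ARNs that no longer exist, preserving their order."""
    existing = set(existing_arns)
    return [arn for arn in recorded_arns if arn not in existing]


# ARN taken from the error message: recorded, but reported as not found.
recorded = [
    "arn:aws:elasticloadbalancing:af-south-1:477041904397:"
    "targetgroup/rally-leg-crystal-4160/52cc17fd24e464da",
]

# Hypothetical ARN standing in for whatever target group really exists.
existing = [
    "arn:aws:elasticloadbalancing:af-south-1:477041904397:"
    "targetgroup/example-replacement-tg/0000000000000000",
]

stale = find_stale_target_groups(recorded, existing)
for arn in stale:
    print(f"recorded but missing: {arn}")
```

Running a check like this before an update would surface the mismatch up front instead of halfway through the process.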

daniel-bloq commented 3 years ago

So what would have caused this inconsistency and in which repo should I open a ticket for that if not this one?

gabmontes commented 3 years ago

@daniel-bloq if any manual update was made to that cluster, meaning anything done through the AWS console, that may have caused the problem.

I could not find more data to investigate further in that direction, so let's keep our eyes open.

If there were an issue with, e.g., the update process, it should be filed in cloud-nodes. We could also file an issue here and transfer it there.

daniel-bloq commented 3 years ago

@gabmontes Nothing was done manually to the cluster itself. I only manage the clusters with bcl.

@jcvernaleo did go in after the update process ought to have removed the old cluster, and he updated the Algo GUID in the algo Docker containers inside the nodes.

gabmontes commented 3 years ago

Just for the record, there were several manual updates to this cluster in the past:

https://github.com/bloqpriv/DevOps/issues?q=is%3Aissue+rally-leg-crystal

daniel-bloq commented 3 years ago

@gabmontes So if @jcvernaleo or I go in and update the client or increase the disk size, that should not break the update process, right? It was just that early TG (target group) modification that was the problem, yes?