ClusterLabs / anvil

The Anvil! Intelligent Availability™ Platform, mark 3

Server Power Controls are not working #630

Closed digimer closed 3 months ago

digimer commented 3 months ago

The UI isn't stopping or starting the VMs.

ylei-tsubame commented 3 months ago

Confirmed: power options don't work when server_host_uuid of the target record in servers is blank. Under what circumstances would it be blank?

digimer commented 3 months ago

It should never be blank; the user was using the web UI on a valid server. I was watching them try to shut down a server and it just never shut down, so I am guessing the UI wasn't setting it. This was from the main dashboard, on the preview screenshot tile.

ylei-tsubame commented 3 months ago

This sounds like it's related to #607, which was supposed to be patched in #609.

If the system includes patch #609 and it still doesn't work, it's a bug. Otherwise, the situation that failed in my (brief) tests is when the server doesn't have a server_host_uuid.

digimer commented 3 months ago

Just checked, this client is fully up to date with the RPMs from main.

ylei-tsubame commented 3 months ago

I just patched it to fall back to the first subnode when the server host UUID is null. According to the schema, the field is nullable.
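For readers following along, the fallback described above could look roughly like the sketch below. This is not the actual patch: the function name `resolve_job_host` and the `subnode_uuids` argument are hypothetical, and it assumes a blank string should be treated the same as null.

```python
from typing import Optional, Sequence


def resolve_job_host(server_host_uuid: Optional[str],
                     subnode_uuids: Sequence[str]) -> str:
    """Pick the host a power job should be registered against.

    Hypothetical helper: prefers the server's recorded host and falls
    back to the first subnode when that value is null or blank.
    """
    if server_host_uuid:
        # The servers record names a host, so use it as-is.
        return server_host_uuid
    if not subnode_uuids:
        # Nothing to fall back to; fail loudly rather than guess.
        raise ValueError("server_host_uuid is blank and no subnodes are known")
    # Fall back to the first subnode in the list.
    return subnode_uuids[0]
```

With this shape, `resolve_job_host(None, ["subnode-1", "subnode-2"])` returns `"subnode-1"`, while a populated `server_host_uuid` is returned unchanged.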

Questions to narrow down the problem:

  1. Is the field null when the server is running?
  2. Did the power on/off job actually register?
  3. If the job did register, are the command and arguments correct?
  4. Were there errors in job_status?

digimer commented 3 months ago

There was a bug in the back end where the server state wasn't being parsed properly. That's fixed, and power off works now.

However, power on fails. The job needs to be assigned to one of the nodes, but it's currently being assigned to a Striker. Strikers can't boot servers, so the job just fails.
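As a rough illustration of the constraint, the boot job's host should be chosen only from machines that can actually run servers. The sketch below is not the project's code: the function name `pick_boot_host`, the dictionary layout, and the `host_type` values `"node"`, `"dr"` and `"striker"` are assumptions inferred from the error message shown further down.

```python
from typing import Iterable, Mapping

# Assumed host types that are able to boot servers (nodes and DR hosts).
BOOT_CAPABLE_TYPES = {"node", "dr"}


def pick_boot_host(candidates: Iterable[Mapping[str, str]]) -> str:
    """Return the UUID of the first candidate host that can boot a server.

    Each candidate is a hypothetical mapping with 'host_uuid' and
    'host_type' keys. Strikers are skipped because they can't boot servers.
    """
    for host in candidates:
        if host["host_type"] in BOOT_CAPABLE_TYPES:
            return host["host_uuid"]
    # No node or DR host was offered, so the job would fail anyway.
    raise LookupError("no node or DR host available to boot the server")
```

Given a candidate list that starts with a Striker and is followed by a node, this returns the node's UUID instead of registering the job on the Striker.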

anvil=# SELECT a.job_uuid, b.host_name, a.job_command, a.job_data, a.job_progress, a.job_status, a.modified_date FROM jobs a, hosts b WHERE a.job_host_uuid = b.host_uuid AND a.job_command LIKE '%anvil-boot-server%';
               job_uuid               |            host_name            |         job_command         |                     job_data                     | job_progress | job_status |         modified_date         
--------------------------------------+---------------------------------+-----------------------------+--------------------------------------------------+--------------+------------+-------------------------------
 b5d43650-a5bd-4e25-af39-ba7871217658 | oo-striker03.opticaloutlets.com | /usr/sbin/anvil-boot-server | server-uuid=6e8a8d43-377a-4bdb-9f8e-f90ac36ba09d |          100 | job_0282  +| 2024-04-09 23:54:08.949797-04
                                      |                                 |                             |                                                  |              | error_0258 | 
(1 row)

"error_0258" is:

2024/04/09 23:54:08:[b5d43650]:anvil-boot-server:33; anvil-boot-server has started.
2024/04/09 23:54:08:[b5d43650]:anvil-boot-server:120; This host is not a node or DR, unable to boot servers.

digimer commented 3 months ago

Confirmed fixed