NixOS / nixops-hetzner

GNU Lesser General Public License v3.0

Hetzner: nixops provides no way out when the machine is stuck in boot #16

Open nh2 opened 7 years ago

nh2 commented 7 years ago

Edit: See workaround.


I have a situation with nixops where I set up a Hetzner machine but specified a nonexistent device for a partition.

Now it's stuck in Starting / Obsolete.

I don't seem to be able to tell it to start anew from rescue mode.

Is there functionality in the Hetzner backend to handle this case without wiping the entire deployment? For example, I think just removing a given machine from the list would make it go through rescue mode on the next deploy.


Also, I can't destroy in this case:

% ./ops destroy -d mydeployment
error: Multiple exceptions: please either set 'deployment.hetzner.robotUser' or $HETZNER_ROBOT_USER for machine 'machine-1', please either set 'deployment.hetzner.robotUser' or $HETZNER_ROBOT_USER for machine 'machine-2'

But I have set deployment.hetzner.robotUser.

nh2 commented 7 years ago

CC @aszlig

nh2 commented 7 years ago

As a workaround, I just dropped machine-1 and machine-2 from the nixops sqlite DB using sqlitebrowser (from the Resources table).

nh2 commented 7 years ago

Workaround using sqlite3

nix-shell -p sqlite

(I recommend running rlwrap sqlite3 instead of plain sqlite3 in the commands below so that arrow keys work.)

Deleting a specific machine

To delete (forget) mymachine from mydeployment:

sqlite3 localstate.nixops
DELETE FROM ResourceAttrs WHERE machine = (SELECT id FROM Resources WHERE name = 'mymachine' AND deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment'));
DELETE FROM Resources WHERE name = 'mymachine' AND deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment');
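The same deletion can be sketched in Python with a dry run first, so you can check how many rows match before anything is removed. This is only a sketch: the table and column names (Resources.id/name/deployment, ResourceAttrs.machine, DeploymentAttrs.deployment/name/value) are inferred from the queries above, and forget_machine is a hypothetical helper, not part of nixops. Back up the state file before running it for real.

```python
import sqlite3


def forget_machine(conn, deployment_name, machine_name, dry_run=True):
    """Delete one machine's rows from an open nixops state DB.

    Returns the number of matching machines. With dry_run=True nothing
    is deleted. Schema is assumed from the SQL in this thread.
    """
    # Resolve the machine's resource id(s) via the deployment's 'name' attribute.
    ids = [row[0] for row in conn.execute(
        "SELECT r.id FROM Resources r "
        "JOIN DeploymentAttrs d ON d.deployment = r.deployment "
        "WHERE d.name = 'name' AND d.value = ? AND r.name = ?",
        (deployment_name, machine_name),
    )]
    if dry_run or not ids:
        return len(ids)
    # 'with conn' commits on success and rolls back if a statement fails,
    # so attributes and the resource row are removed together or not at all.
    with conn:
        for resource_id in ids:
            conn.execute("DELETE FROM ResourceAttrs WHERE machine = ?", (resource_id,))
            conn.execute("DELETE FROM Resources WHERE id = ?", (resource_id,))
    return len(ids)
```

Usage would look like conn = sqlite3.connect("localstate.nixops"), then forget_machine(conn, "mydeployment", "mymachine") to preview, and the same call with dry_run=False to actually delete.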

Deleting all attributes (including all machines!)

To delete everything in mydeployment:

sqlite3 localstate.nixops
DELETE FROM ResourceAttrs WHERE machine IN (SELECT id FROM Resources WHERE deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment'));
DELETE FROM Resources WHERE deployment = (SELECT deployment from DeploymentAttrs WHERE name = 'name' AND value = 'mydeployment');

aszlig commented 7 years ago

@nh2: Does it work with nixops destroy --include machine_you_want_to_destroy?

nh2 commented 7 years ago

@aszlig No, same error.

Also note I'm using an "admin user" account and I'm not even sure what destroy should do exactly for Hetzner.

But in any case, it seems we're not even getting to the point where that is relevant, as it seems to fail before that.

aszlig commented 7 years ago

Ah, sorry... you did that in the first place. The reason this doesn't work is that destroy uses ROBOT_USER/ROBOT_PASS (which apparently weren't passed) to access the robot, remove the vm_id from the server, and reboot into rescue (with --wipe it also uses shred to erase the disks).

nh2 commented 7 years ago

@aszlig Hmm, from your answer I can't quite tell yet: should this also work with an admin user, or does it only work with the main Hetzner account?

coretemp commented 6 years ago

@nh2 Still using your workaround?

nh2 commented 6 years ago

@coretemp Yes.