elitak / nixos-infect

[GPLv3+] install nixos over the existing OS in a DigitalOcean droplet (and others with minor modifications)
GNU General Public License v3.0
1.36k stars, 221 forks

Losing network on OVH VPS #18

Open charlycoste opened 7 years ago

charlycoste commented 7 years ago

I tried on a VPS 2016 SSD 3 from OVH, each of these :

Each time I run the script, the installation and reboot go well. After that, though, I can't connect to the VPS anymore. I accessed it through KVM to debug, and it seems the VPS simply gets disconnected from the network.

elitak commented 7 years ago

The network detection part of the script is not very robust and assumes DigitalOcean's idiosyncrasies. Likely, it didn't grab the right settings for your host. I'll probably improve it sometime in the future, but for now, try the following to manually provision your hosts:

(If you have console access to an already-provisioned host, do only step 5, using the IP info provided in the OVH web UI instead, and then run nixos-rebuild switch.)

  1. copy over the nixos-infect script
  2. edit it
  3. comment out the last 4 lines (makeSwap to reboot)
  4. run source nixos-infect; set +e. This will generate the config files but not try to install everything yet.
  5. edit /etc/nixos/networking.nix, correcting any obvious errors. Use commands ip addr, ip route, and cat /etc/resolv.conf to obtain any missing info. Probably remove eth1 entirely.
  6. edit nixos-infect again, and uncomment the lines you commented
  7. bash -x nixos-infect
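
The values needed in step 5 can be pulled out of the commands listed there. A minimal sketch, assuming an iproute2 ip binary and a conventional resolv.conf; the function names default_gateway, ipv4_addresses and list_nameservers are made up for illustration and are not part of nixos-infect:

```shell
# Print the default IPv4 gateway from `ip route` output read on stdin.
default_gateway() {
  awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print "interface address/prefix" pairs from `ip -o -4 addr show` output on stdin.
ipv4_addresses() {
  awk '$3 == "inet" { print $2, $4 }'
}

# Print nameserver addresses from a resolv.conf-style file
# (defaults to /etc/resolv.conf when no path is given).
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# On the host being converted (left commented so the definitions can be sourced safely):
#   ip route | default_gateway
#   ip -o -4 addr show | ipv4_addresses
#   list_nameservers
```

Compare the printed values against what ended up in /etc/nixos/networking.nix and against the OVH web UI before continuing with step 6.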

Post me the networking.nix contents, if you still can't get it working.

kniteli commented 6 years ago

Yeah, for OVH, don't even bother with the networking, just rip it out completely.

Change this:

  imports = [
    ./hardware-configuration.nix
    ./networking.nix # generated at runtime by nixos-infect
    $NIXOS_IMPORT
  ];

to this:

  imports = [
    ./hardware-configuration.nix
    $NIXOS_IMPORT
  ];
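
The same change can be scripted rather than made by hand. A rough sketch, assuming the import line still reads ./networking.nix exactly as quoted above; drop_networking_import is a hypothetical helper, not something the script provides:

```shell
# Remove every line mentioning ./networking.nix from the given file,
# keeping the untouched original next to it as <file>.orig.
drop_networking_import() {
  local script="$1"
  cp -- "$script" "$script.orig"
  # Read from the backup and write back to the original path,
  # which avoids relying on a non-portable `sed -i`.
  sed '/\.\/networking\.nix/d' "$script.orig" > "$script"
}

# Example (path is an assumption; point it at your copy of the script):
#   drop_networking_import ./nixos-infect
#   diff ./nixos-infect.orig ./nixos-infect
```

Check the diff before running the script, since the pattern deletes any line containing ./networking.nix, not only the one in the imports list.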

I just installed successfully that way, starting from Debian 9.

elitak commented 6 years ago

I should probably add a --no-networking option that does this, or detect when the original system's network uses DHCP instead of manual config.
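
One way such a DHCP check might look, purely as an illustration: the sketch below assumes a Debian-style ifupdown setup where interfaces are declared in /etc/network/interfaces, and uses_dhcp is a hypothetical name. It does not reflect what nixos-infect actually does, and images that configure the network through other tools would need a different test.

```shell
# Succeed (exit 0) if the given ifupdown file declares at least one
# interface with the dhcp method, e.g. "iface eth0 inet dhcp".
# Defaults to /etc/network/interfaces when no path is given.
uses_dhcp() {
  local file="${1:-/etc/network/interfaces}"
  [ -r "$file" ] &&
    grep -Eq '^[[:space:]]*iface[[:space:]]+[^[:space:]]+[[:space:]]+inet6?[[:space:]]+dhcp' "$file"
}

# Example:
#   if uses_dhcp; then echo "original system uses DHCP"; fi
```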

asymmetric commented 5 years ago

Is this still an issue, since we do this?

charlycoste commented 5 years ago

I'll try to check it out, then tell you if it's okay now.

charlycoste commented 5 years ago

@asymmetric Yes, this is still an issue.

TheSirC commented 5 years ago

I'm having big issues getting it to run on master. Would any of you (@asymmetric, @charlycoste, @kniteli) have a functioning version? I could build on it to set up a PR that makes master work on OVH.

asymmetric commented 5 years ago

I don't use OVH, sorry.

elitak commented 5 years ago

@TheSirC if you still need help, attach a .log here with the output after you set -x and run the script (comment out the reboot), because I have no detail on what your "big issues" are.

TheSirC commented 5 years ago

@elitak Yes, of course. Here is the error log:

Error log

```
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.4978 s, 93.4 MB/s
swapon: /tmp/nixos-infect.rbzUv.swp: found swap signature: version 1d, page-size 4, same byte order
swapon: /tmp/nixos-infect.rbzUv.swp: pagesize=4096, swapsize=1073741824, devsize=1073741824
```

The problem is that, even with the reboot command left in, the system does not reboot, and the instance does not end up with the usual NixOS commands (nix-env, nixos-rebuild, etc.). I "bisected" the run: execution reaches the makeConf function and then just "doesn't execute it" (I cannot find any trace on the system of the commands in there). The script exits with error code 1.

elitak commented 5 years ago

Run set -x, before you run the script, for more detail; that should get you the exact line that fails.

TheSirC commented 5 years ago

I actually added it to the script itself, which produced no further output. I then set it in the interactive session, with this output:

root@address:~# ./nixos-infect
+ ./nixos-infect

It then immediately returns to the interactive prompt.

TheSirC commented 5 years ago

After further testing (and multiple reinstalls of the VPS, to make sure I was working on a clean system each time), I found that:

  1. The script stops here, on the grep part. Running the command myself returns an empty string, which is perfectly normal: the file is empty but does exist! (grep treats that as a failure and returns exit code 1.)
  2. Here the script does not refresh the package list (e.g. apt-get update), which can make it fail.
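
The grep behaviour in point 1 is easy to reproduce in isolation. A small sketch, using a temporary empty file and the pattern nameserver as stand-ins (both are assumptions, not the exact file or pattern from the script):

```shell
# An existing but empty file, as in the report above.
empty=$(mktemp)

# grep exits with status 1 when no line matches, even though nothing went wrong.
# In a script running under `set -e`, an unguarded grep like this ends the run.
status=0
grep -q 'nameserver' "$empty" || status=$?
echo "grep exit status: $status"

# Guarded form: `|| true` absorbs the non-zero status and leaves an empty result.
found=$(grep 'nameserver' "$empty" || true)
echo "matches: '${found}'"

rm -f "$empty"
```

With the guard in place, the caller can test whether the result is empty instead of having the whole script exit.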

Fun fact: the following do not produce the same output, and I would really like to know why:

  1. bash -x script (<-- I ran this one to have output on the script)
  2. adding set -x to the script after the shebang
  3. doing set -x in your interactive prompt and then running ./script
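
Part of the difference can be shown with a toy script. The sketch below uses a hypothetical two-line script in place of nixos-infect and runs it with bash rather than ./script so it does not depend on the execute bit; it only illustrates why the third variant prints nothing beyond the invocation line, and does not claim to explain every difference seen here.

```shell
# Hypothetical two-line script standing in for nixos-infect.
demo=$(mktemp)
printf '#!/usr/bin/env bash\necho hello\n' > "$demo"

# The same script with set -x added right after the shebang.
demo_x=$(mktemp)
printf '#!/usr/bin/env bash\nset -x\necho hello\n' > "$demo_x"

# 1. bash -x: the shell running the script traces from the first line.
trace_bash_x=$(bash -x "$demo" 2>&1 >/dev/null)

# 2. set -x inside the script: tracing starts at that line.
trace_inside=$(bash "$demo_x" 2>&1 >/dev/null)

# 3. set -x in the calling shell: only the caller is traced. The script runs
#    in a new bash process, which starts with tracing switched off.
trace_parent=$( (set -x; bash "$demo") 2>&1 >/dev/null )

printf '1: %s\n2: %s\n3: %s\n' "$trace_bash_x" "$trace_inside" "$trace_parent"
rm -f "$demo" "$demo_x"
```

The third trace shows only the command that launched the script. The option set in the parent shell is not passed on to the child unless SHELLOPTS has been exported.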

TheSirC commented 5 years ago

So, after applying patches for the issues cited above, I opened a pull request that worked for me on OVH.

raspher commented 2 years ago

This has been resolved for 3 years...

Anyway, I have a similar problem on a different budget provider. It's kind of weird: the VPS does respond to ping, but it seems to have all ports closed. Any ideas what the problem is?

Anyway, if I have enough time, I'll try to go step by step through what this script does and play a little with the config. Maybe #61 is the solution?