JanMa / nomad-driver-nspawn

A Nomad task driver for systemd-nspawn
MIT License

Can CSI work with systemd nspawn? #23

Closed MagicRB closed 3 years ago

MagicRB commented 3 years ago

I'm just wondering if it's possible to get CSI working with nspawn. Based on my limited knowledge of how CSI works, it should be possible, since all that would need to be done is to bind mount the temporary directory created by the CSI driver to the correct place in the container.

Do you have any plans to implement this?

JanMa commented 3 years ago

Hi @MagicRB, thanks for your interest in my project. My knowledge of CSI plugins in Nomad is also somewhat limited, and I have not used them outside of a PoC. But as far as I can tell from the volume documentation, a CSI volume works the same as a host volume from a task driver's perspective. I implemented generic support for volumes in #7, so I think it should just work for CSI volumes too.
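
For reference, a minimal sketch of what that could look like in a job file, assuming a CSI volume named gitea-data has already been registered (the names and paths here are made up):

group "gitea" {
  volume "data" {
    type      = "csi"
    source    = "gitea-data"   # hypothetical, the ID used when registering the volume
    read_only = false
  }

  task "gitea" {
    driver = "nspawn"

    # Nomad mounts the CSI-provided directory into the task like a host volume
    volume_mount {
      volume      = "data"
      destination = "/var/lib/gitea"
    }

    config {
      boot = true
      # rest of the nspawn driver config (image location, binds, ...) omitted
    }
  }
}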

Kind regards, Jan

MagicRB commented 3 years ago

This is not related to this issue, but while I have you here, I need to ask you something. I'm using this task driver in combination with one prestart and one poststop raw_exec task to create and destroy a NixOS container, which can then be started with this driver (if your systemd-nspawn version is patched). I'd like to one day create a proper task driver from this, but writing a new one just doesn't make sense, since this one can basically do what I need and a dedicated driver would just remove boilerplate. Could we somehow add the Nix code to this one? Then again, having Nix-related things in a driver for nspawn might be considered "bloat". What's the best course of action in your opinion? I'd imagine that when Nix support is enabled in this driver, the defaults would be modified, for example the bind mounts, boot = false, and other things.

JanMa commented 3 years ago

Sure, I'd be happy to answer some more unrelated questions :wink: I do not use Nix or NixOS at all and know very little about it, so I am a bit reluctant to add code for it to this driver. I am also not sure I understand which features you would like to add. Can you maybe post your job file so I can get a better understanding of what it is you are doing? If it's just reducing boilerplate you're after, have you already had a look at the new HCL2 features that were added in Nomad 1.0? Using the new template functionality etc., you should be able to come up with some nice abstractions.
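
For example, a rough sketch of how HCL2 variables and locals could factor out the repeated Nix paths (the names and values below are only placeholders, not your actual setup):

variable "nix_expression" {
  type        = string
  description = "Nix expression used to build the container image"
  default     = "gitea/default.nix"   # hypothetical
}

locals {
  # hypothetical base path, reused by the prestart/poststop templates
  profile_base = "/nix/var/nix/profiles/nomad"
}

# Inside the job, the tasks can then reference var.nix_expression and
# local.profile_base instead of repeating the literal paths in every
# template and config block.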

MagicRB commented 3 years ago

This is the job file that currently doesn't work, due to raw_exec not supporting CSI and me not being able to restrict volumes to certain tasks, even though they're not used by the raw_exec tasks... (if you could help me with this while you're here I'd really appreciate it; my Gitea is currently down because of this...) https://termbin.com/8z9b (it does work without the CSI volumes). Another example, this time fully working: https://termbin.com/gvaz.

JanMa commented 3 years ago

I think I understand what you're doing in the job files. You are running a Nomad cluster on NixOS. In the prestart task you use nix-build to build a NixOS image with Gitea installed. You would then like to run this image using the nspawn driver in the main task. Once the task is done, the poststop task removes the image again. To store the Gitea data, you want to make use of CSI volumes inside the nspawn task. The whole thing does not work because the raw_exec driver you are using for the prestart and poststop tasks does not support CSI. Please correct me if I understood something wrong.
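
Roughly, the setup you describe maps to a group like this (task names and script paths are placeholders; your real job files are in the termbin links above):

group "gitea" {
  # prestart: build the NixOS image with nix-build
  task "build" {
    lifecycle {
      hook    = "prestart"
      sidecar = false
    }
    driver = "raw_exec"
    config {
      command = "local/prestart.sh"   # hypothetical, renders default.nix and runs nix-build
    }
  }

  # main task: run the built image with systemd-nspawn
  task "gitea" {
    driver = "nspawn"
    config {
      boot = true
      # point the driver at the directory produced by nix-build (omitted here)
    }
  }

  # poststop: remove the built image again
  task "cleanup" {
    lifecycle {
      hook = "poststop"
    }
    driver = "raw_exec"
    config {
      command = "local/poststop.sh"   # hypothetical
    }
  }
}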

As far as I can see there are two possible ways you could solve this:

Let me know how it turns out :-)

Regards, Jan

MagicRB commented 3 years ago

Thanks, and the second idea is actually perfect. You're right that systemd-nspawn won't spawn a NixOS system out of the box; it has issues with the FHS non-compliance, but NixOS has a patched version (which I installed onto Arch, hackity, hack, hack). I'll test it soon, thanks for your help!

MagicRB commented 3 years ago

I need your help again; I've hit a wall and can't figure out how to fix it. Apparently the container "failed successfully"... [screenshot of the allocation output]

The /nix/store/... line means that the system was indeed built and placed where it's supposed to be; after that, bash "fails successfully".

This is what I got from Nomad:

Feb 10 20:36:41 blowhole nomad[5900]:     2021/02/10 20:36:41.540360 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/374c28c2-74f0-ce2e-2ba0-50b939b0f9ec/app-prestart/local/prestart.sh"
Feb 10 20:36:41 blowhole nomad[5900]:     2021/02/10 20:36:41.595757 [INFO] (runner) rendered "(dynamic)" => "/var/lib/nomad/alloc/374c28c2-74f0-ce2e-2ba0-50b939b0f9ec/app-prestart/local/default.nix"
Feb 10 20:36:41 blowhole nomad[5900]:     2021-02-10T20:36:41.833+0100 [INFO]  client.driver_mgr.nomad-driver-nspawn: starting nspawn task: driver=nspawn @module=nspawn driver_cfg="{Bind:map[/nix/var/nix/gcroots:/nix/var/nix/gcroots /nix/var/nix/profiles:/nix/>
Feb 10 20:36:41 blowhole nomad[5900]:     2021-02-10T20:36:41.833+0100 [INFO]  client.driver_mgr.nomad-driver-nspawn: commad arguments: driver=nspawn args=[-D, /nomad-nspawn-empty-dir, --ephemeral, --network-veth, --as-pid2, --machine, app-prestart-374c28c2-74>
Feb 10 20:36:43 blowhole nomad[5900]:     2021-02-10T20:36:43.375+0100 [ERROR] client.driver_mgr.nomad-driver-nspawn: failed to get machine addresses: driver=nspawn error="failed to call dbus: No machine 'app-prestart-374c28c2-74f0-ce2e-2ba0-50b939b0f9ec' know>
Feb 10 20:36:43 blowhole nomad[5900]:     2021-02-10T20:36:43.375+0100 [ERROR] client.driver_mgr.nomad-driver-nspawn: systemd-nspawn failed: driver=nspawn @module=nspawn file=app-prestart.stderr.0 out="Container app-prestart-374c28c2-74f0-ce2e-2ba0-50b939b0f9e>
Feb 10 20:36:43 blowhole nomad[5900]: [281B blob data]
Feb 10 20:36:43 blowhole nomad[5900]:     2021-02-10T20:36:43.377+0100 [WARN]  client.driver_mgr.nomad-driver-nspawn: received EOF, stopping recv loop: driver=nspawn @module=nspawn.executor.stdio err="rpc error: code = Unavailable desc = transport is closing" >
Feb 10 20:36:43 blowhole nomad[5900]:     2021-02-10T20:36:43.754+0100 [ERROR] client.alloc_runner.task_runner: running driver failed: alloc_id=374c28c2-74f0-ce2e-2ba0-50b939b0f9ec task=app-prestart error="rpc error: code = Unknown desc = systemd-nspawn failed>
Feb 10 20:36:43 blowhole nomad[5900]:     2021-02-10T20:36:43.755+0100 [INFO]  client.alloc_runner.task_runner: not restarting task: alloc_id=374c28c2-74f0-ce2e-2ba0-50b939b0f9ec task=app-prestart reason="Error was unrecoverable"
Feb 10 20:36:44 blowhole nomad[5900]:     2021/02/10 20:36:44.073580 [INFO] (runner) stopping
Feb 10 20:36:44 blowhole nomad[5900]:     2021/02/10 20:36:44.073642 [INFO] (runner) received finish

JanMa commented 3 years ago

I think I know what the issue is. Your bash script just exits too fast. After the driver starts a container, it can take a couple of seconds until it is able to gather information about the container via dbus. If the container exits while the driver is still trying to gather that information, the driver marks the task as failed even though it finished successfully. What you can do to avoid this is to put a sleep 20 at the end of your bash script. I am aware that this is not ideal, but as a workaround it should do. I have not been able to figure out a reliable way to tell whether a container exited successfully before the fingerprinting is done, which is why the issue is still present.

MagicRB commented 3 years ago

# Locations for this allocation's Nix profile and GC root
_profile_dir=/nix/var/nix/profiles/nomad/${NOMAD_GROUP_NAME}-app-${NOMAD_ALLOC_INDEX}
_gcroots_dir=/nix/var/nix/gcroots/nomad/${NOMAD_GROUP_NAME}-app-${NOMAD_ALLOC_INDEX}

if [[ ! -d ${_profile_dir} ]]
then
    /nix/var/nix/profiles/nomad/builder-pin/bin/mkdir -p "${_profile_dir}"
else
    echo "${_profile_dir} exists when it shouldn't! Exiting..." ; exit 1
fi

if [[ ! -d ${_gcroots_dir} ]]
then
    /nix/var/nix/profiles/nomad/builder-pin/bin/mkdir -p "${_gcroots_dir}"
else
    echo "${_gcroots_dir} exists when it shouldn't! Exiting..." ; exit 1
fi

# Build the NixOS system and register it as this allocation's profile
NIX_REMOTE=daemon /nix/var/nix/profiles/default/bin/nix-build "${NOMAD_TASK_DIR}/default.nix" -o "${_profile_dir}/system"
# Keep the task alive long enough for the driver's fingerprinting (see above)
/nix/var/nix/profiles/nomad/builder-pin/bin/sleep 30

This bash script doesn't work either; it does wait for 30 seconds, but still nothing, not even with 60.

JanMa commented 3 years ago

Can you please start Nomad with debug logs enabled and send me the output? That might help to understand what's failing.
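
If it helps, debug logging can be enabled in the client's agent configuration, roughly like this (the file location depends on your setup):

# e.g. in /etc/nomad.d/nomad.hcl (path is just an example)
log_level = "DEBUG"

Alternatively, nomad monitor -log-level=DEBUG should stream debug logs from a running agent without restarting it.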

JanMa commented 3 years ago

@MagicRB do you still need help with this problem? Otherwise I'll close this issue

MagicRB commented 3 years ago

Yup, sorry for not responding. I've been fighting with every piece of the stack imaginable to get my homelab working, so this has dropped way down my priority list.

JanMa commented 3 years ago

Alright, I'll leave the issue open then. If you get back to the nspawn issue, just ping me again :-) One idea I had while looking at the bash script you posted: try running the task with network_veth = false set in the driver config. This will make the container use host networking, and the driver will not fail the task if it cannot find an IP belonging to the container.
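
That would look roughly like this in the task's config block (everything else left as it is; the task name is just taken from your logs):

task "app-prestart" {
  driver = "nspawn"
  config {
    network_veth = false   # use host networking; the driver won't try to look up a container IP
    # ... rest of your existing config unchanged
  }
}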

JanMa commented 3 years ago

Closing this issue due to inactivity.