SUSE / doc-hpc

Documentation for SLE-HPC
https://documentation.suse.com/sle-hpc/

Add Warewulf procedure #59

Closed tahliar closed 4 months ago

tahliar commented 1 year ago

Description

I've added a chapter for Warewulf. It now includes separate sections because the description got pretty long, plus there is now a section for advanced tasks. I included secure boot and local node storage, but not profiles and overlays because that would basically be reference material, and the warewulf.org docs on those are pretty decent already so I'd just be copying them. I've included links to those sections of the Warewulf docs in the For More Information section, and imo that's enough in this case.

Here's a PDF: cha-warewulf-deploy-nodes_en.pdf (updated May 22)

Are there any relevant issues/feature requests?

Which product versions do the changes apply to?

Note: Remove local node storage when cherry-picking to 15 SP5, as there are currently no plans to backport Ignition.


tahliar commented 1 year ago

@mslacken @e4t Here is the initial draft for Warewulf. Lines marked up with <remark> are questions/notes from me on things I wasn't sure about, or things that are waiting on the full functionality to be ready. I did some basic testing but nothing very deep.

mslacken commented 1 year ago

Just a few small observations (hopefully the others will correct me if I'm wrong):

* It is preferable to run `wwctl configure hosts` (or `wwctl configure --all`) after `wwctl node add`; otherwise the hosts file isn't updated.

Right, the overlays are not updated when a new node is added. I will open an upstream issue for this.
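
Until that is fixed, the order of operations could be sketched as below. This is only an illustration: the node name and address are borrowed from the single-node example further down in this comment, and the `run` helper and `DRY_RUN` variable are hypothetical additions (not part of Warewulf) so that the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: add a node, then rebuild the hosts file so it lists the new node.
# DRY_RUN and run() are hypothetical helpers, not part of Warewulf.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"    # print the command instead of executing it
    else
        "$@"
    fi
}

# Node name, interface, and address come from the example in this thread.
run wwctl node add compute10 --netdev eth0 -I 192.168.10.111 --discoverable=true

# Without this step the hosts file is not updated for the new node.
run wwctl configure hosts
```

Using `wwctl configure --all` in place of the last command should also cover the hosts file, at the cost of reconfiguring the other services too.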

* I am not 100% sure about this, but there is a small difference between adding nodes with a single command and adding each node separately. I will demonstrate this with two examples: `wwctl node add n[01-10] --netdev eth0 -I 192.168.10.110 --discoverable=true` vs `wwctl node add compute10 --netdev eth0 -I 192.168.10.111 --discoverable=true`

Could you elaborate more on this? What is the difference between adding multiple nodes and one node?

* `wwctl configure ssh` creates new keys if they do not already exist. Then they are copied to the nodes. I am not sure what the process is for updating them if they change.

There isn't a documented process, but the keys are stored in `/etc/warewulf/keys`. If they are deleted there, calling `wwctl configure ssh` will create new ones.
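
As a rough sketch of that undocumented workaround, assuming the keys really are plain files directly under `/etc/warewulf/keys`: back them up, remove them, and let `wwctl configure ssh` recreate them. The backup location, the `run` helper, and the `DRY_RUN` variable are hypothetical; by default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch only: force wwctl to regenerate the SSH keys by removing the old ones.
# BACKUP, DRY_RUN and run() are hypothetical, not part of Warewulf.
DRY_RUN="${DRY_RUN:-1}"
KEYDIR="/etc/warewulf/keys"
BACKUP="${BACKUP:-/root/warewulf-keys.bak}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"    # print the command instead of executing it
    else
        "$@"
    fi
}

# Keep a copy of the old keys in case something still depends on them.
run cp -a "$KEYDIR" "$BACKUP"

# Remove the existing key files but leave the directory in place.
run find "$KEYDIR" -type f -exec rm -f {} +

# With no keys present, this step creates new ones.
run wwctl configure ssh
```

Nodes that already booted with the old keys would presumably need their overlays rebuilt or a reboot before they pick up the new ones.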

* I don't think I see anywhere that the controller needs two network interfaces.

No, the controller doesn't need two network interfaces; it's just the de facto setup in most HPC centers.

tahliar commented 6 months ago

@e4t @mslacken This is now ready for review again. I've re-tested the steps (aside from booting any actual nodes) and updated accordingly.

I may have missed something that's changed since the original draft, so let me know if there's anything that still needs updating.

tahliar commented 4 months ago

Hi @e4t @mslacken, I've finished adding advanced tasks (and updated the PDF in the Description), so this is ready for review again.