clustervision / trinityX

TrinityX is the new generation of ClusterVision's open-source HPC, A/I and cloudbursting platform. It is designed from the ground up to provide all services required in a modern HPC and A/I system, and to allow full customization of the installation.
GNU General Public License v3.0

/proc is not mounted, this is not a supported mode of operation - when provisioning nodes #421

trick-1 closed 3 months ago

trick-1 commented 4 months ago

I have just stood up a new environment:

Controller: Rocky Linux 9.4
Installed TrinityX as per the instructions

When I provision nodes using an image generated from the compute-default.yml playbook, they fail. Reviewing the journalctl -xe output on the node, I see the following:

image

Appreciate suggestions or insight.

Regards

Richard

trick-1 commented 4 months ago

Further to this, the sysroot mount fails as follows:

image
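
The two reports above concern /proc and /sysroot not being mounted on the node. A minimal sketch of how the mount state could be checked from a shell on the node, assuming Python 3 is available there (this is not confirmed for the Luna installer environment). The helper name `mount_status` is hypothetical and not part of TrinityX or Luna:

```python
import os


def mount_status(paths):
    """Return {path: bool}; True means the path is currently a mount point.

    os.path.ismount() returns False for paths that do not exist.
    """
    return {path: os.path.ismount(path) for path in paths}


# The two paths named in this thread; adjust as needed.
for path, mounted in mount_status(["/proc", "/sysroot"]).items():
    print(f"{path}: {'mounted' if mounted else 'NOT mounted'}")
```

This only reports the current state. It does not explain why a mount failed, so the journalctl output remains the primary source of information.
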
aphmschonewille commented 4 months ago

Could you supply me with the following information, please:

trick-1 commented 4 months ago

[root@trinityx ejb]# luna node show node001 -R
{
    "name": "node001",
    "kerneloptions": null,
    "switchport": null,
    "service": false,
    "setupbmc": true,
    "status": "Luna installer: success",
    "comment": null,
    "roles": "None",
    "vendor": "QEMU",
    "assettag": "Not Specified",
    "prescript": "",
    "partscript": "mount -t tmpfs tmpfs /sysroot\r\n",
    "postscript": "echo 'tmpfs / tmpfs defaults 0 0' >> /sysroot/etc/fs
    "netboot": "True",
    "localinstall": "False",
    "bootmenu": "False",
    "provision_interface": "BOOTIF",
    "provision_method": "torrent",
    "provision_fallback": "http",
    "tpm_uuid": null,
    "tpm_pubkey": null,
    "tpm_sha256": null,
    "unmanaged_bmc_users": null,
    "group": "compute",
    "osimage": "compute",
    "osimage_source": "group",
    "bmcsetup": "compute",
    "bmcsetup_source": "group",
    "osimagetag": "computeimage",
    "osimagetag_source": "group",
    "switch": null,
    "setupbmc_source": "group",
    "netboot_source": "group",
    "localinstall_source": "default",
    "bootmenu_source": "default",
    "roles_source": "default",
    "provision_method_source": "cluster",
    "provision_fallback_source": "cluster",
    "provision_interface_source": "default",
    "prescript_source": "default",
    "partscript_source": "group",
    "postscript_source": "group",
    "hostname": "node001.trix",
    "interfaces": [
        { "interface": "BOOTIF", "ipaddress": "10.141.0.1", "macaddress": "bc:24:11:7e:b7:3b", "network": "trix" },
        { "interface": "BMC", "ipaddress": "10.148.0.1", "macaddress": null, "network": "ipmi" }
    ]
}

[root@trinityx ejb]# luna group show compute -R
{
    "name": "compute",
    "setupbmc": "True",
    "domain": "cluster",
    "kerneloptions": null,
    "prescript": "",
    "partscript": "mount -t tmpfs tmpfs /sysroot\r\n",
    "postscript": "echo 'tmpfs / tmpfs defaults 0 0' >> /sysroot/etc/fs
    "netboot": "True",
    "localinstall": "False",
    "bootmenu": "False",
    "comment": "",
    "roles": null,
    "provision_interface": "BOOTIF",
    "provision_method": "torrent",
    "provision_fallback": "http",
    "unmanaged_bmc_users": "",
    "interfaces": [
        { "interface": "BOOTIF", "network": "trix" },
        { "interface": "BMC", "network": "ipmi" }
    ],
    "setupbmc_source": "group",
    "netboot_source": "group",
    "localinstall_source": "default",
    "bootmenu_source": "default",
    "provision_interface_source": "default",
    "provision_method_source": "cluster",
    "provision_fallback_source": "cluster",
    "prescript_source": "default",
    "partscript_source": "group",
    "postscript_source": "group",
    "osimage": "compute",
    "bmcsetupname": "compute",
    "osimagetag": "computeimage",
    "osimage_source": "group",
    "bmcsetupname_source": "group",
    "osimagetag_source": "group"
}
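
In the output above, the partscript value ends in "\r\n", that is, a carriage return followed by a newline. Whether that has any bearing on the failure reported here is not established in this thread. As a sketch only, the following shows one way to flag script fields that contain carriage returns once the JSON has been parsed. The function name `fields_with_carriage_returns` and the trimmed sample are illustrative assumptions. The postscript value is left out of the sample because it is truncated in the output above.

```python
import json

# Script fields that appear in the `luna ... show -R` output in this thread.
SCRIPT_FIELDS = ("prescript", "partscript", "postscript")


def fields_with_carriage_returns(config):
    """Return the names of script fields whose text contains a carriage return."""
    return [
        name
        for name in SCRIPT_FIELDS
        if isinstance(config.get(name), str) and "\r" in config[name]
    ]


# Trimmed sample using only values shown in the thread (hypothetical subset).
sample = json.loads(r'''
{
    "name": "compute",
    "prescript": "",
    "partscript": "mount -t tmpfs tmpfs /sysroot\r\n"
}
''')

print(fields_with_carriage_returns(sample))
```

With the sample above this prints `['partscript']`. Fields that are null or absent are skipped rather than treated as errors.
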

-----PlayBook-----
[root@trinityx site]# more compute-default.yml

---last 4 lines of group-vars/all.yml--

trix_version: "14.1"
workload_manager: slurm

DO NOT REMOVE: yml check: 140103

--- file set ----
I need to work out a way to get the files off the node.

trick-1 commented 3 months ago

OK, so I rebuilt my cluster using the instructions linked below (the ones on GitHub just kept producing the same issue), and I have moved on. https://supercomputing.tue.nl/documentation/administration/trinityx/installation/