jpROC1 opened 8 months ago
Hi @jpROC1, which web portal were you using when job submission failed? What was the complete command line you used, and what was the complete error message? Could you describe the problem more clearly, optionally with screenshots of your portal, the error, etc.? It sounds like you created a cluster with only one head node and are expecting CycleCloud autoscaling to grow your HPC Pack cluster, right? If so, did you enable that option in CycleCloud when you created the HPC Pack cluster? Could you share the options you used to create it?
Hi @coin8086, we deployed a CycleCloud 8.6 machine and then used the built-in template to create an HPC Pack cluster with the latest version of HPC Pack. I accessed the head node through RDP to use the Job Manager, and I also accessed the web portal from a local machine. We did enable autoscaling, and I submitted a job through the Job Manager that just ran dir on a target machine.
The web portal had all the job submission options greyed out.
I have torn the cluster down, but I will bring it back up to get some screenshots.
I am having a similar experience. It seems the scheduled task is not running at all because of a missing-module error:
python.exe : C:\cycle\hpcpack-autoscaler\.venvs\cyclecloud-hpcpack\Scripts\python.exe: Error while finding module specification for 'cyclecloud-hpcpack.cli' (ModuleNotFoundError: No module named 'cyclecloud-hpcpack')
At C:\cycle\hpcpack-autoscaler\bin\azhpcpack.ps1:6 char:1
+ & python -m cyclecloud-hpcpack.cli @args *> c:\output.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (C:\cycle\hpcpac...cloud-hpcpack'):String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
I got this output after changing the line in the PowerShell script (azhpcpack.ps1) that calls Python to the following:
& python -m cyclecloud-hpcpack.cli @args *> c:\output.txt
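For context on the error above: `python -m` looks the module up with `importlib.util.find_spec` before running it, and for a dotted name such as `cyclecloud-hpcpack.cli` that lookup first imports the parent package `cyclecloud-hpcpack`. If the parent is not installed, the resulting ModuleNotFoundError is what runpy wraps into "Error while finding module specification". A minimal stdlib-only sketch that reproduces the lookup, so a virtual environment can be checked directly; `diagnose_module` is a hypothetical helper, not part of cyclecloud-hpcpack:

```python
import importlib.util


def diagnose_module(mod_name: str) -> str:
    """Report whether `python -m mod_name` could locate the module.

    Hypothetical helper: mirrors the find_spec() lookup that runpy performs
    for `python -m` before it executes anything.
    """
    try:
        # For a dotted name, find_spec() imports the parent package first,
        # so a missing parent surfaces here as ModuleNotFoundError.
        spec = importlib.util.find_spec(mod_name)
    except ModuleNotFoundError as exc:
        return f"lookup failed: {exc}"
    if spec is None:
        # Top-level name that no finder on sys.path knows about.
        return "module not found"
    return f"found at {spec.origin}"


if __name__ == "__main__":
    # Run with the autoscaler's venv interpreter, e.g.
    # C:\cycle\hpcpack-autoscaler\.venvs\cyclecloud-hpcpack\Scripts\python.exe
    print(diagnose_module("cyclecloud-hpcpack.cli"))
```

Run with the venv's python.exe, this should report "No module named 'cyclecloud-hpcpack'" until the package is actually installed into that environment.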
I enabled autoscaling when creating the HPC Pack cluster through CycleCloud.
After some digging on the HPC Pack head node, I found this when running the autoscaler installer manually:
PS C:\cycle\jetpack\system\bootstrap\hpcpack-autoscaler-installer> .\install.ps1
Directory: C:\cycle
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 5/16/2024 1:07 PM hpcpack-autoscaler
Directory: C:\cycle\hpcpack-autoscaler
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 5/16/2024 1:07 PM bin
Requirement already satisfied: pip in c:\cycle\hpcpack-autoscaler\.venvs\cyclecloud-hpcpack\lib\site-packages (24.0)
Processing c:\cycle\jetpack\system\bootstrap\hpcpack-autoscaler-installer\packages\argcomplete-1.12.2-py2.py3-none-any.whl
Processing c:\cycle\jetpack\system\bootstrap\hpcpack-autoscaler-installer\packages\certifi-2020.12.5-py2.py3-none-any.whl
Processing c:\cycle\jetpack\system\bootstrap\hpcpack-autoscaler-installer\packages\chardet-5.2.0-py3-none-any.whl
pip.exe : ERROR: charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl is not a supported wheel on this platform.
At C:\cycle\jetpack\system\bootstrap\hpcpack-autoscaler-installer\install.ps1:27 char:1
+ & pip install -U (get-item $PSScriptRoot\packages\*)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (ERROR: charset_... this platform.:String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Generating config at : C:\cycle\jetpack\config\autoscale.json
python.exe : C:\cycle\hpcpack-autoscaler\.venvs\cyclecloud-hpcpack\Scripts\python.exe: Error while finding module specification for 'cyclecloud-hpcpack.cli' (ModuleNotFoundError: No module named 'cyclecloud-hpcpack')
At C:\cycle\hpcpack-autoscaler\bin\azhpcpack.ps1:7 char:1
+ & python -m cyclecloud-hpcpack.cli @args
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (C:\cycle\hpcpac...cloud-hpcpack'):String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
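It seems that the hpcpack-autoscaler-installer is trying to install a Linux wheel built for Python 3.10 (the charset_normalizer-3.3.2 cp310 manylinux wheel) on the Windows HPC Pack head node, which runs Python 3.8.8. pip rejects that wheel and aborts the whole pip install command, so the autoscaler package itself presumably never gets installed into the virtual environment, which would explain the ModuleNotFoundError from azhpcpack.ps1 that follows.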
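A wheel's filename encodes the interpreter and platform it targets (PEP 427: `{name}-{version}[-{build}]-{python tag}-{abi tag}-{platform tag}.whl`), which is why pip refuses the cp310/manylinux build on a Windows Python 3.8 host. A simplified sketch of that check, assuming you pass in the interpreter's tags yourself: it ignores the ABI tag and pip's tag priority order, and the `cp38` tag set in the example is a hypothetical, shortened stand-in for what pip actually computes.

```python
from typing import FrozenSet, NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tags: FrozenSet[str]
    abi_tags: FrozenSet[str]
    platform_tags: FrozenSet[str]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 fields normally, 6 when the optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    py, abi, plat = parts[-3:]
    # Each tag field may be a compressed set joined with '.', e.g. 'py2.py3'.
    return WheelTags(
        parts[0],
        parts[1],
        frozenset(py.split(".")),
        frozenset(abi.split(".")),
        frozenset(plat.split(".")),
    )


def is_installable(filename: str, interpreter_tags: FrozenSet[str], platform_tag: str) -> bool:
    """Simplified compatibility check (ABI tag and tag ordering ignored)."""
    tags = parse_wheel_filename(filename)
    python_ok = not tags.python_tags.isdisjoint(interpreter_tags)
    platform_ok = "any" in tags.platform_tags or platform_tag in tags.platform_tags
    return python_ok and platform_ok


if __name__ == "__main__":
    # Hypothetical, shortened tag set for CPython 3.8 on 64-bit Windows.
    cp38 = frozenset({"cp38", "py38", "py3"})
    for whl in (
        "charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
        "charset_normalizer-3.3.2-cp38-cp38-win_amd64.whl",
    ):
        print(whl, is_installable(whl, cp38, "win_amd64"))
```

Applied to the filenames in this thread, the cp310/manylinux wheel is rejected and the cp38/win_amd64 replacement is accepted, which matches what pip reported before and after the manual swap.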
After replacing that wheel by hand with the Windows build for Python 3.8, the installer script works correctly. I used the wheel from https://files.pythonhosted.org/packages/db/fb/d29e343e7c57bbf1231275939f6e75eb740cd47a9d7cb2c52ffeb62ef869/charset_normalizer-3.3.2-cp38-cp38-win_amd64.whl to verify this.
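To find the matching replacement wheel for a given head node, the python and platform tags of the running interpreter can be read from the standard library. A simplified sketch, assuming CPython 3.8 or later (where the ABI tag equals the python tag, e.g. cp38) and a standard, non-free-threaded build; pip's real logic produces a longer, ordered list of compatible tags, including manylinux aliases on Linux, so treat the result as a hint rather than the full answer. `local_wheel_suffix` is a hypothetical helper name.

```python
import sys
import sysconfig


def local_wheel_suffix() -> str:
    """Build the '{python}-{abi}-{platform}.whl' suffix for this interpreter.

    Simplified: assumes CPython 3.8+, where the ABI tag equals the python tag,
    and uses only the basic platform tag (no manylinux aliases).
    """
    py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    # sysconfig.get_platform() returns e.g. 'win-amd64'; wheel tags use '_'.
    plat_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return f"{py_tag}-{py_tag}-{plat_tag}.whl"


if __name__ == "__main__":
    # On a 64-bit Windows head node running Python 3.8 this should print
    # cp38-cp38-win_amd64.whl
    print(local_wheel_suffix())
```

Run with the autoscaler venv's python.exe, the suffix can be matched against the files on PyPI to pick the right build.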
I deployed a fresh CycleCloud 8.6 machine and used the built-in template to deploy a Windows-based head node. The head node comes up successfully; however, it is listed as "offline". I manually bring the node online and submit a job from the Job Manager, but no node ever gets spun up in CycleCloud.
The web portal has all the options for submitting jobs greyed out.
The only error I have seen comes from the hpcpack CLI, which fails because it cannot find a Python module for the HPC Pack autoscaler.