Open vineel96 opened 1 year ago
Could you please send the logs and details on how/where you are running?
Hi @lezzidan , Hardware info:
Command 2: export ComputingUnits=8, then runcompss kmeans_dislib.py
Observation: No task could be scheduled to any of the available resources, shutting down COMPSs
htop output: only some cores are in use at any given moment, and which ones varies at random
Hi, I suspect that the default ComputingUnits in the resources.xml of COMPSs is set to only 4 cores.
Try looking into the file /opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml and change <ComputingUnits>4</ComputingUnits> to 16.
You can also try export ComputingUnits=1.
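To check what the resources file actually declares before and after editing it, something like the following might help. It is only a sketch: the sample XML is a hypothetical, stripped-down stand-in for default_resources.xml, find_computing_units and can_schedule are illustrative names rather than COMPSs functions, and the rule "a task fits if it requests no more units than a resource declares" is a simplifying assumption about the scheduler.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down stand-in for default_resources.xml.
SAMPLE_XML = """<ResourcesList>
  <ComputeNode Name="localhost">
    <Processor Name="MainProcessor">
      <ComputingUnits>4</ComputingUnits>
    </Processor>
  </ComputeNode>
</ResourcesList>"""

# Path quoted in this thread; adjust it if COMPSs is installed elsewhere.
RESOURCES_PATH = "/opt/COMPSs/Runtime/configuration/xml/resources/default_resources.xml"


def find_computing_units(xml_data):
    """Return every <ComputingUnits> value found in the XML as a list of ints."""
    root = ET.fromstring(xml_data)
    values = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue
        # Compare only the local tag name so an XML namespace does not matter.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "ComputingUnits" and element.text:
            values.append(int(element.text.strip()))
    return values


def can_schedule(requested_units, declared_units):
    """Simplified assumption: a task fits if some resource declares enough units."""
    return any(requested_units <= declared for declared in declared_units)


if __name__ == "__main__":
    if os.path.exists(RESOURCES_PATH):
        with open(RESOURCES_PATH, "rb") as handle:
            declared = find_computing_units(handle.read())
    else:
        declared = find_computing_units(SAMPLE_XML)
    requested = int(os.environ.get("ComputingUnits", "1"))
    print("declared:", declared, "requested:", requested)
    print("fits:", can_schedule(requested, declared))
```

If the script reports that the requested value is larger than every declared one, that would be consistent with the "No task could be scheduled" message, although it does not prove that this is the cause.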
You mentioned a dataset size of 236930 x 14, but what block size are you using? That determines the number of tasks that will be launched in parallel.
Hi @cTatu, I have changed the ComputingUnits value to 16 in default_resources.xml. The error remains the same: "No task could be scheduled to any of the available resources, shutting down COMPSs", or the program hangs for a long time. I also tried setting "export ComputingUnits=1", and the same issue persists. I have tried two block sizes: 1. (229616,7) and 2. (2,2). For both block sizes the result is the same: the program either hangs or reports "No task could be scheduled, shutting down COMPSs".
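For reference, the two block sizes above split the 236930 x 14 dataset very differently. The sketch below only does the arithmetic; count_blocks is an illustrative helper, not a dislib function, and the number of blocks is used here as a rough proxy for how much work can be launched in parallel, which is an assumption.

```python
def count_blocks(shape, block_size):
    """Return (row_blocks, col_blocks, total_blocks) for a 2-D array split into blocks."""
    n_rows, n_cols = shape
    b_rows, b_cols = block_size
    # Integer ceiling division: a partial block at the edge still counts as one block.
    row_blocks = -(-n_rows // b_rows)
    col_blocks = -(-n_cols // b_cols)
    return row_blocks, col_blocks, row_blocks * col_blocks


if __name__ == "__main__":
    dataset_shape = (236930, 14)
    for block_size in [(229616, 7), (2, 2)]:
        print(block_size, "->", count_blocks(dataset_shape, block_size))
    # (229616, 7) -> (2, 2, 4)
    # (2, 2) -> (118465, 7, 829255)
```

So the first block size yields only 4 blocks, while the second yields more than 800,000 tiny ones. Neither extreme is likely to be a good fit for an 8-core machine, but since the error appears with both, block size alone probably does not explain it.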
Hi @cTatu, @lezzidan, can I get any suggestions or help regarding the issue mentioned above?
Hey, sorry for the delay.
One possible cause is that the SSH daemon is not started. COMPSs needs ssh access to the worker node (which in your case is the same machine). To check, try executing ssh localhost; it should work in a password-less way (using RSA keys). Make sure the service is installed (sudo apt install openssh-server) and that it is running (sudo service ssh start).
For passwordless configuration you can follow our guide: https://compss-doc.readthedocs.io/en/stable/Sections/01_Installation/05_Additional_configuration.html
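A quick way to test the password-less part without being prompted is to run ssh in batch mode, which fails instead of asking for a password. The following is a minimal sketch under that assumption: the helper names are illustrative, and BatchMode and ConnectTimeout are standard OpenSSH client options, not anything specific to COMPSs.

```python
import subprocess


def build_ssh_check_command(host="localhost", timeout_s=5):
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt; fail if keys do not work
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly if sshd is not reachable
        host,
        "true",                               # harmless remote command
    ]


def passwordless_ssh_works(host="localhost", timeout_s=5):
    """Return True only if ssh connects and runs a command without any prompt."""
    try:
        result = subprocess.run(
            build_ssh_check_command(host, timeout_s),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh client missing or the connection attempt hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if passwordless_ssh_works():
        print("Password-less ssh to localhost works.")
    else:
        print("ssh localhost failed without a prompt; check sshd and your keys.")
```

Note that a False result can also mean that localhost is not yet in known_hosts, because batch mode will not ask to accept a new host key; running ssh localhost once by hand clears that up.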
Hope this works.
Best regards
Hi @cTatu, is it possible to run dislib on a single-node system with an 8-core CPU? (I am getting a non-reachable nodes error when running on a single node.) Also, will the performance boost remain the same?