Azure / hpcpack

The repo to track public issues for the Microsoft HPC Pack product.
MIT License

Jobs do not get scheduled even though there are available resources on the cluster #9

Open syagev opened 3 years ago

syagev commented 3 years ago

Problem Description

Jobs do not get scheduled even though there are available resources on the cluster.

Steps to Reproduce

Submit a job when the cluster is at relatively high load (>~80% of resources are utilized).

Expected Results

If there are resources that meet the job's requirements, at least some of its tasks should start.

Actual Results

The job remains in queued state despite there being available resources in the cluster.

Additional Comments

Surprisingly, this happens in both Queued and Balanced modes. Even more surprisingly, momentarily switching the Scheduling Policy to the other option and back (regardless of the original mode) resolves the issue, and the queued jobs get scheduled.

YutongSun commented 3 years ago

Not sure if this is still an issue. Please collect the HPC Pack version info and the HpcScheduler service logs, and send them to hpcpack@microsoft.com for further investigation.

syagev commented 3 years ago

I will when we see the bug recur. Thanks.