Azure / cyclecloud-hpcpack

CycleCloud project to enable use of the Microsoft HPC Pack job scheduler in Azure CycleCloud HPC clusters.
MIT License

VMs are deallocated and deleted too early #10

Open CamiloTerevinto opened 3 years ago

CamiloTerevinto commented 3 years ago

I've been working on a POC with Azure CycleCloud and HPC Pack 2019. From the head node, the auto-scaling configuration looks like this:

{
  "archivefile": "C:\\cycle\\jetpack\\config\\autoscaler_archive.txt",
  "boot_timeout": 1500,
  "cluster_name": "TEST-HPC",
  "default_resources": [],
  "disable_default_resources": false,
  "idle_timeout": 900,
  "lock_file": "C:\\cycle\\jetpack\\config\\scalelib.lock",
  "password": "*********",
  "statefile": "C:\\cycle\\jetpack\\config\\autoscaler_state.txt",
  "url": "https://172.17.10.4:9443",
  "username": "cyclecloud_access",
  "autoscale": {
    "start_enabled": true,
    "vm_retention_days": 7
  },
  "hpcpack": {
    "hn_hostname": "localhost",
    "pem": "C:\\cycle\\jetpack\\config\\hpc-comm.pem"
  },
  "logging": {
    "config_file": "C:\\cycle\\jetpack\\config\\autoscale_logging.conf"
  },
  "pbspro": {
    "read_only_resources": [
      "host",
      "vnode"
    ]
  }
}
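
For context, my reading of those settings is that an idle node should only be deallocated after idle_timeout (900 seconds, i.e. 15 minutes) and its VM only deleted after vm_retention_days (7 days). The sketch below is purely illustrative Python of the behaviour I expect, not the actual hpcpack-autoscaler code; the node fields (state, boot_started, running_jobs, last_job_finished) are made up for the example:

import time

# Values taken from the autoscale config above.
IDLE_TIMEOUT = 900     # seconds an idle node may sit before deallocation
BOOT_TIMEOUT = 1500    # seconds a node may spend booting before it is abandoned

def is_deallocation_candidate(node, now=None):
    """Rough illustration of idle-timeout handling (hypothetical node dict).

    node keys:
      state             -- "booting" or "ready"
      boot_started      -- epoch seconds when provisioning began
      running_jobs      -- number of HPC Pack jobs currently on the node
      last_job_finished -- epoch seconds when the node last went idle
    """
    now = now if now is not None else time.time()
    if node["state"] == "booting":
        # A node that never finishes booting is only removed after boot_timeout.
        return now - node["boot_started"] > BOOT_TIMEOUT
    if node["running_jobs"] > 0:
        return False
    return now - node["last_job_finished"] > IDLE_TIMEOUT

# Example: a node that went idle 5 minutes ago should NOT yet be removed.
node = {"state": "ready", "boot_started": 0, "running_jobs": 0,
        "last_job_finished": time.time() - 300}
print(is_deallocation_candidate(node))  # expected: False
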

However, when I start a new job, which brings up a new node, this is what I see:

What am I doing wrong, or what could be misconfigured, to cause this?

CamiloTerevinto commented 3 years ago

This has now happened twice: a few hours after the last VM is deleted (again, earlier than it should be), the entire scale set is deleted by the CycleCloud or HPC Pack VM.

CamiloTerevinto commented 3 years ago

Adding more information: