It4innovations / hyperqueue

Scheduler for sub-node tasks for HPC systems with batch scheduling
https://it4innovations.github.io/hyperqueue
MIT License

hq job info output mode differences #577

Closed by svatosFZU 1 year ago

svatosFZU commented 1 year ago

Hi, I am using "hq job info" to get basic information about a job. I use --output-mode json to make further processing easy (and because it contains the start and finish timestamps). Unfortunately, some values that are available in cli mode are missing in json mode (and vice versa). Specifically, I miss the Resources (number of cores, etc.) and Workers (worker node name) fields, which appear in the cli output but not in the json output. Would it be possible to add them to json mode as well? Also, it would be great if the ID of the PBS/SLURM allocation could be added to the info.

hq job info --output-mode cli 7000
+----------------------+-------------------------+
| ID                   | 7000                    |
| Name                 | bash                    |
| State                | FINISHED                |
| Tasks                | 1; Ids: 0               |
| Workers              | cn601.karolina.it4i.cz  |
| Resources            | cpus: 32 compact        |
| Priority             | 0                       |
| Command              | bash                    |
| Stdout               | <None>                  |
| Stderr               | <None>                  |
| Environment          |                         |
| Working directory    | /home/svatosm           |
| Task time limit      | None                    |
| Crash limit          | 1                       |
| Submission date      | 2023-04-17 03:24:00 UTC |
| Submission directory | /home/svatosm           |
| Makespan             | 6h 22m                  |
+----------------------+-------------------------+

hq job info --output-mode json 7000
{
  "crash_limit": 1,
  "finished_at": "2023-04-17T09:46:56.088693387Z",
  "info": {
    "id": 7000,
    "name": "bash",
    "task_count": 1,
    "task_stats": {
      "canceled": 0,
      "failed": 0,
      "finished": 1,
      "running": 0,
      "waiting": 0
    }
  },
  "max_fails": null,
  "pin_mode": "None",
  "priority": 0,
  "program": {
    "args": [
      "bash"
    ],
    "cwd": "/home/svatosm",
    "env": {},
    "stderr": "Null",
    "stdout": "Null"
  },
  "resources": {
    "min_time": 0.0,
    "n_nodes": 0,
    "resources": []
  },
  "started_at": "2023-04-17T03:23:59.989407980Z",
  "submit_dir": "/home/svatosm",
  "task_dir": false,
  "tasks": [
    {
      "cwd": "/home/svatosm",
      "finished_at": "2023-04-17T09:46:56.088693387Z",
      "id": 0,
      "started_at": "2023-04-17T03:23:59.994031765Z",
      "state": "finished",
      "stderr": "Null",
      "stdout": "Null",
      "worker": 342
    }
  ],
  "time_limit": null
}

Kobzol commented 1 year ago

Hi, the missing resources are a bug; I'll fix it soon.

Regarding the worker, you can access it in the tasks array (because each task can have a different worker). Note that this information might not be fully precise: multi-node tasks are executed on multiple workers at once, and a task can be executed by several workers in succession if some of them crash while executing it.
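
As a rough illustration of reading the worker from the tasks array, here is a minimal Python sketch. It assumes the JSON shape shown in the report above (a `tasks` list whose entries carry `id`, `state`, `worker`, `started_at` and `finished_at`); the helper names `parse_hq_timestamp`, `summarize_tasks` and `load_job_info` are made up for this example and are not part of HyperQueue. Note that `worker` is a numeric worker ID (342 above), not the hostname that cli mode prints.

```python
import json
import re
import subprocess
from datetime import datetime, timezone

# Matches timestamps such as 2023-04-17T09:46:56.088693387Z (as in the output above).
_TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z")


def parse_hq_timestamp(value):
    """Parse a UTC timestamp, truncating nanoseconds to microseconds."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        raise ValueError(f"unexpected timestamp format: {value!r}")
    base, fraction = match.groups()
    # datetime only stores microseconds, so keep at most 6 fractional digits.
    micros = (fraction or "0")[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{base}.{micros}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_tasks(job):
    """Return one record per task with its worker ID and run time in seconds."""
    rows = []
    for task in job.get("tasks", []):
        started = task.get("started_at")
        finished = task.get("finished_at")
        duration = None
        if started and finished:
            delta = parse_hq_timestamp(finished) - parse_hq_timestamp(started)
            duration = delta.total_seconds()
        rows.append({
            "task_id": task["id"],
            "state": task.get("state"),
            # Assumption: a single worker ID per task, as in the sample output.
            "worker": task.get("worker"),
            "duration_s": duration,
        })
    return rows


def load_job_info(job_id):
    """Run the command from the report and parse its JSON output."""
    completed = subprocess.run(
        ["hq", "job", "info", "--output-mode", "json", str(job_id)],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)
```

With a running HQ server, `summarize_tasks(load_job_info(7000))` would yield one record for task 0 with worker 342. Because a task can be retried on another worker after a crash, the recorded worker should be treated as the most recent information rather than a complete history.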

Regarding the Slurm/PBS ID, I'm not sure how we could add it there. There is no association between HQ jobs and Slurm/PBS allocation IDs (this is by design). What you probably could do is fetch the information about specific workers and look up their allocation ID (if there is any). I will add this information to the JSON output for workers to make it available.
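
The lookup described above could be sketched in Python as follows. This is only an illustration: HQ itself keeps no job-to-allocation association, so the `worker_allocations` mapping (worker ID to Slurm/PBS allocation ID) is assumed to be built separately from per-worker information. The function name `allocations_for_job` and the sample allocation ID are hypothetical.

```python
def allocations_for_job(job, worker_allocations):
    """Map each task ID of a job to the allocation ID of its worker.

    `job` is the parsed JSON of a single job (with a "tasks" list).
    `worker_allocations` is a caller-supplied dict: worker ID -> allocation ID.
    Assumption: each task records a single integer worker ID.
    """
    result = {}
    for task in job.get("tasks", []):
        worker_id = task.get("worker")
        # None means the worker is unknown or was not started in an allocation.
        result[task["id"]] = worker_allocations.get(worker_id)
    return result


# Hypothetical data: worker 342 was started inside allocation "example-allocation-1".
job = {"tasks": [{"id": 0, "worker": 342}]}
print(allocations_for_job(job, {342: "example-allocation-1"}))
```

Keeping the mapping outside the function reflects the design point above: the association exists only through the worker, so it has to be joined on the client side.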

svatosFZU commented 1 year ago

Thanks. I do not run multi-node calculations, so it will be easy for me to get that information.