SGCI / sgci-resource-inventory

This contains all the computational resource entities
https://sgci-resource-inventory.readthedocs.io/en/latest/introduction.html
Apache License 2.0

Info about MOM/Service nodes #26

Open yadudoc opened 2 years ago

yadudoc commented 2 years ago

A resource inventory with live data (especially on outages) is valuable for the projects that I work on (Parsl and funcX). The schema definition is already in great shape and covers most of the info that we'd care to know about a new site. I really appreciate the effort that's gone in here, and I'm hopeful that the inventory of sites will grow.

Our projects use a pilot-job based model for acquiring resources from batch systems, and one potential issue that we run into is not knowing whether the job script lands on a MOM/Service node, as is common on Cray systems. If the job lands on a shared service node, we need to be more careful about memory usage to be better citizens.

I was wondering whether info on the presence of service nodes could be added to the batchSystemDefinition, say, something like:

"batchSystemDefinition": {
  "type": "object",
  "required": [
    "jobManager"
  ],
  "properties": {
    "ServiceNode": {
      "description": "Specifies whether service nodes are absent or present",
      "type": "string",
      "enum": [
        "Present",
        "Absent"
      ]
    },
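
To make the intended use concrete, here is a minimal sketch of how a pilot-job launcher might read the proposed field. Everything in it is an assumption for illustration: the function name, the two memory fractions, and the choice to treat a missing value as "unknown" are hypothetical and are not part of the SGCI schema, Parsl, or funcX.

```python
# Hypothetical helper: illustrative only, not an existing Parsl/funcX or SGCI API.
DEFAULT_MEM_FRACTION = 0.9  # assumed cap when the job script runs on a dedicated node
SHARED_MEM_FRACTION = 0.25  # assumed, more conservative cap on a shared service node


def launcher_memory_fraction(resource_entry: dict) -> float:
    """Pick a memory cap for the pilot's launcher process from an inventory entry."""
    batch = resource_entry.get("batchSystemDefinition", {})
    # "ServiceNode" is the field proposed in this issue. A missing or
    # unrecognized value is treated as unknown, so we stay conservative.
    service_node = batch.get("ServiceNode")
    if service_node == "Absent":
        return DEFAULT_MEM_FRACTION
    return SHARED_MEM_FRACTION
```

With an entry that reports "ServiceNode": "Present" (or does not report the field at all), the launcher would stay within the smaller cap; only an explicit "Absent" relaxes it.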

spamidig commented 2 years ago

@yadudoc Is this specific to Cray and pilot jobs, or more generic for pilot jobs? I have not seen this for regular jobs on Cray, in either CCM or Cray modes. Could you please elaborate on how designating the availability of service nodes would be used?