PySlurm / pyslurm

Python Interface to Slurm
https://pyslurm.github.io
GNU General Public License v2.0

Creating a new Jobs object using a list of job ids does not populate statistic fields #340

Closed Overlytic closed 2 months ago

Overlytic commented 2 months ago

Details

Issue

Jobs.cpus is not populated correctly when a Jobs object is created from a list of job ids. Instead of the sum of CPUs across all jobs, it returns the number of jobs. It works correctly when the object is created from a dictionary, though.

import pyslurm

# Assumption: `jobs` was loaded beforehand; the original snippet does not show how.
jobs = pyslurm.Jobs.load()

dict_running_jobs = {}
running_jobs = []

# Of the first 5 jobs, keep the ones that are running

for job_id, job_data in list(jobs.items())[:5]:
    if job_data.to_dict()["state"] == "RUNNING":
        dict_running_jobs[job_id] = job_data
        running_jobs.append(job_id)

# Built from a dict of already loaded Job objects
jobs_running = pyslurm.Jobs(jobs=dict_running_jobs)
print(jobs_running.cpus)

# printed: 40 (correct: 5 jobs with 8 CPUs each)

# Built from a plain list of job ids
jobs_running2 = pyslurm.Jobs(jobs=running_jobs)
print(jobs_running2.cpus)

# printed: 5 (i.e. the number of jobs instead of the number of CPUs)

tazend commented 2 months ago

Hi,

The reason it works with the dictionary is that you populate it with the job_data objects that were already loaded. When you create and initialize a pyslurm.Jobs object with, for example, just a list of job ids, like this:

jobs = pyslurm.Jobs(jobs=my_list_of_job_ids)

it only creates initially empty pyslurm.Job objects inside the pyslurm.Jobs collection. Creating a pyslurm.Jobs object does not touch the Slurm API at all. To actually (re)load the data of the jobs you initialized the collection with, you need to call the reload() method, which can be called repeatedly to update the job data in an existing collection. So, when initializing with a list of job ids, you need to do this to fetch the data:

jobs = pyslurm.Jobs(jobs=my_list_of_job_ids).reload()

reload() just returns self, so it can be chained conveniently. I didn't want jobs to be loaded implicitly when a pyslurm.Jobs collection is created, so loading has to be done explicitly with reload(). Sometimes you just want to sort existing pyslurm.Job objects into a different pyslurm.Jobs collection, for example to separate running and pending jobs.
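For readers without a Slurm cluster at hand, the behaviour described above can be imitated with a small toy model. This is only a sketch of the pattern (placeholders on construction, explicit reload() that returns self), not pyslurm's actual implementation. ToyJob, ToyJobs and FAKE_CONTROLLER are hypothetical names, and the default of 1 CPU for an unloaded job is an assumption chosen because it would be consistent with the reported output of 5.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the Slurm controller: job id -> CPU count.
FAKE_CONTROLLER = {101: 8, 102: 8, 103: 8, 104: 8, 105: 8}


@dataclass
class ToyJob:
    """Placeholder job record; not pyslurm.Job."""
    id: int
    # Assumption: an unloaded job counts as 1 CPU, which would make the
    # sum over unloaded jobs equal to the number of jobs.
    cpus: int = 1
    loaded: bool = False


class ToyJobs(dict):
    """Toy collection keyed by job id; not pyslurm.Jobs."""

    def __init__(self, jobs=None):
        super().__init__()
        if isinstance(jobs, dict):
            # Existing job objects are stored as they are.
            self.update(jobs)
        else:
            # Bare ids only produce empty placeholders; no lookup happens.
            for job_id in jobs or ():
                self[job_id] = ToyJob(id=job_id)

    @property
    def cpus(self):
        # Sum of the CPU counts currently stored on the job objects.
        return sum(job.cpus for job in self.values())

    def reload(self):
        """Fetch data for every job in the collection, then return self."""
        for job_id, job in self.items():
            job.cpus = FAKE_CONTROLLER[job_id]
            job.loaded = True
        return self


ids = [101, 102, 103, 104, 105]

# Constructed from ids only: placeholders, nothing fetched yet.
unloaded = ToyJobs(jobs=ids)
print(unloaded.cpus)   # 5

# Explicit reload, chained because reload() returns the collection itself.
loaded = ToyJobs(jobs=ids).reload()
print(loaded.cpus)     # 40

# Regrouping already loaded jobs into a new collection needs no reload.
regrouped = ToyJobs(jobs={i: loaded[i] for i in ids[:2]})
print(regrouped.cpus)  # 16
```

The last step mirrors the use case mentioned above: already loaded job objects can be moved into another collection without triggering a second fetch, which is what keeping the load explicit makes possible.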

Overlytic commented 2 months ago

Aah that makes sense! Thanks for clarifying!