d-krupke / slurminade

A decorator-based slurm runner.
https://slurminade.readthedocs.io
MIT License
12 stars 4 forks source link

JobBundling.get_all_job_ids() doesn't return a list of ids! #24

Open Tony-Makdissy opened 4 months ago

Tony-Makdissy commented 4 months ago

JobBundling.get_all_job_ids() doesn't return a list of ids! It returns a list of slurminade.dispatcher.SlurmJobReference instead. However, iterating over the list of SlrumJobRefrence and applying get_job_id will do the trick!

example:

## Assuming I have a JobBundling object called bundle
print("1st output",bundle.get_all_job_ids())
wanted_output = list(
    map(lambda x: x.get_job_id(), bundle.get_all_job_ids())
)
print("2nd output",wanted_output)

The output will look like this:

1st output [<slurminade.dispatcher.SlurmJobReference object at 0x7fb258638470>, <slurminade.dispatcher.SlurmJobReference object at 0x7fb257f68a40>]
2nd output [40911167, 40911168]

I have three suggested fixes. but I'm not that good in OOP so I'm not super sure what is the best way to fix it!

From the one I like the least to the most:

  1. Add __repr__ method to SlurmJobReference class that should be the id. However, I think this solution isn't the best because the representation should represent more information about the class.
  2. Edit flush method in JobBundling class by applying get_job_id on all items of job_ids before extending _all_job_ids. i.e. add job_ids = map(lambda x: x.get_job_id(), job_ids) just after the for loop.
  3. Make the function get_dispatcher calls SlurmDispatcher or DirectCallDispatcher and then return the id instead of the chosen class, but then you have to define a "Direct_id"! I'm still learning how to use your library so I don't know if a "Direct_id" makes any sense!

Since I don't know the full scope of your code I didn't try to implement any of the changes (sorry for being lazy!) but I hope these suggestions might help.

As a side suggestion, I think __repr__ method should replace the get_info method for all the JobRefrence classes (LocalJobReference, TestJobReference, SubprocessJobReference, and SlurmJobReference). Or use both, to still be able to access the needed information as a dictionary.

In the end, let me thank you for this amazing library, I think it will save me a lot of hassle that I've been trying to solve for more than a year now.

d-krupke commented 4 months ago

Thanks for noticing this and giving me suggestions, Tony! I will look into it this week :)

d-krupke commented 3 months ago

Finally have some time. I remember having changed from just having the job ids to using these reference objects because they theoretically allow more flexibility, i.e., having the dispatcher override certain behaviors or add additional features if supported. For example, if one uses the local version, there will be pids instead of slurm job ids, which would have to be handled differently for cancelling a job.

I have to double check some parts of the code to decide for a solution.

d-krupke commented 3 months ago

I noticed that I indeed did not migrate properly to the job references and that there are some mix ups with ids vs references in the code. In general, we want to use the reference objects because they are more flexible and can be used to obtain the job id, whereas this is not possible the other way around. However, if a function claims to return a job id, it should return an id. Not happy with this, yet, but this is all I have the time for right now. Maybe I put some students on it to fix some of the current shortcomings.