GreenScheduler / cats

CATS: the Climate-Aware Task Scheduler :cat2: :tiger2: :leopard:
https://greenscheduler.github.io/cats/
MIT License

Wrap schedulers, starting with at(1) #53

Closed andreww closed 4 months ago

andreww commented 11 months ago

In the current design we use cats to generate some output (on standard out) that can be used as an argument to at to set the run time. All other output goes to standard error. This makes handling our output a bit complicated, and cats does not really look like a stand alone scheduler.

One option would be to rebuild the command line interface to cats so that it looks like a scheduler itself and, under the hood, calls at once it has calculated the start time. This would probably use subprocess from the standard library. It could also open the door to letting us ship more than one command line programme.
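A minimal sketch of what "calling at under the hood" could look like, assuming the call goes through Python's subprocess module. The names build_at_command and schedule_with_at are hypothetical and not part of cats; the -t flag and its [[CC]YY]MMDDhhmm time format are those of POSIX at(1).

```python
import subprocess
from datetime import datetime


def build_at_command(start: datetime) -> list[str]:
    """Return the argv for at(1), with the start time in -t format (CCYYMMDDhhmm)."""
    return ["at", "-t", start.strftime("%Y%m%d%H%M")]


def schedule_with_at(job: str, start: datetime) -> subprocess.CompletedProcess:
    """Hand `job` to at(1); at reads the commands to run from standard input."""
    return subprocess.run(
        build_at_command(start),
        input=job + "\n",      # the job text is passed on stdin
        text=True,
        capture_output=True,
        check=True,            # raise CalledProcessError if at fails
    )
```

Keeping the argv construction separate from the subprocess call means the time formatting can be checked without at being installed.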

Anyhow, this came up in #52 and it seems like it needs thinking about.

sadielbartholomew commented 11 months ago

To chip in with my thoughts...

I feel strongly that we shouldn't bind ourselves to at as the scheduler (or similar end tool) that receives the optimal time cats outputs. The tool will be more flexible and useful if it can interface easily with the other options people use in their workflows and might want to add green scheduling to: from simple commands such as at and cron, via batch schedulers like slurm, to full-on workflow engines, e.g. Airflow, snakemake, etc.

> and cats does not really look like a stand alone scheduler.

That said, I agree with the above statement holding true for the current design, and with the name 'Climate-Aware Task Scheduler' it would be confusing not to have scheduling as the default outcome.

With those points in mind, I think the best approach is to make the main outcome configurable: running cats as a command would either call one of the supported schedulers (at for now, hopefully others in future) or simply return the optimal time as we presently do, perhaps with a choice of datetime formats for that output. The carbon savings could be another output that users can request, though we might want to keep it behind a separate CLI option and treat it as subsidiary information, as we do at present. Running via at could be the default, whereas any other behaviour would require a CLI option to confirm that it is what the user intended.

We should definitely provide a way to run cats so that it just gives the optimal time and does not schedule anything itself, so people can use cats output for whatever they wish and not just with at. That includes schedulers and similar end tools that we haven't written a direct interface for.

What do people think about this idea? If we went ahead with something along these lines, the next step would be to agree on the command-line arguments and options that would most clearly and usefully implement such a set of output choices.
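To make the proposal concrete, here is one possible shape for such an interface, sketched with argparse. Every option name here (--scheduler, --format, the positional command) is an assumption for discussion, not an existing cats option.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI: schedule via at by default, or only report the time."""
    parser = argparse.ArgumentParser(prog="cats")
    parser.add_argument(
        "--scheduler",
        choices=["at", "none"],   # "none" means: print the optimal time only
        default="at",
        help="scheduler to call, or 'none' to only print the optimal start time",
    )
    parser.add_argument(
        "--format",
        dest="time_format",
        default="%Y%m%d%H%M",     # strftime pattern used when printing the time
        help="strftime pattern for the printed start time",
    )
    parser.add_argument(
        "command",
        nargs="?",
        help="command to schedule (unused when --scheduler is 'none')",
    )
    return parser
```

With this layout, a bare invocation schedules with at, while `--scheduler none` reproduces the present behaviour of just printing the time.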

abhidg commented 11 months ago

Perhaps the default could be an at invocation as @sadielbartholomew and @andreww suggested, but with a --json or similar optional parameter that gives JSON output, for which we can specify a schema as well. JSON is not easy to read, but is perfect for piping to other tools (via something like jq). There could be another option for human-readable output, which can also be shown in the default invocation.
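A rough sketch of what the JSON output could contain, assuming the start time is serialised as ISO 8601. The function name to_json and the field names start and duration_minutes are placeholders; no schema has been agreed.

```python
import json
from datetime import datetime


def to_json(start: datetime, duration_minutes: int) -> str:
    """Serialise a (hypothetical) cats result for piping to other tools."""
    payload = {
        "start": start.isoformat(),            # e.g. 2024-05-17T03:30:00
        "duration_minutes": duration_minutes,
    }
    return json.dumps(payload, sort_keys=True)
```

If cats printed this on standard out, something like `jq -r .start` could extract the start time for another tool.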

andreww commented 11 months ago

Love the idea of making other schedulers configurable. And yes - an option to generate JSON-encoded information could be very useful for interoperability. I also imagine an API in Python-land would be useful for some. Would we want a single command line programme to do everything, or several simpler programmes?

colinsauze commented 11 months ago

Something I've been wondering: what other schedulers can directly use our current output format without any additional parsing? If we wanted to use cats inside a cron job as it currently stands, I think I'd probably create a cron job that starts at the earliest possible time the job could run and then invokes at to delay the start based on the cats output.

andreww commented 11 months ago

So we don't lose the idea, @abhidg suggested:

A simple way to “schedule” could be by using the --begin flag in sbatch: https://slurm.schedmd.com/sbatch.html#OPT_begin This of course doesn’t guarantee that the job will start then, as the maximum number of jobs might be running already. On a fairly empty queue, this might be a good approximation.

From a quick look at that link, I think we have all the information we need to enable this using the same approach as we use for at at the moment, but we probably need a different time output format. If I'm correct, we would need a new command line argument for this too.
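As an illustration of the "different time output format" point, the sketch below formats the same start time for at -t and for sbatch --begin. The lookup table and the function name format_start are hypothetical. The patterns follow the documented at -t form ([[CC]YY]MMDDhhmm) and the YYYY-MM-DDTHH:MM:SS form accepted by sbatch --begin.

```python
from datetime import datetime

# Hypothetical per-scheduler strftime patterns for the start time.
START_TIME_FORMATS = {
    "at": "%Y%m%d%H%M",             # at -t CCYYMMDDhhmm
    "sbatch": "%Y-%m-%dT%H:%M:%S",  # sbatch --begin=YYYY-MM-DDTHH:MM:SS
}


def format_start(start: datetime, scheduler: str) -> str:
    """Format `start` for the given scheduler; raises KeyError if unsupported."""
    return start.strftime(START_TIME_FORMATS[scheduler])
```

A new command line argument selecting the scheduler would then only need to pick the key used for this lookup.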