Nukesor / pueue


Edit task environment variables #503

Open mjpieters opened 7 months ago

mjpieters commented 7 months ago

A detailed description of the feature you would like to see added.

Currently, you can edit three aspects of a scheduled task: the command to be run, the path where the task is run, and its label.

Tasks have another component: the environment variables under which the command is executed. Can we have an option to edit these?

A stretch goal would be the ability to add, set or remove environment variables across all tasks or a task group. This lets you do things like set bandwidth limits on a whole group of tasks where the commands in those tasks honour a bandwidth limit set in an environment variable. Powerful stuff!

Explain your use case for the requested feature

Loads of UNIX commands change their behaviour based on the environment variables that are present when they start. Sometimes we need to change which environment variables are present for a given task, because circumstances change or we made a mistake with the env vars when the task was added.

The ability to alter environment variables across all tasks is another powerful option. This may need to be a separate command (or even a separate FR!).

Alternatives

I've actively edited the command for tasks to add extra environment variables. This is a poor workaround, however, as it is easy to accidentally alter the command itself.
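For illustration, that workaround looks roughly like this (the task id, command, and variable name are made up): pueue edit opens the stored command in $EDITOR, and prepending an assignment by hand is the only way to change the environment today, with the obvious risk of mangling the command in the process:

pueue edit 3
# before: my_sync_tool --target backup/
# after:  BWLIMIT_KBPS=500 my_sync_tool --target backup/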

Additional context

No response

Nukesor commented 7 months ago

It's a bit of an edge-case, but I definitely see how this is a good QoL feature.

I'm thinking about the best way to do this. My first thought would be to set the key-value pairs directly on the command line, especially if one wants to set multiple values at the same time (e.g. pueue edit 0 -e KEY=value -e OTHER_KEY=othervalue). The behavior would then be a bit inconsistent, but I feel like editing environment variables in an editor would be pretty cumbersome, as there are usually a lot of variables.

Maybe we should add a new subcommand for this? This would also make it quite easy to edit variables for groups or multiple commands, as I feel like those flags would be very confusing on the current edit command.

mjpieters commented 7 months ago

Yes, the more I think about this, the more this sounds like a separate subcommand.

Nukesor commented 6 months ago

If you like, feel free to start hacking on this :) Your contributions have always been top notch and I believe that you'll make the right design decisions ;D

activatedgeek commented 6 months ago

Just wanted to chime in with another use case.

I currently schedule jobs on a machine with GPUs using pueue. Most of the time I need multiple GPUs, so I set the CUDA_VISIBLE_DEVICES environment variable to a list of GPU IDs for a particular run. To get the queueing effect, so that jobs run serially on one set of GPUs, I create a group with the default parallel 1 and manually attach the variable with a bash function:

function pugrun {
  # Which GPU IDs to use; defaults to GPU 0.
  GPUS=${GPUS:-0}

  # One group per GPU set, with the default of one parallel task.
  pueue group add gpu-${GPUS}

  # The variable is baked into the task's environment at `pueue add` time.
  CUDA_VISIBLE_DEVICES=${GPUS} pueue add -g gpu-${GPUS} -- "${@}"

  unset GPUS
}
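For context, a call like the following (the training script is just a placeholder) queues the job in a per-GPU-set group, and the task inherits CUDA_VISIBLE_DEVICES from the add invocation:

GPUS=0,1 pugrun python train.py
# queued in group "gpu-0,1"; the task runs with CUDA_VISIBLE_DEVICES=0,1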

Editing/adding environment variables to groups would be a great addition! Thank you for the amazing project!

Nukesor commented 6 months ago

@activatedgeek It sounds like you could make good use of the https://github.com/Nukesor/pueue/wiki/Get-started#load-balancing feature. This would allow you to do automatic load balancing on your GPUs :)

activatedgeek commented 6 months ago

Thank you.

My understanding was that this behavior would require me to change existing code to parse out PUEUE_WORKER_ID and transform it into an appropriate value for CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs). To avoid going back and changing internal code plumbing, it was easiest to just modify the variable at pueue add time.

But of course, it's nothing too complicated, since the same thing can be plugged in by calling a small bash script instead of the GPU job directly; the script does the transform and then calls the GPU job.
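A minimal sketch of such a wrapper, assuming one GPU per job, two GPUs in total, and a simple modulo mapping from PUEUE_WORKER_ID to a GPU index (script name and GPU count are made up):

#!/usr/bin/env bash
# run_on_gpu.sh -- hypothetical wrapper: derive the GPU from the pueue slot id,
# then exec the real job, e.g. `pueue add -g cluster1 -- ./run_on_gpu.sh python train.py`
NUM_GPUS=2
export CUDA_VISIBLE_DEVICES=$(( PUEUE_WORKER_ID % NUM_GPUS ))
exec "$@"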

Another tricky scenario with that approach is this series of steps, say for group cluster1 with parallel 2 (assuming a machine with 2 GPUs, each job using 1 GPU in a round-robin manner):

Queue experiment 1 on cluster 1
Queue experiment 2 on cluster 1
Queue experiment 3 on cluster 1
Scheduled experiment 1 (PUEUE_WORKER_ID = 0 assigned to GPU 0)
Scheduled experiment 2 (PUEUE_WORKER_ID = 1 assigned to GPU 1)
Completed experiment 2
Scheduled experiment 3 (PUEUE_WORKER_ID = 2 assigned to GPU 0)

Now experiment 1 and experiment 3 both run on GPU 0, which in many of my cases with large models would lead to an out-of-memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing. This is why I took the easier route of limiting each group to a fixed set of GPU IDs.

Let me know if the scenario makes sense, and if I am missing something simpler here.

Nukesor commented 6 months ago

My understanding was that this behavior would require me to change existing code to parse out the PUEUE_WORKER_ID and transform it to appropriate values of CUDA_VISIBLE_DEVICES (e.g. something like a round robin over the list of GPUs).

Exactly. That's how it was designed to be used if the IDs aren't directly mappable to the external resources (except for the round-robin part).

Now experiment 1 and experiment 3 both run on GPU 0, and in many of my cases with large models would lead to an out of memory error on GPU 0. Of course, this could be fixed with slightly smarter scheduling at the expense of more plumbing. This is why I took the easier route of limiting each group to schedule on a fix set of GPU IDs.

That's actually not how it would work. In your example, the last line would look like this:

Completed experiment 2
Scheduled experiment 3 (PUEUE_WORKER_ID = 1 assigned to GPU 1)

The ID is not the task ID, but rather the internal "worker", or rather "slot", ID. A queue with 5 parallel tasks gets 5 slots with IDs 0-4. As soon as a task in a queue finishes, its slot is freed and the next task then gets that slot.

If it doesn't work this way, this is definitely a bug! That system was specifically designed to handle your use case: machines/pools with a set number of limited resources, to which the tasks are then assigned in a greedy manner.
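A quick way to observe the slot behavior (group name and commands are arbitrary):

pueue group add gpus
pueue parallel 2 -g gpus
pueue add -g gpus -- 'echo slot $PUEUE_WORKER_ID; sleep 30'
pueue add -g gpus -- 'echo slot $PUEUE_WORKER_ID; sleep 5'
pueue add -g gpus -- 'echo slot $PUEUE_WORKER_ID; sleep 30'
# once the second task finishes, the third one starts and reports slot 1 again,
# because the freed slot is handed to it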

If the current wiki entry doesn't properly convey this behavior, it would be awesome if you could rephrase it! The wiki is free to be edited by anyone :)

activatedgeek commented 6 months ago

Ahaa! That makes much more sense. Thank you for clarifying.

Let me run a quick test simulating this scenario, and I will report back. I'll update the wiki with this example so that the behavior is more explicit.

activatedgeek commented 5 months ago

Sorry for the delay. I have verified this behavior, and this actually solves my specific problem.

I also realized I missed this:

There'll never be two concurrent tasks with the same $PUEUE_WORKER_ID in the same group.

In that case, the wiki already explains it precisely, and I will skip the wiki edit.

Nukesor commented 3 months ago

@mjpieters I'm thinking about adding a pueue task command to inspect and eventually manipulate tasks.

The idea came up during https://github.com/Nukesor/pueue/issues/540.

What do you think about a pueue task env $TASK_ID command to inspect the environment, and corresponding pueue task env unset $TASK_ID $NAME and pueue task env set $TASK_ID $NAME $VALUE commands?

Is the nesting too deep, or is this reasonable?
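Purely as an illustration of the proposed (not yet implemented) interface, with a made-up task id and variable name:

pueue task env 0                     # inspect the environment of task 0
pueue task env set 0 RUST_LOG debug  # add or overwrite a variable
pueue task env unset 0 RUST_LOG      # remove a variable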

Nukesor commented 2 months ago

Ping @mjpieters

mjpieters commented 2 months ago

I love the nesting, actually. Grouping like this can really help with command clarity. E.g. the gh command-line tool for GitHub has 3 levels for most commands.

I would almost say grouping is overdue for pueue; you have commands that operate on either a group or a task, which maybe could do with splitting across command groups, with aliases for backwards compatibility / ease of typing:

and the following first-tier top-level commands:

Note that for many commands that accepted a single -g/--group name, the above structure allows multiple groups to be named. Ditto for pueue task env unset <ENVVARS>... to unset multiple environment variables (but I would rather name them explicitly and not default to 'all' when no env vars are named).
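Hypothetical examples of what those multi-value forms could look like under this proposal (group names, task id, and variables are made up):

pueue group pause build deploy                   # act on several groups at once
pueue task env unset 7 HTTP_PROXY HTTPS_PROXY    # unset several variables on task 7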

Nukesor commented 2 months ago

I really like the group proposal, especially since I intuitively tried to use Pueue like that a few times already :D

I'm not sure about the task subcommand though. I feel conflicted about having the task subcommand space mirror a big part of the top-level subcommand space. It just feels a bit off to have 2 or more different ways of achieving the same thing.

I guess this would also be confusing, as I would expect different commands to perform different tasks? I know that there are some tools out there that already work this way, but I personally always found those a bit unintuitive/confusing to use.

But permanently moving all those commands (add, remove, etc.) to the task subcommand space would make the CLI less convenient to use. I still assume that most users won't work with groups on a regular basis, so the main interface should stay as convenient as possible.

In general, I really like the idea to restructure the CLI, especially since the next version will be a breaking major version anyway.

I guess I need to think a bit about your proposal and make my mind up about it.

mjpieters commented 2 months ago

Fair enough; it's why I included aliases for the original top-level commands in there (as well as thinking about backwards compatibility). It could perhaps depend on how the aliases are represented when you run pueue --help.