crazyn2 closed this issue 7 months ago
Hey @crazyn2
Something probably got lost in translation and I'm not 100% sure what your requested feature would look like.
From what I've gathered, this sounds a lot like the feature requested in https://github.com/Nukesor/pueue/issues/218. This has been implemented, even though it really isn't documented anywhere. Somebody should probably write a Wiki page for this :sweat_smile: .
This feature allows tasks to be called like this: `command --gpu $PUEUE_WORKER_ID some_other_parameters`.
If multiple jobs should run per GPU, this would need to be handled by the `command`.
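As a rough illustration of handling several jobs per GPU inside the command itself, a small wrapper could translate the worker slot into a GPU index. This is only a sketch: the function names `gpu_for_worker` and `run_on_gpu` are made up for this example, and it assumes that `$PUEUE_WORKER_ID` is an integer between 0 and the group's parallel limit minus one and that no two concurrently running tasks in a group share the same ID.

```bash
#!/usr/bin/env bash
# Sketch only: map a Pueue worker slot to a GPU index.
# Assumption: PUEUE_WORKER_ID is an integer in 0..(parallel limit - 1).

# gpu_for_worker WORKER_ID NUM_GPUS
# Prints the GPU index for a worker slot, spreading slots round-robin
# across the GPUs (slot 0 -> GPU 0, slot 1 -> GPU 1, ...).
gpu_for_worker() {
    local worker_id=$1 num_gpus=$2
    echo $(( worker_id % num_gpus ))
}

# run_on_gpu NUM_GPUS COMMAND [ARGS...]
# Runs COMMAND with CUDA_VISIBLE_DEVICES derived from $PUEUE_WORKER_ID.
run_on_gpu() {
    local num_gpus=$1
    shift
    # Abort early if the task was not started by Pueue.
    : "${PUEUE_WORKER_ID:?PUEUE_WORKER_ID is not set}"
    local gpu
    gpu=$(gpu_for_worker "$PUEUE_WORKER_ID" "$num_gpus")
    CUDA_VISIBLE_DEVICES=$gpu "$@"
}
```

With three GPUs and a group whose parallel limit is 6, slots 0 and 3 would land on GPU 0, slots 1 and 4 on GPU 1, and slots 2 and 5 on GPU 2, so at most two tasks share a card. A task would then call something like `run_on_gpu 3 python train.py` from a script that defines these functions (the training command is a placeholder).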
Pueue is not a high-performance scheduler with complex logic, and there are no plans to make it one. There are other tools out there that specifically tackle complex cluster job management.
In case I completely misunderstood your feature request: Could you write a more detailed explanation, for instance with a proper example and the exact way it would work, including a step-by-step description of the behavior?
I apologize for not expressing myself clearly. I want a queue to which I can freely add processes, and then set a limit on the number of running processes, for example 2. Whenever the number of running processes in any group drops below 2, the next process is taken from the queue and assigned to that underutilized group. I've read #218 and the wiki, but that feature doesn't meet my expectations. This is my Bash script, which has the behavior I expect:
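```bash
# NOTE: $low_prc_num and $cmd are expected to be set by the caller.

# FIFO on fd 4: a counting semaphore, one token (empty line) per free slot.
mkfifo mylist
exec 4<>mylist
rm -rf mylist

# FIFO on fd 5: array lock, holds the per-GPU counters of running processes.
mkfifo mylist1
exec 5<> mylist1
rm -rf mylist1
# echo "${cuda_arry[@]}">&$arr_lock
# echo "0 0 0 0">&
# cuda2gpu=(1 2 0)
# gpu2cuda=(2 0 1)

# Maximum number of concurrent processes per GPU.
if [ -z "$gpu_pool" ]; then
    gpu_pool=($((low_prc_num*2-2)) "$((low_prc_num+1))" "$((low_prc_num+1))")
fi

# Total number of slots across all GPUs.
pool_sum=0
for (( i=0; i<${#gpu_pool[@]}; i++ )); do
    pool_sum=$((pool_sum+${gpu_pool[$i]}))
done

# Put one token per slot into the semaphore.
for (( i=0; i<pool_sum; i++ )); do
    echo >&4
done
# Initially no process is running on any GPU.
echo "0 0 0">&5

# Pick the first GPU that is below its limit and increment its counter.
acquire_cuda(){
    # set -x
    local cuda_arry
    read -r -u5 -a cuda_arry    # blocks until the counters are available
    # echo "${cuda_arry[@]}"
    for i in "${!cuda_arry[@]}"; do
        if [ "${cuda_arry[$i]}" -lt "${gpu_pool[$i]}" ]; then
            cuda_arry[i]=$(( cuda_arry[i] + 1 ))
            export CUDA_VISIBLE_DEVICES=$i
            break
        fi
    done
    echo "${cuda_arry[@]}" >&5  # hand the counters back
    # set +x
}

# Decrement the counter of the GPU this process was running on.
release_cuda(){
    local cuda_arry index
    read -ru5 -a cuda_arry
    # echo $CUDA_VISIBLE_DEVICES
    index=$CUDA_VISIBLE_DEVICES
    cuda_arry[index]=$(( cuda_arry[index] - 1 ))
    echo "${cuda_arry[@]}" >&5
}

# Launch ten jobs; each waits for a free slot before it starts.
ten_classes(){
    for num in {0..9}; do
        read -ru4               # take a token, blocks while all slots are busy
        echo "$num"
        {
            acquire_cuda "$num"
            eval "$cmd"
            release_cuda
            echo >&4            # return the token
        } &
        sleep 1
    done
}
```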
I have three GPU cards and I've set the maximum number of processes per card to 2. When a process finishes, it writes a token back to file descriptor 4, which lets the next waiting process start. The new process checks the number of running processes on each card and is assigned to the first card with fewer than 2 running processes.
I thought a lot about this, and this is nothing that'll be added to Pueue. Pueue is not designed to be a complex task scheduler, but rather a small scheduler for server maintainers and hobbyists.
I still think that it's possible to write a wrapper script with the current functionality that maps worker IDs to external worker pools of varying size, but this is not logic that'll be added to Pueue.
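One way such a wrapper might look, purely as a sketch: treat the worker ID as an index into the concatenated pools and walk the pool sizes until the ID fits. The function name `pool_for_worker` is invented for this example, and it assumes `$PUEUE_WORKER_ID` ranges from 0 to the group's parallel limit minus one, with that limit set to the sum of all pool sizes.

```bash
#!/usr/bin/env bash
# Sketch only: map a worker ID onto pools of varying size.
# Assumption: the group's parallel limit equals the sum of all pool sizes.

# pool_for_worker WORKER_ID SIZE_0 [SIZE_1 ...]
# Prints the index of the pool that the worker slot falls into.
# Example with sizes 2 1 3: IDs 0-1 -> pool 0, ID 2 -> pool 1, IDs 3-5 -> pool 2.
pool_for_worker() {
    local worker_id=$1
    shift
    local pool=0 upper=0 size
    for size in "$@"; do
        upper=$(( upper + size ))       # first ID that no longer fits this pool
        if (( worker_id < upper )); then
            echo "$pool"
            return 0
        fi
        pool=$(( pool + 1 ))
    done
    echo "worker id $worker_id exceeds total pool capacity $upper" >&2
    return 1
}
```

A task could then run, for instance, `CUDA_VISIBLE_DEVICES=$(pool_for_worker "$PUEUE_WORKER_ID" 2 3 3) some_command`, where `some_command` is a placeholder. Unlike the FIFO script, this mapping is static: a slot always belongs to the same pool, so the balancing relies on Pueue reusing freed worker IDs.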
You might want to look at professional cluster management systems. I think our university used Slurm for such tasks.
Anyhow, thanks for the detailed feature request :) Have a nice day!
A detailed description of the feature you would like to see added.
A single process queue from which Pueue assigns processes to one of several groups as soon as the number of running processes in that group drops below its parallel limit.
Explain your usecase of the requested feature
I have three GPUs, but the processes running on them don't finish at the same time. I want Pueue to assign the next process to a free GPU, or to the GPU with the lowest usage, and I want to be able to add processes to the queue at any time.
Alternatives
No response
Additional context
No response