Queuing Based on Available CPUs

MelbourneDeveloper commented 4 years ago

A recurring problem with job scheduling, and one that often forces jobs to be broken out into separate processes, is starvation. If too much work is scheduled at the same time, a job may never finish, or may finish inefficiently.

A useful way to schedule work would be to put it in a queue and run at most one job per available CPU at a time. This would maximize the processing power dedicated to the currently running jobs while minimizing the time threads spend competing for the CPU.

Of course, the issue with this is that a high priority job may not make it into the processing pool because the pool is too backed up. The pool would therefore have to be aware that some high priority jobs may need to be let through even when there aren't enough free CPUs to process the work. Sometimes it may even be necessary to pull low priority jobs out of the pool.

I'd like to change the structure of our app: instead of polling for jobs on a timer, we would throw all potential jobs into the pool and let the scheduler decide which jobs to process next based on priority.
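As a rough illustration of the idea above, here is a minimal sketch of a priority-ordered pool that runs at most one job per CPU. It is not part of FluentScheduler: the PriorityJobPool class and its members are hypothetical names, and it assumes .NET 6 or later for PriorityQueue. It does not attempt the harder parts mentioned above (letting urgent jobs bypass the cap, or pulling low priority jobs back out once they have started).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not a FluentScheduler type.
public sealed class PriorityJobPool
{
    // PriorityQueue dequeues the smallest priority value first,
    // so a lower number means a more urgent job.
    private readonly PriorityQueue<Func<Task>, int> _pending = new PriorityQueue<Func<Task>, int>();
    private readonly object _gate = new object();
    private readonly int _maxRunning;
    private int _running;

    public PriorityJobPool(int? maxRunning = null)
    {
        _maxRunning = maxRunning ?? Environment.ProcessorCount;
    }

    public void Enqueue(Func<Task> job, int priority)
    {
        lock (_gate)
        {
            _pending.Enqueue(job, priority);
        }

        TryStartNext();
    }

    // Starts the most urgent pending job if a slot is free.
    private void TryStartNext()
    {
        Func<Task> job;

        lock (_gate)
        {
            if (_running >= _maxRunning || !_pending.TryDequeue(out job, out _))
            {
                return;
            }

            _running++;
        }

        _ = RunAsync(job);
    }

    private async Task RunAsync(Func<Task> job)
    {
        try
        {
            await Task.Run(job);
        }
        catch (Exception ex)
        {
            Console.WriteLine($"Job failed: {ex.Message}");
        }
        finally
        {
            // Free the slot, then hand it to the next most urgent job.
            lock (_gate)
            {
                _running--;
            }

            TryStartNext();
        }
    }
}
```

With this shape, callers would enqueue everything up front, for example pool.Enqueue(() => SendReportAsync(), priority: 1), where SendReportAsync is a placeholder for a real job. Note that jobs with equal priority are not guaranteed to run in insertion order, because PriorityQueue is not a stable queue.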

Just a thought...

MelbourneDeveloper commented 4 years ago

Here's a rough sketch. I'm not sure how this would fit with the current architecture of Fluent Scheduler, but something to think about...

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

namespace Scheduling
{
    class Program
    {
        static void Main(string[] args)
        {
            GoAsync().Wait();
        }

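        public static async Task GoAsync()
        {
            var runningTasks = new List<Task>();

            while (true)
            {
                // Walk backwards so removing an item does not shift the indexes still to be visited.
                for (var i = runningTasks.Count - 1; i >= 0; i--)
                {
                    var task = runningTasks[i];

                    if (task.IsCompleted)
                    {
                        Console.WriteLine($"A task was completed and removed {task.Id}");
                        runningTasks.RemoveAt(i);
                    }
                }

                // Start at most one new job per pass, and only while there is a free CPU slot.
                if (runningTasks.Count < Environment.ProcessorCount)
                {
                    var task = CreateTask();
                    Console.WriteLine($"A task was created {task.Id}");
                    runningTasks.Add(task);
                }

                // The delay must be awaited; otherwise the loop spins without pausing.
                await Task.Delay(100);
            }
        }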

        public static Task CreateTask()
        {
            return Task.Run(async () =>
            {
                for (var i = 0; i < 100; i++)
                {
                    await Task.Delay(10);
                }
            });
        }
    }
}

ghost commented 4 years ago

This is very true, because FluentScheduler runs jobs on the thread pool, and if you have many jobs on the pool, some of them might be blocked until others are done. Keep in mind that implementing such a mechanism is very complex; the developer might need to write a whole system to handle this situation, which is not easy. I think this is why he chose the thread pool: to let the framework/OS decide and handle all this complex abstraction (green threading).

tallesl commented 4 years ago

Why reinvent the wheel by replacing the task scheduling that the entire .NET community uses in production?

Not to offend, but that snippet looks too naive. It creates some unnecessary constraints and ends up using the task scheduler provided out of the box (it just calls "Task.Run"). For the objective you mention, I imagine you would have to at least use a custom TaskScheduler.
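To make the custom TaskScheduler suggestion concrete, here is a minimal sketch of a scheduler that caps how many tasks execute at once. The CpuBoundTaskScheduler name is hypothetical and nothing here comes from FluentScheduler. It only overrides members that System.Threading.Tasks.TaskScheduler actually exposes, and it runs tasks in plain FIFO order with no priority handling.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: limits how many queued tasks execute concurrently.
public sealed class CpuBoundTaskScheduler : TaskScheduler
{
    private readonly LinkedList<Task> _tasks = new LinkedList<Task>(); // guarded by lock (_tasks)
    private readonly int _maxConcurrency;
    private int _activeWorkers;

    public CpuBoundTaskScheduler(int maxConcurrency)
    {
        if (maxConcurrency < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrency));
        _maxConcurrency = maxConcurrency;
    }

    public override int MaximumConcurrencyLevel => _maxConcurrency;

    protected override void QueueTask(Task task)
    {
        lock (_tasks)
        {
            _tasks.AddLast(task);

            // Spin up another worker only while we are under the cap.
            if (_activeWorkers < _maxConcurrency)
            {
                _activeWorkers++;
                ThreadPool.UnsafeQueueUserWorkItem(_ => DrainQueue(), null);
            }
        }
    }

    // Each worker keeps taking tasks until the queue is empty, then retires.
    private void DrainQueue()
    {
        while (true)
        {
            Task next;

            lock (_tasks)
            {
                if (_tasks.Count == 0)
                {
                    _activeWorkers--;
                    return;
                }

                next = _tasks.First.Value;
                _tasks.RemoveFirst();
            }

            TryExecuteTask(next);
        }
    }

    // Simplification: never run inline, so the cap cannot be exceeded.
    // A task that blocks waiting on another task queued here can therefore
    // stall a worker until that other task gets its own slot.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    // Used by debuggers; must not block, hence TryEnter instead of lock.
    protected override IEnumerable<Task> GetScheduledTasks()
    {
        var lockTaken = false;
        try
        {
            Monitor.TryEnter(_tasks, ref lockTaken);
            if (!lockTaken) throw new NotSupportedException();
            return _tasks.ToArray();
        }
        finally
        {
            if (lockTaken) Monitor.Exit(_tasks);
        }
    }
}
```

Jobs would be routed through it with a TaskFactory, for example new TaskFactory(new CpuBoundTaskScheduler(Environment.ProcessorCount)).StartNew(() => DoWork()), where DoWork is a placeholder. One caveat: the cap applies to code that is actually executing, so an async job that is suspended at an await does not hold a slot while it waits.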

fluentscheduler / FluentScheduler

Queuing Based on Available CPUs #267