AcademySoftwareFoundation / OpenCue

A render management system you can deploy for visual effects and animation productions.
https://www.opencue.io
Apache License 2.0

Scheduler Logic redesign #1001

Open DiegoTavares opened 3 years ago

DiegoTavares commented 3 years ago

Opening an issue to start drafting a proposal for the new scheduler logic, as discussed in the last TSC meeting (Jul 21).

Problems with the current design

Proposal

TBD

splhack commented 3 years ago

In our environment, no matter how many Cuebot instances there are, the frame launch rate is about 8 frames per second. It could be faster if we avoided this error by improving the scheduling 🙂

frame reservation error, dispatchProcToJob failed to book next frame, com.imageworks.spcue.dispatcher.FrameReservationException: the frame ... was updated by another thread.

An idea. Introduce a new frame state, SCHEDULING or something like that.

  1. findNextDispatchFrames
    • Update the frame state to SCHEDULING and set ts_updated to the current timestamp for frames where frame.str_state='WAITING', or where frame.str_state='SCHEDULING' and a certain time has passed since frame.ts_updated (to recover frames left stray by a Cuebot crash).
    • At the same time, retrieve the updated frames with RETURNING clause.
  2. Schedule the frame!
    • The "the frame ... was updated by another thread" error won't happen because findNextDispatchFrames is atomic.
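The claim step above can be sketched as follows. This is only a minimal illustration, not Cuebot code: it uses Python's built-in sqlite3 so that it is self-contained, a simplified frame table, a hypothetical str_claim column, and a hypothetical 60-second stale timeout. On PostgreSQL, a single UPDATE ... RETURNING statement would replace the UPDATE plus SELECT pair used here.

```python
import sqlite3
import uuid

# Hypothetical timeout after which a frame stuck in SCHEDULING is reclaimed.
STALE_AFTER_SECONDS = 60


def make_db():
    """Create an in-memory database with a simplified frame table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE frame ("
        " pk_frame TEXT PRIMARY KEY,"
        " str_state TEXT NOT NULL,"
        " ts_updated REAL NOT NULL,"
        " str_claim TEXT)"  # hypothetical column, stands in for RETURNING
    )
    return conn


def find_next_dispatch_frames(conn, now, limit):
    """Claim up to `limit` frames and return their ids.

    A frame is claimable if it is WAITING, or if it has been in
    SCHEDULING for longer than STALE_AFTER_SECONDS (for example because
    the scheduler that claimed it crashed).
    """
    claim = uuid.uuid4().hex  # unique token identifying this claim
    with conn:  # the UPDATE and the SELECT run in one transaction
        conn.execute(
            "UPDATE frame SET str_state='SCHEDULING', ts_updated=?, str_claim=? "
            "WHERE pk_frame IN ("
            "  SELECT pk_frame FROM frame"
            "  WHERE str_state='WAITING'"
            "     OR (str_state='SCHEDULING' AND ts_updated < ?)"
            "  ORDER BY pk_frame LIMIT ?)",
            (now, claim, now - STALE_AFTER_SECONDS, limit),
        )
        rows = conn.execute(
            "SELECT pk_frame FROM frame WHERE str_claim=? ORDER BY pk_frame",
            (claim,),
        ).fetchall()
    return [row[0] for row in rows]
```

Because a claimed frame is no longer WAITING, a second caller running the same function right afterwards gets a disjoint set of frames instead of failing later with a reservation error.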
splhack commented 2 years ago

Summarized an experimental optimization and the theory in #1069

To solve the scalability issues in #1012 and #1069, my hunch is that we need some sort of central scheduler process (or one of the Cuebot instances could work like that).

Possible Logic

Maybe LevelDB is one of the best solutions, with two maps:

  1. Sorted Job list
    • Key: the combination of int_priority + ts_started (maybe ULID), or a random number for the current round-robin scheduling
    • Value: Job UUID, group/job/layer CPU/GPU utilization
  2. Job UUID to the key map
    • Key: Job UUID
    • Value: the key of sorted Job list
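As a rough in-memory sketch of the two maps (not LevelDB itself), the structure could look like the following. The class name JobIndex, its methods, and the choice to sort higher int_priority first are assumptions made for illustration; the job UUID is appended to the key only to keep keys unique.

```python
import bisect


class JobIndex:
    """Toy model of the two maps: a sorted job list and a reverse lookup."""

    def __init__(self):
        self._sorted_keys = []  # map 1 keys, kept in sort order
        self._values = {}       # map 1 values: sort key -> (job_id, utilization)
        self._key_by_job = {}   # map 2: job_id -> sort key

    def put(self, job_id, int_priority, ts_started, utilization=None):
        """Insert a job, or re-key it if its priority changed."""
        self.remove(job_id)
        # Assumption: higher int_priority sorts first, then older ts_started.
        key = (-int_priority, ts_started, job_id)
        bisect.insort(self._sorted_keys, key)
        self._values[key] = (job_id, utilization or {})
        self._key_by_job[job_id] = key

    def remove(self, job_id):
        """Drop a job from both maps; do nothing if it is unknown."""
        key = self._key_by_job.pop(job_id, None)
        if key is None:
            return
        index = bisect.bisect_left(self._sorted_keys, key)
        del self._sorted_keys[index]
        del self._values[key]

    def in_order(self):
        """Return job ids in scheduling order."""
        return [self._values[key][0] for key in self._sorted_keys]
```

The second map is what makes a priority change cheap to apply: the old sort key is found by job UUID, removed, and replaced, without scanning the sorted list for the job.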
DiegoTavares commented 2 years ago

I like the idea of a central scheduler process. We're currently evaluating Redis Streams as an option to handle not only the job queue but also the HostReports. Will update this issue as soon as we have more to share.

oliviascarfone commented 2 years ago

Proposal - High level overview

Central Scheduler Design Logic

Use Redis Streams for incoming HostReports and for dispatching jobs. Redis Streams support persistent storage and ordered events, and can store multiple key/value pairs per event.

This approach will decouple the processing of HostReports from the dispatch of jobs. Redis Streams with consumer groups guarantee that each message is given to a different consumer (the same message will not reach multiple consumers within the same group). This addresses the current flaw where Cuebot instances will assign jobs that have already been dispatched to other Cuebot instances. There will be two types of streams: one in which RQD publishes HostReports that are consumed by Cuebot, and another in which Cuebot publishes available jobs and RQDs consume them. In the latter case, Cuebot will periodically query the database to get a list of jobs that are available for processing.
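To illustrate the consumer-group behavior described above, here is a small in-memory model. It is not a Redis client and does not talk to a server: ToyStream and its methods are hypothetical stand-ins that only mimic the roles of the Redis commands XADD, XREADGROUP and XACK for one stream with one consumer group.

```python
class ToyStream:
    """In-memory stand-in for one stream with a single consumer group."""

    def __init__(self):
        self._entries = []          # ordered log of (entry_id, fields)
        self._next_undelivered = 0  # position of the first undelivered entry
        self._pending = {}          # entry_id -> consumer that received it

    def add(self, fields):
        """Append an entry (the role XADD plays) and return its id."""
        entry_id = len(self._entries) + 1
        self._entries.append((entry_id, dict(fields)))
        return entry_id

    def read_group(self, consumer, count=1):
        """Hand out entries nobody in the group has received yet
        (the role XREADGROUP plays when reading new messages)."""
        start = self._next_undelivered
        batch = self._entries[start:start + count]
        self._next_undelivered += len(batch)
        for entry_id, _ in batch:
            self._pending[entry_id] = consumer
        return batch

    def ack(self, entry_id):
        """Mark an entry as processed (the role XACK plays)."""
        return self._pending.pop(entry_id, None) is not None
```

With two Cuebot consumers reading HostReports from the same group, each report is delivered to exactly one of them, which is the property the proposal relies on to avoid double dispatch.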

Logic for Host Reports Queue

Logic for Job Queue

thunders82 commented 2 years ago

Hi,

It's nice to see you are looking into the scheduler logic redesign. I described my concerns about the way GPU nodes are handled in #991 and was kindly pointed to this thread by @splhack to share my opinion.

Currently, if a GPU node is not in use by any GPU job, it will not accept any CPU job. That is a waste of resources. I was wondering if it would be possible to implement logic similar to this:

Priority to GPU tasks on GPU nodes:

What do you think?

Thank you