DiegoTavares opened this issue 3 years ago
FIFO scheduling
GPU scheduling logic
One frame per machine
How can I achieve that one machine gets only one frame, and that frame's render uses all cores on that machine, regardless of how many cores the given machine has?
Limit per machine
In our environment, no matter how many Cuebot instances there are, the frame launching speed is about 8 frames per second. It could be faster if we could avoid this error by improving the scheduling 🙂
frame reservation error, dispatchProcToJob failed to book next frame, com.imageworks.spcue.dispatcher.FrameReservationException: the frame ... was updated by another thread.
An idea. Introduce a new frame state, SCHEDULING, or something like that.

findNextDispatchFrames marks the selected frames as SCHEDULING and sets ts_updated to the current timestamp, for rows where frame.str_state='WAITING', or where a certain time has passed since frame.ts_updated with frame.str_state='SCHEDULING' (to prevent stray frames after a Cuebot crash), using a RETURNING clause. The "the frame ... was updated by another thread" error won't happen because findNextDispatchFrames is atomic.

Summarized an experimental optimization and the theory in #1069.
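In SQL terms the claim step could look roughly like the sketch below. This is only an illustration of the idea in Python/psycopg2, not Cuebot code (Cuebot is Java); pk_frame, the stale-SCHEDULING timeout, and the batch size are assumptions on my part, while str_state and ts_updated are the columns mentioned above.

```python
# Illustrative sketch only: the SQL idea behind an atomic findNextDispatchFrames,
# expressed through psycopg2. pk_frame, the timeout and the batch size are assumed.
import psycopg2

STALE_SCHEDULING_TIMEOUT = "5 minutes"  # assumed recovery window after a Cuebot crash

def claim_frames(conn, batch_size=10):
    """Atomically move a batch of frames from WAITING to SCHEDULING and return them."""
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE frame
               SET str_state = 'SCHEDULING',
                   ts_updated = now()
             WHERE pk_frame IN (
                   SELECT pk_frame
                     FROM frame
                    WHERE str_state = 'WAITING'
                       OR (str_state = 'SCHEDULING'
                           AND ts_updated < now() - %s::interval)
                    ORDER BY ts_updated
                    LIMIT %s
                      FOR UPDATE SKIP LOCKED)
            RETURNING pk_frame
            """,
            (STALE_SCHEDULING_TIMEOUT, batch_size),
        )
        claimed = [row[0] for row in cur.fetchall()]
    conn.commit()
    return claimed

# Example usage (connection parameters are assumptions):
# conn = psycopg2.connect(dbname="cuebot", user="cuebot")
# for frame_id in claim_frames(conn):
#     dispatch(frame_id)
```

Because the UPDATE selects, marks, and returns the rows in one statement, two Cuebot instances cannot claim the same frame, which is what makes the "updated by another thread" error go away.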
To solve the scalability issues in #1012 and #1069, my hunch is that we need some sort of a central scheduler process (or one of the Cuebot instances could work like that).
Maybe LevelDB is one of the best solutions: two maps.
I like the idea of a central scheduler process, we're currently evaluating Redis-Stream as an option to handle not only the Job's queue, but also the HostReports. Will update this issue as soon as we have more to share.
Central Scheduler Design Logic
Use Redis Streams for incoming HostReports and for dispatching jobs. Redis Streams support persistent storage and ordered events, and can store multiple key/value pairs per event.
This approach decouples the processing of HostReports from the dispatch of jobs. Redis Streams with consumer groups guarantee that each message is given to a different consumer (the same message will not reach multiple consumers within the same group). This addresses the current flaw where Cuebot instances will assign jobs that have already been dispatched by other Cuebot instances. There will be two types of streams: one in which RQD publishes HostReports that are consumed by Cuebot, and another where Cuebot publishes available jobs and RQDs consume them. In the latter case, Cuebot will periodically query the database in order to get a list of jobs that are available for processing.
RQD acts as producer and will send HostReports to a dedicated Redis Stream for HostReports
All Cuebot instances are added to the same Consumer Group and are listening for incoming messages.
HostReports are then stored in the database
In RQD: create a connection to the Redis server in the RqCore module
In Cuebot: create a RedisConsumer class, initialized at application start-up, that connects to the Redis server and awaits incoming messages
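A minimal sketch of what the HostReport stream could look like, assuming redis-py on both ends (in practice Cuebot is Java and would use a Java Redis client); the stream name, group name, and report fields below are placeholders, not OpenCue identifiers:

```python
# Sketch under the assumptions above: RQD publishes HostReports, all Cuebot
# instances consume from one consumer group so each report is handled once.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# --- RQD side: publish each HostReport as a stream entry --------------------
def publish_host_report(host_name, report):
    r.xadd("hostreports", {"host": host_name, "report": json.dumps(report)})

# --- Cuebot side: every instance joins the same consumer group --------------
def consume_host_reports(consumer_name):
    try:
        r.xgroup_create("hostreports", "cuebot", id="0", mkstream=True)
    except redis.exceptions.ResponseError:
        pass  # group already exists

    while True:
        # Each entry is delivered to exactly one consumer within the group.
        entries = r.xreadgroup("cuebot", consumer_name,
                               {"hostreports": ">"}, count=10, block=5000)
        for _stream, messages in entries or []:
            for msg_id, fields in messages:
                store_report_in_db(fields)  # placeholder for the DB write
                r.xack("hostreports", "cuebot", msg_id)

def store_report_in_db(fields):
    print("persisting report from", fields["host"])
```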
Cuebot acts as the producer. Using a service like ZooKeeper to elect an instance as the leader, the leader polls the database on an interval for available jobs and publishes them, as a priority queue, to the Redis Stream dedicated to pending jobs
Limit the number of Cuebot instances accessing the database directly to one (the elected leader instance will access the db)
RQD, as consumer, gets the highest-priority job and determines if this job can be run on that host. RQD will send an acknowledgement for the job if it can be run; otherwise the job remains in a pending state for other hosts to check.
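And a similarly hedged sketch of the pending-jobs side: the elected leader publishes jobs in priority order, and an RQD consumer acknowledges an entry only if its host can actually run it. Stream/group names, job fields, and the capability check are all assumptions, and leader election itself is left out:

```python
# Sketch of the pending-jobs stream under the same assumptions as the
# HostReport sketch above; names and fields are illustrative only.
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

try:
    r.xgroup_create("pendingjobs", "rqd", id="0", mkstream=True)
except redis.exceptions.ResponseError:
    pass  # group already exists

# --- Elected Cuebot leader: poll the DB on an interval, publish in priority order
def publish_pending_jobs(jobs_sorted_by_priority):
    for job in jobs_sorted_by_priority:
        r.xadd("pendingjobs", {"job_id": job["id"],
                               "cores": str(job["min_cores"]),
                               "gpus": str(job["min_gpus"])})

# --- RQD side: take the next job only if this host can actually run it ------
def try_claim_job(consumer_name, host_caps):
    entries = r.xreadgroup("rqd", consumer_name,
                           {"pendingjobs": ">"}, count=1, block=2000)
    for _stream, messages in entries or []:
        for msg_id, fields in messages:
            if host_can_run(host_caps, fields):
                r.xack("pendingjobs", "rqd", msg_id)  # acknowledge: this host takes the job
                return fields
            # Not acknowledged: the entry stays in this consumer's pending list,
            # and another host can take it over later with XCLAIM/XAUTOCLAIM.
    return None

def host_can_run(host_caps, fields):
    return (host_caps["idle_cores"] >= int(fields["cores"])
            and host_caps["idle_gpus"] >= int(fields["gpus"]))
```

One design note: an unacknowledged entry stays assigned to the consumer that first read it, so some requeue mechanism (an idle-time threshold with XAUTOCLAIM, or a periodic sweep by the leader) would be needed to keep jobs from getting stuck on a host that cannot run them.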
Hi,
It's nice to see you are looking into the scheduler logic redesign. I shared my opinion about the way GPU nodes are handled in #991 and was kindly pointed to this thread by @splhack to share it here.
Currently, if a GPU node is not in use by any GPU job, it will not accept any CPU job. It is a waste of resources. I was wondering if it would be possible to implement a logic similar to this:
Priority to GPU tasks on GPU nodes, but let CPU jobs use the node when no GPU work is pending (see the sketch after this comment).
What do you think ?
Thank you
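A rough sketch of the policy suggested in the comment above, assuming hypothetical frame/host fields; this is not existing OpenCue dispatch logic:

```python
# Hedged sketch of "GPU work first on GPU nodes, CPU work as fallback".
# Frame and host fields (min_gpus, idle_gpus, ...) are hypothetical.
def pick_frame_for_host(host, pending_frames):
    gpu_frames = [f for f in pending_frames if f["min_gpus"] > 0]
    cpu_frames = [f for f in pending_frames if f["min_gpus"] == 0]

    if host["idle_gpus"] > 0 and gpu_frames:
        return gpu_frames[0]   # GPU work keeps priority on GPU nodes
    if cpu_frames:
        return cpu_frames[0]   # otherwise don't leave the node idle: run CPU work
    return None
```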
Opening an issue to start drafting a proposal for the new scheduler logic, as discussed in the last TSC meeting (Jul 21).
Problems with the current design
Proposal
TBD