This is just a small "optimization" to make sure the running kernels are killed.
Consider a naive implementation of preemption:
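(The original step list is not preserved in this thread; the following is a plausible reconstruction based on the references to the "third" and "fourth" steps below. The helper names `preempt_stream` and `run_real_time_task` are illustrative stand-ins, not REEF's exact API.)

```cpp
#include <hip/hip_runtime.h>

void GPUResetCU();          // REEF helper that kills running kernels (see links below)
void run_real_time_task();  // hypothetical placeholder for the next scheduler step

struct PreemptFlag { volatile int flag; };

// Reconstructed naive sequence; step numbers match the discussion below.
void naive_preempt(PreemptFlag* preempt_flag, hipStream_t preempt_stream) {
    preempt_flag->flag = true;             // 1. ask polling kernels to self-evict
    GPUResetCU();                          // 2. kill the currently running kernel (async)
    hipStreamSynchronize(preempt_stream);  // 3. wait for the preemption stream
    run_real_time_task();                  // 4. hand the GPU over to the real-time task
}
```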
Actually, the third step ("wait for preemption stream") may take several microseconds. To overlap this operation with the other steps, we swap the third and fourth steps. See: https://github.com/SJTU-IPADS/reef/blob/0a25de5d60edaef524752a921a8c72e131137879/src/reef/server/scheduler.cpp#L335
However, this may lead to another problem: the preemption flag might not have been set to "true" yet when a running kernel is killed. Therefore, to make sure all the kernels are either killed or evicted by the preemption flag, we repeat the "ResetCU" call several times here.
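A minimal sketch of this idea, assuming the loop bound is the `be_stream_device_queue_cap` mentioned later in this thread (struct and function names are illustrative, not REEF's exact code):

```cpp
void GPUResetCU();  // REEF's reset call (scheduler.cpp, linked in this thread)

struct PreemptFlag { volatile int flag; };

// Presumably one reset per device-queue slot: a queued kernel may start
// before the flag write becomes visible on the device, so each round kills
// the kernel that slipped in, until every remaining kernel either has been
// killed by a reset or self-evicts via the flag.
void preempt_with_repeated_reset(PreemptFlag* preempt_flag,
                                 int be_stream_device_queue_cap) {
    preempt_flag->flag = true;
    for (int i = 0; i < be_stream_device_queue_cap; i++) {
        GPUResetCU();
    }
}
```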
Thanks for your detailed explanation! But I still have a question: who checks the preempt_flag of a best-effort task, the amdgpu driver or the HIP runtime? preempt_flag is passed as one of the kernel parameters, but I can't find the place where this flag is processed.
https://github.com/SJTU-IPADS/reef/blob/0a25de5d60edaef524752a921a8c72e131137879/src/reef/executor/hybrid_executor.cpp#L83
The preemption flag is checked in the device code (i.e., GPU kernels).
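As a rough illustration of what "checked in the device code" means, a best-effort kernel could poll the flag on entry and exit immediately if it is set. A minimal HIP-style sketch (the kernel name, signature, and placement of the check are illustrative, not REEF's generated code):

```cpp
#include <hip/hip_runtime.h>

// Illustrative best-effort kernel: every thread reads the host-visible
// preemption flag on entry and returns immediately if it is set,
// so a newly started kernel evicts itself without doing any work.
__global__ void be_kernel(volatile int* preempt_flag, float* data, int n) {
    if (*preempt_flag) return;  // lazy eviction via the flag
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= 2.0f;
}
```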
Thanks a lot!
https://github.com/SJTU-IPADS/reef/blob/0a25de5d60edaef524752a921a8c72e131137879/src/reef/server/scheduler.cpp#L332
I found that the GPUResetCU() function actually calls an ioctl with kfd_ioctl_wave_reset_args to reset all the wavefronts, so why does this place need to call the reset be_stream_device_queue_cap times?