Project-HAMi / HAMi

Heterogeneous AI Computing Virtualization Middleware
http://project-hami.io/
Apache License 2.0

Core resource allocation using time-slice #397

Open ltson4121994 opened 2 months ago

ltson4121994 commented 2 months ago

Hi, I came across this repo from tkestack/gpu-manager. I saw that in gpu-manager, vcuda was used for core resource sharing, and I wonder why it was changed to time-slicing. In my opinion, the ability to execute in parallel would be more optimal than time slicing. Could you please share more about this design decision? Thank you very much.

archlitchi commented 2 months ago

If you want core resource sharing, you need to put all the different tasks into the same GPU context; that technique is called MPS (Multi-Process Service). However, MPS has many known issues, so we can't apply it in a production environment.

ltson4121994 commented 2 months ago

Hi, thanks for your prompt reply. This project is still using HAMi-core, right? I took a look at the source code, and it seems the sharing method works by allocating cores at each kernel launch, not the NVIDIA time-slicing method, right? Have you ever tried benchmarking the performance?

archlitchi commented 2 months ago

Yes, HAMi-core does not use NVIDIA's time-slicing method; it implements its own time-slicing strategy by periodically blocking kernel submission from certain tasks. The performance overhead this introduces is less than 5%.
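Roughly, it is a rate limiter sitting in front of kernel submission. A minimal C sketch of the idea (illustrative only, not HAMi-core's actual code; the hook point, quota value, and refill period are assumptions):

```c
/*
 * Minimal sketch of time-slicing by throttling kernel submission
 * (illustrative only, NOT HAMi-core's actual implementation). The idea:
 * a hook in front of every kernel launch blocks the caller until a
 * budget, refilled in proportion to the task's core quota, has tokens.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_long tokens = 0;     /* remaining launch budget this slice */
static long quota_percent = 50;    /* assumed quota, e.g. gpucores: 50   */

/* Background thread: reset the budget every 10 ms slice, proportional
 * to the task's quota. */
static void *refiller(void *arg) {
    (void)arg;
    for (;;) {
        atomic_store(&tokens, quota_percent);
        usleep(10 * 1000);
    }
    return NULL;
}

/* Would wrap the real kernel-launch entry point (e.g. via LD_PRELOAD).
 * Blocks the calling task whenever its budget for this slice is spent. */
static void throttled_launch(int id) {
    while (atomic_fetch_sub(&tokens, 1) <= 0) {
        atomic_fetch_add(&tokens, 1);  /* give the token back and wait */
        usleep(1000);
    }
    printf("task %d: kernel submitted\n", id);
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, refiller, NULL);
    for (int i = 0; i < 200; i++)      /* simulate a burst of launches */
        throttled_launch(0);
    return 0;
}
```

A compute-hungry task simply stalls once it exhausts its slice budget, which is how the quota caps its share of GPU time without needing parallel execution.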

ltson4121994 commented 2 months ago

I mean the performance comparison between NVIDIA time slicing and the custom time slicing when multiple containers launch kernels at the same time. Theoretically, I think the custom method should be better, since it allows parallel kernel launching while NVIDIA time slicing does not?

archlitchi commented 2 months ago

Neither NVIDIA's time slicing nor the custom one supports parallel kernel launching.

ltson4121994 commented 2 months ago

So is there any advantage to using the custom method over NVIDIA time slicing?

archlitchi commented 2 months ago

Hmm... I'm not sure whether we can use NVIDIA time slicing in HAMi-core; we can discuss that if you have a plan.

ltson4121994 commented 2 months ago

I think for NVIDIA time slicing we only need to configure the node, but it is not as flexible, since we have to specify the number of replicas, and whenever that config changes, the gpu-operator has to be restarted, so I guess it is not production-ready either. I am just curious about the performance comparison in practice. But if neither can execute kernel launches in parallel, then NVIDIA time slicing will probably perform better, since all containers have full access to the resources. In that case, we are somewhat trading performance for better governance, right?
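For reference, the node-level config I mean looks something like this (a sketch using the NVIDIA k8s device plugin's time-slicing config format; the replica count is just an example):

```yaml
# Time-slicing config for the NVIDIA device plugin: the node advertises
# 4 shareable replicas of each physical GPU.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```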

archlitchi commented 2 months ago

I think I get what NVIDIA time slicing means: it simply throws the tasks onto the GPU and lets them compete with each other. It doesn't guarantee how much compute power each task can use. It is equivalent to submitting all vGPU tasks without specifying nvidia.com/gpucores.

ltson4121994 commented 2 months ago

I see, but I still haven't figured out why we should limit core utilization when we are not able to run in parallel anyway. Either we let the tasks run in parallel and enforce core utilization quotas, or we let them run sequentially with the entire core resources, right? In the latter case, I suppose we should enforce the quotas by the execution time spent by each process, though I'm not sure if that is possible.

lixd commented 2 days ago

Time-slice scheduling is the GPU's own logic. If a process is a compute-intensive task, it may submit GPU kernels frequently, which lets it occupy more time slices in the scheduling, so the GPU time available to other processes is reduced. In HAMi we limit how frequently each process can submit GPU kernels (bounded by the pod's gpucores resource request) to ensure that every process gets enough GPU time.
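For example, a pod expresses its quota like this (a sketch assuming HAMi's standard resource names; the values are illustrative):

```yaml
# HAMi vGPU request: this container's kernel submission is throttled to
# roughly 50% of the GPU's compute and capped at 8192 MB of device memory.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
    - name: worker
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1
          nvidia.com/gpucores: 50
          nvidia.com/gpumem: 8192
```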