I noticed that the algorithm you use differs considerably from the implementation in vcuda-controller. Why is `g_total_cuda_cores` multiplied by 2 here? And why was the original vcuda implementation removed? Was it not suitable for your case? I hope you can share your experience with the algorithm design, thanks :)

https://github.com/Project-HAMi/HAMi-core/blob/a9ab3b1c6c521b4aabce2dbc8c306e9a67b2a51c/src/multiprocess/multiprocess_utilization_watcher.c#L213

For comparison, their `utilization_watcher` is here: https://github.com/tkestack/vcuda-controller/blob/72e0115d5884f22469de857271c002c84c0d0543/src/hijack_call.c#L303