We tend to saturate the machine and consume 100% CPU even in scenarios where 20% CPU would be sufficient.
The CPU consumption comes from spinning, either in spin-waiting or in conflict-resolution backoffs. The most noticeable case is the spinning done in the threadpool to guarantee that any incoming task will be picked up by a worker.
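To make the cost concrete, here is a toy spin-then-backoff loop in Python (illustrative only; the names, thresholds, and backoff curve are assumptions, not the runtime's actual spin-wait implementation). The point is that every iteration before the backoff threshold burns a full CPU core while accomplishing nothing:

```python
import time

def spin_wait(condition, max_spins=100):
    """Wait for condition() to become true.

    Spins (busy-waits) for up to max_spins iterations, then falls back to
    sleeping with a capped exponential backoff. Returns the number of spin
    iterations performed. Purely illustrative of the pattern described above.
    """
    spins = 0
    while not condition():
        spins += 1
        if spins < max_spins:
            continue  # pure spin: consumes CPU without doing useful work
        # Backoff phase: sleep with capped exponential growth to stop
        # monopolizing the CPU once spinning has not paid off.
        time.sleep(0.001 * min(2 ** (spins - max_spins), 16))
    return spins
```

If the condition is satisfied quickly the spin is cheap, which is exactly why spinning is used; the problem in the scenarios above is that many threads spin like this simultaneously when there is little work to find.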
Because worker threads can block, we cannot rely on workers that are already executing tasks to pick up new ones, so the threadpool ensures that there is an outstanding thread request after every task is enqueued. In scenarios not requiring 100% CPU, such a request is quickly satisfied. On the other hand, in a steady state the threadpool queue is nearly empty (since workers are keeping up with tasks), so many woken threads find no work and leave, only to be invited back again. We end up with a few lucky workers doing the work and the rest bouncing between the task queue and the threadpool semaphore. That constant churn of threads is at best wasteful, and we often find ourselves at 100% utilization even when the incoming tasks require much less.
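The "outstanding thread request per enqueue" policy and the resulting wasted wakeups can be sketched with a deterministic toy model (hypothetical names and structure for illustration; this is not the actual CoreCLR threadpool code). A single busy worker keeps up with the queue inline, yet every enqueue still issues a thread request, so every one of the later wakeups finds an empty queue:

```python
class ThreadPoolModel:
    """Toy model of the wakeup policy described above (illustrative only)."""

    def __init__(self):
        self.queue = 0            # pending work items
        self.thread_requests = 0  # outstanding wakeup requests (semaphore releases)
        self.wakeups = 0          # total times a worker was woken

    def enqueue(self):
        self.queue += 1
        # Policy: every enqueue guarantees an outstanding thread request,
        # because an already-running worker might block and never return
        # to pick this item up.
        self.thread_requests += 1

    def worker_wakes(self):
        # A worker is released from the semaphore to satisfy one request.
        self.thread_requests -= 1
        self.wakeups += 1
        if self.queue:
            self.queue -= 1
            return True   # found work
        return False      # found nothing: woken for no reason, goes back to sleep

pool = ThreadPoolModel()
for _ in range(10):
    pool.enqueue()
    # A worker already executing keeps up: it takes the item directly from
    # the queue without going through the semaphore.
    pool.queue -= 1

# The thread requests issued per enqueue are still outstanding; when those
# workers wake, the queue is already empty.
wasted = sum(0 if pool.worker_wakes() else 1 for _ in range(pool.thread_requests))
```

Here all 10 wakeups are wasted: the woken workers consume CPU (including the semaphore spin-wait in the real implementation) only to discover there is nothing to do.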
We need something more intelligent here, but this is a nontrivial problem. There are many concerns that would need to be considered: worker blocking, starvation, work-item latency, and so on.
Better detection of blocked threads could be the key to having more options for thread wakeups, as it could relax the requirement of ensuring a thread wakeup for every incoming task.
Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.