Open JinwoongKim opened 8 years ago
We often fail to fully utilize the CPU because peak performance is reached with fewer CPU threads than the number of threads available.
For example, even with 16 threads available, 4 or 8 threads perform better than 16, due to the ... corresponding CUDA block threads...?
The GPU kernel looks through the shared queue to process queries.
Not only a CPU thread, but a CPU-GPU pair ... should bring a query from the queue....
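
To make the shared-queue idea concrete, here is a minimal host-side sketch in C++. It is not the project's actual code: `Query`, `SharedQueue`, and `drain_queue` are hypothetical names, and each `std::thread` only stands in for a CPU thread (or CPU-GPU pair) that would launch GPU work. It shows workers claiming queries from one queue through an atomic cursor, so any number of workers can be tried without statically partitioning the queries.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for one query; the real query type is not shown here.
struct Query {
    int id;
};

// Shared queue: a fixed batch of queries plus an atomic cursor.
// Each worker claims the next unprocessed index with fetch_add.
struct SharedQueue {
    std::vector<Query> queries;
    std::atomic<std::size_t> next{0};

    // Writes the claimed index and returns true, or returns false once drained.
    bool try_pop(std::size_t& index) {
        index = next.fetch_add(1, std::memory_order_relaxed);
        return index < queries.size();
    }
};

// Runs num_workers threads over num_queries queries and returns how many
// queries each worker handled. Each thread is an assumption standing in for
// a CPU thread (or CPU-GPU pair) in the real system.
std::vector<std::size_t> drain_queue(std::size_t num_queries,
                                     std::size_t num_workers) {
    assert(num_workers > 0);

    SharedQueue queue;
    for (std::size_t i = 0; i < num_queries; ++i) {
        queue.queries.push_back(Query{static_cast<int>(i)});
    }

    std::vector<std::size_t> handled(num_workers, 0);
    std::vector<std::thread> workers;
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&queue, &handled, w] {
            std::size_t index;
            while (queue.try_pop(index)) {
                // Placeholder for the real work on queue.queries[index],
                // e.g. handing the query to a GPU kernel.
                ++handled[w];
            }
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return handled;
}
```

Calling `drain_queue(1000, 4)` and then `drain_queue(1000, 16)` with real work in place of the placeholder would be one way to compare per-worker load at different thread counts. Every query is claimed exactly once regardless of how many workers run.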