NVIDIA Concurrency Mechanism

SoroushHeidari commented 4 months ago

Thank you for your contribution. I wanted to ask whether you have any method for simulating concurrent execution on a GPU. NVIDIA provides three concurrency mechanisms to support concurrent applications: priority streams, time-slicing, and the Multi-Process Service (MPS). Is there a way to emulate this concurrency behavior in the simulator? If not, do you have any suggestions for how to approach this problem?

JRPan commented 4 months ago

priority streams: We support multiple streams. To enable them, add -gpgpu_concurrent_kernel_sm 1 to your config file. Your kernels must be small enough, though: if one kernel is big enough to fill all SMs, only that kernel will run. You can change this behavior by modifying select_kernel.
time-slicing: We don't support this. You would have to model context switching, which writes all register values back to memory.
MPS: We don't support this, but it would be easy to add. Change the select_kernel function so that it issues each kernel to only a subset of SMs.
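
As a rough illustration of the MPS idea above (restricting each kernel to a subset of SMs), here is a minimal standalone C++ sketch. It is not GPGPU-Sim code: SmPartition, partition_sms, and sm_allowed are hypothetical names, and the even contiguous split is only an assumption about how SMs might be divided among clients. A modified select_kernel could consult a check like sm_allowed before handing a kernel to an SM.

```cpp
#include <vector>

// Hypothetical: the contiguous range of SMs assigned to one client
// (for example, one stream or one process sharing the GPU).
struct SmPartition {
    unsigned first_sm;  // inclusive
    unsigned last_sm;   // exclusive
};

// Split num_sms SMs into num_clients contiguous, near-equal partitions.
// Any remainder is spread over the first (num_sms % num_clients) clients.
std::vector<SmPartition> partition_sms(unsigned num_sms, unsigned num_clients) {
    std::vector<SmPartition> parts;
    if (num_clients == 0) return parts;
    const unsigned base = num_sms / num_clients;
    const unsigned extra = num_sms % num_clients;
    unsigned start = 0;
    for (unsigned c = 0; c < num_clients; ++c) {
        const unsigned size = base + (c < extra ? 1u : 0u);
        parts.push_back({start, start + size});
        start += size;
    }
    return parts;
}

// True if SM sm_id may be given a kernel that belongs to client_id.
// Unknown clients are never allowed on any SM.
bool sm_allowed(const std::vector<SmPartition>& parts,
                unsigned client_id, unsigned sm_id) {
    if (client_id >= parts.size()) return false;
    return sm_id >= parts[client_id].first_sm &&
           sm_id < parts[client_id].last_sm;
}
```

With 80 SMs and 3 clients, this split gives the ranges [0, 27), [27, 54), and [54, 80). How a kernel is mapped to a client ID (by stream, by process, or otherwise) is left open here.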

You'll probably also want to check out https://github.com/accel-sim/accel-sim-framework/tree/dev-stream-stats. By default, all stats are aggregated, which does not make sense when kernels run concurrently. This branch changes that so stats are collected per stream. It needs to be paired with the stream-stats branch of gpgpu-sim: https://github.com/accel-sim/gpgpu-sim_distribution/tree/stream-stats
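
To make the aggregated versus per-stream distinction concrete, here is a small standalone C++ sketch. It does not reflect how the stream-stats branches are implemented: StreamStats, PerStreamStats, and the two counters are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <map>

// Hypothetical counters tracked for one stream.
struct StreamStats {
    std::uint64_t instructions = 0;
    std::uint64_t mem_accesses = 0;
};

// Keeps one StreamStats record per stream ID instead of a single global total.
class PerStreamStats {
public:
    // Add counts observed for the given stream.
    void record(unsigned stream_id, std::uint64_t insts, std::uint64_t mem) {
        StreamStats& s = stats_[stream_id];  // creates a zeroed entry if absent
        s.instructions += insts;
        s.mem_accesses += mem;
    }

    // Counters for one stream; all zeros if the stream was never seen.
    StreamStats get(unsigned stream_id) const {
        auto it = stats_.find(stream_id);
        return it == stats_.end() ? StreamStats{} : it->second;
    }

    // The single aggregated view: sums over all streams, which hides
    // how much each concurrent stream contributed.
    StreamStats aggregate() const {
        StreamStats total;
        for (const auto& entry : stats_) {
            total.instructions += entry.second.instructions;
            total.mem_accesses += entry.second.mem_accesses;
        }
        return total;
    }

private:
    std::map<unsigned, StreamStats> stats_;
};
```

If stream 0 records 120 instructions and stream 1 records 50, aggregate() reports 170 in total, while get(0) and get(1) keep the two contributions separate.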

SoroushHeidari commented 3 months ago

Thank you for getting back to me so quickly! I have a question about simulating DNN inference and training with Accel-Sim. I would like to exclude the first few iterations, commonly referred to as "warm-up" iterations, from the final stats report. Is there a way to do this?

accel-sim / accel-sim-framework

NVIDIA Concurrency Mechanism #292