NVlabs / NVBit


Support for Multi-GPU and GPU-to-GPU communication APIs #34

Closed: HamHyungkyu closed this issue 2 years ago

HamHyungkyu commented 3 years ago

Hi NVBit team, I'm trying to run multi-GPU simulations in Accel-Sim using GPU traces collected with NVBit. However, I think NVBit does not support multi-GPU tracing or GPU-to-GPU communication APIs. Do you have plans to support them?

ovilla commented 3 years ago

Hi,

Thanks for the interest in NVBit.

The NVBit core already supports multiple GPUs, but some of the example NVBit tools (inside the tools folder) are not multi-GPU aware. The memory reference tracer tool in particular might need some form of locking/thread-safety to support multi-GPU; otherwise it could hang or crash.

In general, there is no major limitation in the NVBit core, and users can develop very powerful NVBit tools, including tools that work on multi-GPU applications. NVBit tools are written in standard CUDA, and the examples provided inside the tools folder can be modified as needed.

We did not implement a multi-GPU version of every example tool, since the space to cover would be too large: we would need examples for many tools and for many forms of GPU computing, including CUDA graphs, CUDA+MPI programs, etc.

Having said that, we are thinking of extending the memory reference tracer example on our side (for a future release), but it would not require anything special inside the NVBit core. It is a matter of modifying the tools/memtrace.cu example (which, again, is just an example) by adding locking/thread-safety and, more generally, trace serialization so that the output does not get clobbered.
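A minimal host-side sketch of the trace-serialization idea, assuming each GPU has its own host receiver thread that drains that GPU's buffer and hands over whole batches of records. The `MemAccess` record, `TraceWriter`, and `run_demo` names are hypothetical and do not mirror the actual tools/memtrace.cu code; plain `std::thread` workers stand in for the per-GPU receiver threads.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record layout (NOT the real memtrace.cu structure).
struct MemAccess {
    int device_id;   // GPU that produced the record
    uint64_t pc;     // instruction offset inside the kernel
    uint64_t addr;   // accessed memory address
};

// Writes each batch as one unit under a mutex, so output coming from
// different GPUs' receiver threads never interleaves mid-batch.
class TraceWriter {
public:
    explicit TraceWriter(std::ostream& out) : out_(out) {}

    void write_batch(const std::vector<MemAccess>& batch) {
        // Format outside the lock to keep the critical section short.
        std::ostringstream buf;
        for (const auto& r : batch) {
            buf << "dev " << r.device_id << " pc 0x" << std::hex << r.pc
                << " addr 0x" << r.addr << std::dec << '\n';
        }
        std::lock_guard<std::mutex> lock(mu_);
        out_ << buf.str();
    }

private:
    std::ostream& out_;
    std::mutex mu_;
};

// Usage sketch: one thread per (simulated) GPU writes batches concurrently.
std::string run_demo(int num_devices, int batches_per_device, int batch_size) {
    assert(num_devices > 0 && batch_size > 0);
    std::ostringstream sink;
    TraceWriter writer(sink);
    std::vector<std::thread> threads;
    for (int dev = 0; dev < num_devices; ++dev) {
        threads.emplace_back([&writer, dev, batches_per_device, batch_size] {
            for (int b = 0; b < batches_per_device; ++b) {
                std::vector<MemAccess> batch;
                for (int i = 0; i < batch_size; ++i) {
                    batch.push_back({dev, static_cast<uint64_t>(i * 16),
                                     static_cast<uint64_t>(0x1000 + i)});
                }
                writer.write_batch(batch);
            }
        });
    }
    for (auto& t : threads) t.join();  // all writers finish before sink is read
    return sink.str();
}
```

Tagging every record with its device ID keeps the combined trace attributable; writing one file per GPU is an alternative that avoids the shared lock altogether.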

I am not sure I understand the question about GPU-to-GPU communication APIs. NVBit is about instrumenting kernels; everything else is just plain CUDA (for instance, the utils/channel.hpp file is fairly standard CUDA code in which the GPU writes buffers that the CPU consumes).

We did not envision tools that require GPU-to-GPU communication, but if the tool you are envisioning needs it, you can use normal device-to-device cudaMemcpy calls or peer-memory accesses inside the NVBit tool you are designing (as in any standard CUDA program).

Again, we are considering adding more example tools in future releases, but we don't yet know when. Hope this helps.

--Oreste