example: disabling peer-to-peer between GPUs

On a multi-GPU system, I need to run the same PyTorch app once with NVLink enabled and once with NVLink disabled, and compare the results. I do not have physical access to the hardware, so I must disable P2P in software, and I cannot modify the PyTorch app.

I believe the cudaDeviceDisablePeerAccess() call might be used for that.

I know an NVBit tool can be injected into an unmodified app by preloading it as a shared library:

LD_PRELOAD=./path/to/some_library_file.so ./path/to/my_app

Browsing the NVIDIA developer forums, I found a suggestion that NVBit could be used to call cudaDeviceDisablePeerAccess(). All I need is a working, minimal example of that: one that only disables P2P in that context and does nothing else.

My goal is a simple library that is always preloaded before the main app and, depending on an environment variable (or something similar), either disables P2P or leaves it alone. That way I can repeat the same test with and without P2P without changing anything else.

I am not familiar with CUDA C, and I could not find a working example that does what I've described. I'd appreciate any help you can provide. Thanks!

BTW: I've seen this question asked many times on various forums: how to disable P2P, or at least NVLink, without physical access to the hardware and without modifying the app itself. I think one good example that solves this problem would benefit future users like me.

NVlabs / NVBit #56