Open pandyamarut opened 9 months ago
Hi @pandyamarut, based on your question, my guess is that your application does not call GDRCopy directly, and that you want to confirm whether a library it depends on (e.g., UCX, NCCL) is actually using GDRCopy. One way to check is to export the environment variables below and rerun your application. If GDRCopy is used, you should see log lines from GDRCopy in the output.
export GDRCOPY_ENABLE_LOGGING=1
export GDRCOPY_LOG_LEVEL=1
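If no log lines appear, two quick sanity checks can help narrow things down, especially inside a container. The sketch below is only illustrative: the function names are hypothetical, and it assumes a standard install where the user-space library is named libgdrapi and the driver exposes the /dev/gdrdrv character device. The first helper looks for libgdrapi in a process's memory map; the second checks that the device node is visible, which in a container means it has to be passed through from the host.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of GDRCopy.

# Succeeds if the given maps file (e.g. /proc/<pid>/maps) lists libgdrapi,
# the GDRCopy user-space library.
maps_has_gdrapi() {
    grep -q 'libgdrapi' "$1"
}

# Succeeds if a character device exists at the given path.
# Defaults to /dev/gdrdrv, the device node created by the gdrdrv driver.
gdrdrv_device_present() {
    [ -c "${1:-/dev/gdrdrv}" ]
}
```

For example, `gdrdrv_device_present || echo "no /dev/gdrdrv in this container"` and `maps_has_gdrapi /proc/<pid>/maps && echo "libgdrapi is loaded"`, where `<pid>` is your application's process ID. Note that a loaded library only shows GDRCopy is available to the process, not that transfers go through it, so the logging output remains the stronger signal.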
@pandyamarut were you able to verify whether your application is using it?
I have successfully installed gdrcopy on my host and completed its tests. Afterwards, I launched a container running my language model application, with a focus on profiling the loading of the model from the local disk. I am looking for methods to confirm whether gdrcopy is active when my application is running. Since I am new to this, I would appreciate any guidance on how to verify the operation of gdrcopy in this context.