NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

How to understand cudaStreamCaptureModeGlobal is the only allowed mode in SAFE CUDA #3758

Closed by lix19937 6 months ago

lix19937 commented 7 months ago

Ref https://github.com/NVIDIA/TensorRT/blob/release/8.6/samples/common/safeCommon.h#L145
The class TrtCudaGraphSafe is not used in the TensorRT OSS project.

Is safeCommon used for automotive safety?
What is the difference between cudaStreamCaptureModeGlobal, cudaStreamCaptureModeThreadLocal, and cudaStreamCaptureModeRelaxed?
What does SAFE CUDA mean?
Why is cudaStreamCaptureModeGlobal the only allowed mode in SAFE CUDA?

cudaStreamCaptureModeGlobal: This is the default mode. If the local thread has an ongoing capture sequence that was not initiated with cudaStreamCaptureModeRelaxed at cuStreamBeginCapture, or if any other thread has a concurrent capture sequence initiated with cudaStreamCaptureModeGlobal, this thread is prohibited from potentially unsafe API calls.

cudaStreamCaptureModeThreadLocal: If the local thread has an ongoing capture sequence not initiated with cudaStreamCaptureModeRelaxed, it is prohibited from potentially unsafe API calls. Concurrent capture sequences in other threads are ignored.

cudaStreamCaptureModeRelaxed: The local thread is not prohibited from potentially unsafe API calls. Note that the thread is still prohibited from API calls which necessarily conflict with stream capture, for example, attempting cudaEventQuery on an event that was last recorded inside a capture sequence.
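
To make the three definitions above concrete, here is a toy C++ model of the rule they describe. It does not call the CUDA runtime: `CaptureMode`, `ThreadState`, and `unsafeCallsProhibited` are hypothetical names invented for illustration, and the logic is only one reading of the quoted documentation, not NVIDIA's implementation. It also ignores the separate set of calls that always conflict with capture (such as the cudaEventQuery case mentioned under Relaxed).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cudaStreamCaptureMode; not the CUDA enum itself.
enum class CaptureMode { Global, ThreadLocal, Relaxed };

// Hypothetical per-thread bookkeeping for this toy model.
struct ThreadState {
    CaptureMode threadMode;                    // mode the thread currently operates under
    std::vector<CaptureMode> ongoingCaptures;  // mode each of its ongoing captures was begun with
};

// Returns true if thread `self` would be prohibited from potentially unsafe
// API calls, following the three quoted definitions.
bool unsafeCallsProhibited(std::size_t self, const std::vector<ThreadState>& threads) {
    const ThreadState& me = threads[self];

    // Relaxed: never prohibited from potentially unsafe calls.
    if (me.threadMode == CaptureMode::Relaxed) return false;

    // Global and ThreadLocal: a local capture not begun with Relaxed prohibits them.
    const bool localStrictCapture = std::any_of(
        me.ongoingCaptures.begin(), me.ongoingCaptures.end(),
        [](CaptureMode m) { return m != CaptureMode::Relaxed; });
    if (localStrictCapture) return true;

    // ThreadLocal: captures in other threads are ignored.
    if (me.threadMode == CaptureMode::ThreadLocal) return false;

    // Global: a capture begun with Global in any other thread also prohibits them.
    for (std::size_t i = 0; i < threads.size(); ++i) {
        if (i == self) continue;
        const auto& caps = threads[i].ongoingCaptures;
        if (std::find(caps.begin(), caps.end(), CaptureMode::Global) != caps.end()) return true;
    }
    return false;
}
```

In this model, if thread 0 has an ongoing capture begun with Global and thread 1 is idle but operating in Global mode, thread 1 is also prohibited from potentially unsafe calls. Switching thread 1 to ThreadLocal lifts that restriction, because ThreadLocal ignores concurrent captures in other threads.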

What counts as a "potentially unsafe API"? Is it the same as a non-thread-safe API?

Ref https://docs.nvidia.com/deeplearning/nccl/archives/nccl_21210/user-guide/docs/usage/cudagraph.html#capture-modes

By default, CUDA stream capture uses the cudaStreamCaptureModeGlobal mode if no flag is given to the cudaStreamBeginCapture call. This mode is compatible with NCCL except two scenarios:

If you are using NCCL in multi-thread mode, i.e. a process has multiple threads each of which is attached to a different GPU, then you would need to add the cudaStreamCaptureModeThreadLocal flag to the cudaStreamBeginCapture call.

If you are capturing NCCL P2P calls (ncclSend and ncclRecv) without any previous P2P calls to the same peer(s), you would also need to use the cudaStreamCaptureModeThreadLocal mode.

Thanks.

zerollzeng commented 7 months ago

Is safeCommon used for automotive safety?

Yes

zerollzeng commented 7 months ago

What does SAFE CUDA mean?

It refers to a set of "standards" that we need to follow to meet safety requirements, e.g. AUTOSAR.

zerollzeng commented 7 months ago

Why is cudaStreamCaptureModeGlobal the only allowed mode in SAFE CUDA?

Some APIs are not allowed for safety reasons. It's a bit tedious, so if you don't work on automotive you can ignore it :-D

lix19937 commented 7 months ago

@zerollzeng I am going to use CUDA graphs in a multi-process setup on DRIVE AGX Orin X/N, and the graph capture modes confuse me.

zerollzeng commented 7 months ago

Do you use QNX?

zerollzeng commented 7 months ago

If not, you can ignore the safe***

lix19937 commented 7 months ago

Do you use QNX?

Currently I only use Linux.

If not, you can ignore the safe***

Thanks.