System information
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux release 7.6.1810 (Core)
TensorFlow installed from (source or binary): source
TensorFlow version (use command below): nv20.12
Python version: 2.7
Bazel version (if compiling from source): 0.24.1
GCC/Compiler version (if compiling from source): GCC 8.3.1
CUDA/cuDNN version: CUDA 11.4.2, cuDNN 8.2.4.15
GPU model and memory: A30
Describe the current behavior
A core dump occurs when multiple threads call CombinedNonMaxSuppression on GPU:
Error detected in GPU stream: Error detected in GPU stream: Error detected in GPU stream: an illegal memory access was encounteredan illegal memory access was encounteredan illegal memory access was encountered
2022-08-24 14:56:40.254500: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2022-08-24 14:56:40.254538: F external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
2022-08-24 14:56:40.254546: F external/org_tensorflow/tensorflow/core/kernels/batched_non_max_suppression_op.cu.cc:825] Non-OK-status: GpuLaunchKernel(SetZero, config.block_count, config.thread_per_block, 0, device.stream(), config.virtual_thread_count, (*output_indices)->flat().data()) status: Internal: an illegal memory access was encountered
run_docker_bash.sh: line 108: 47384 Aborted (core dumped)
No error occurs when a single thread calls CombinedNonMaxSuppression on GPU, or when multiple threads call CombinedNonMaxSuppression on CPU.
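
A minimal sketch of how the multi-threaded case might be exercised from Python. It is not the original reproduction code, and everything in it is an assumption: the helper names `run_concurrently` and `make_gpu_nms_call` are hypothetical, and the tensor shapes, thread count, iteration count, and the `/GPU:0` device string are arbitrary. It also assumes the build registers a GPU kernel for CombinedNonMaxSuppression, as the `batched_non_max_suppression_op.cu.cc` path in the log suggests; on other builds the op may fail to place on the GPU.

```python
import threading


def run_concurrently(fn, num_threads=3, iterations=100):
    """Call fn() repeatedly from several threads; return Python exceptions raised."""
    errors = []

    def worker():
        try:
            for _ in range(iterations):
                fn()
        except Exception as exc:  # a native abort (core dump) cannot be caught here
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def make_gpu_nms_call(num_boxes=1000):
    """Build a graph with one CombinedNonMaxSuppression op pinned to the GPU.

    Hypothetical helper: shapes, output sizes, and device string are assumptions.
    """
    import numpy as np
    import tensorflow as tf  # imported lazily so the harness has no TF dependency

    tf1 = tf.compat.v1
    graph = tf1.Graph()
    with graph.as_default(), graph.device("/GPU:0"):
        boxes = tf1.placeholder(tf.float32, [1, num_boxes, 1, 4])
        scores = tf1.placeholder(tf.float32, [1, num_boxes, 1])
        outputs = tf.image.combined_non_max_suppression(
            boxes, scores, max_output_size_per_class=100, max_total_size=100)
    sess = tf1.Session(graph=graph)
    feed = {
        boxes: np.random.rand(1, num_boxes, 1, 4).astype(np.float32),
        scores: np.random.rand(1, num_boxes, 1).astype(np.float32),
    }
    # One shared session, called from every thread.
    return lambda: sess.run(list(outputs), feed_dict=feed)


# Example (requires TensorFlow and a GPU):
#   errors = run_concurrently(make_gpu_nms_call(), num_threads=3)
```

If the problem reproduces, the process would be expected to abort as in the log above rather than return a list of errors. Setting `num_threads=1`, or changing the device string to `/CPU:0`, gives the two configurations reported as working.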