NVIDIA / tensorflow

An Open Source Machine Learning Framework for Everyone
https://developer.nvidia.com/deep-learning-frameworks
Apache License 2.0

A core dump occurs when multiple threads call CombinedNonMaxSuppression on GPU #67

Open cyfwry opened 2 years ago

cyfwry commented 2 years ago

System information

Describe the current behavior

A core dump occurs when multiple threads call CombinedNonMaxSuppression on GPU:

Error detected in GPU stream: Error detected in GPU stream: Error detected in GPU stream: an illegal memory access was encounteredan illegal memory access was encounteredan illegal memory access was encountered

2022-08-24 14:56:40.254500: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_event.cc:29] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
2022-08-24 14:56:40.254538: F external/org_tensorflow/tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
2022-08-24 14:56:40.254546: F external/org_tensorflow/tensorflow/core/kernels/batched_non_max_suppression_op.cu.cc:825] Non-OK-status: GpuLaunchKernel(SetZero, config.block_count, config.thread_per_block, 0, device.stream(), config.virtual_thread_count, (*output_indices)->flat().data()) status: Internal: an illegal memory access was encountered
run_docker_bash.sh: line 108: 47384 Aborted (core dumped)

No error occurs when only one thread calls CombinedNonMaxSuppression on GPU, or when multiple threads call it on CPU.
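
A minimal sketch of how the multi-threaded case might be exercised from Python. The report does not say how the op was invoked, so everything here is an assumption for illustration: the use of `tf.image.combined_non_max_suppression`, the tensor shapes, the thread count, and the hypothetical helper names `run_concurrently` and `make_nms_call`. It has not been confirmed to trigger the crash.

```python
import threading


def run_concurrently(fn, num_threads=4, iterations=10):
    """Call fn(thread_index) `iterations` times from each of `num_threads` threads.

    Returns the exceptions raised in Python. A fatal CUDA error such as the one
    in the log above aborts the whole process, so it would not appear here.
    """
    errors = []
    lock = threading.Lock()

    def worker(idx):
        try:
            for _ in range(iterations):
                fn(idx)
        except Exception as exc:  # collect, do not crash the worker silently
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def make_nms_call(device="/GPU:0", batch=2, num_boxes=1000, num_classes=3):
    """Build a callable that runs CombinedNonMaxSuppression on `device`.

    Shapes and limits are illustrative assumptions, not taken from the report.
    TensorFlow is imported lazily so the harness above works without it.
    """
    import tensorflow as tf

    def call(_thread_index):
        # Device scopes are per-thread, so set the device inside each call.
        with tf.device(device):
            boxes = tf.random.uniform([batch, num_boxes, 1, 4])
            scores = tf.random.uniform([batch, num_boxes, num_classes])
            tf.image.combined_non_max_suppression(
                boxes,
                scores,
                max_output_size_per_class=100,
                max_total_size=100,
            )

    return call


# Usage (requires TensorFlow with a visible GPU):
#   errors = run_concurrently(make_nms_call("/GPU:0"), num_threads=4)
# For comparison with the cases reported as working:
#   run_concurrently(make_nms_call("/GPU:0"), num_threads=1)
#   run_concurrently(make_nms_call("/CPU:0"), num_threads=4)
```

Running the three configurations side by side should show whether the failure is specific to concurrent GPU execution, as described above.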