NVIDIA / cuCollections

Apache License 2.0
491 stars 89 forks source link

[BUG]: Compute sanitizer memcheck failed with `DISTINCT_COUNT_ESTIMATOR_TEST` #538

Closed PointKernel closed 4 months ago

PointKernel commented 4 months ago

Is this a duplicate?

Type of Bug

Something else

Describe the bug

========= COMPUTE-SANITIZER
Randomness seeded to: 2057510101
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaFuncSetAttribute.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x480996]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaFuncSetAttribute [0x54d01]
=========                in /usr/local/cuda-12.3/lib64/libcudart.so.12
=========     Host Frame:void cuco::detail::hyperloglog_ref<int, (cuda::std::__4::thread_scope)1, cuco::detail::XXHash_64<int> >::add_async<thrust::detail::normal_iterator<thrust::device_ptr<int> > >(thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, cuda::__4::stream_ref) in /home/yunsongw/Work/cuCollections/include/cuco/detail/hyperloglog/hyperloglog_ref.cuh:212 [0x17a49]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST
=========     Host Frame:void CATCH2_INTERNAL_TEMPLATE_TEST_0<int, cuco::detail::XXHash_64<int> >() in /home/yunsongw/Work/cuCollections/tests/distinct_count_estimator/unique_sequence_test.cu:65 [0x1c9ee]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST
=========     Host Frame:Catch::RunContext::invokeActiveTestCase() in src/catch2/internal/catch_run_context.cpp:536 [0xa164e]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) in src/catch2/internal/catch_run_context.cpp:498 [0xa20ed]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::RunContext::runTest(Catch::TestCaseHandle const&) in src/catch2/internal/catch_run_context.cpp:236 [0xa2ee2]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::Session::runInternal() in src/catch2/catch_session.cpp:332 [0x7ac0c]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::Session::run() in src/catch2/catch_session.cpp:263 [0x7af20]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:main in src/catch2/internal/catch_main.cpp:36 [0x11e0]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2Main.so.3.3.0
=========     Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:58 [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:379 [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xbee5]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST
========= 
========= Program hit cudaErrorInvalidValue (error 1) due to "invalid argument" on CUDA API call to cudaGetLastError.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame: [0x480996]
=========                in /lib/x86_64-linux-gnu/libcuda.so.1
=========     Host Frame:cudaGetLastError [0x4cdc7]
=========                in /usr/local/cuda-12.3/lib64/libcudart.so.12
=========     Host Frame:void cuco::detail::hyperloglog_ref<int, (cuda::std::__4::thread_scope)1, cuco::detail::XXHash_64<int> >::add_async<thrust::detail::normal_iterator<thrust::device_ptr<int> > >(thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, cuda::__4::stream_ref) in /home/yunsongw/Work/cuCollections/include/cuco/detail/hyperloglog/hyperloglog_ref.cuh:212 [0x17a51]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST
=========     Host Frame:void CATCH2_INTERNAL_TEMPLATE_TEST_0<int, cuco::detail::XXHash_64<int> >() in /home/yunsongw/Work/cuCollections/tests/distinct_count_estimator/unique_sequence_test.cu:65 [0x1c9ee]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST
=========     Host Frame:Catch::RunContext::invokeActiveTestCase() in src/catch2/internal/catch_run_context.cpp:536 [0xa164e]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) in src/catch2/internal/catch_run_context.cpp:498 [0xa20ed]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::RunContext::runTest(Catch::TestCaseHandle const&) in src/catch2/internal/catch_run_context.cpp:236 [0xa2ee2]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::Session::runInternal() in src/catch2/catch_session.cpp:332 [0x7ac0c]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:Catch::Session::run() in src/catch2/catch_session.cpp:263 [0x7af20]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2.so.3.3.0
=========     Host Frame:main in src/catch2/internal/catch_main.cpp:36 [0x11e0]
=========                in /home/yunsongw/Work/cuCollections/build/_deps/catch2-build/src/libCatch2Main.so.3.3.0
=========     Host Frame:__libc_start_call_main in ../sysdeps/nptl/libc_start_call_main.h:58 [0x29d90]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:__libc_start_main in ../csu/libc-start.c:379 [0x29e40]
=========                in /lib/x86_64-linux-gnu/libc.so.6
=========     Host Frame:_start [0xbee5]
=========                in /home/yunsongw/Work/cuCollections/build/tests/./DISTINCT_COUNT_ESTIMATOR_TEST

How to Reproduce

compute-sanitizer --tool memcheck ./DISTINCT_COUNT_ESTIMATOR_TEST

Expected behavior

memcheck should pass with no failures

Operating System

Ubuntu 22.04

nvidia-smi output

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 8000                Off | 00000000:17:00.0 Off |                  Off |
| 34%   35C    P8              10W / 260W |      6MiB / 49152MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Quadro P620                    Off | 00000000:B3:00.0  On |                  N/A |
| 37%   51C    P0              N/A /  N/A |    851MiB /  2048MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      5071      G   /usr/lib/xorg/Xorg                            4MiB |
|    1   N/A  N/A      5071      G   /usr/lib/xorg/Xorg                          288MiB |
|    1   N/A  N/A      5280      G   /usr/bin/gnome-shell                        205MiB |
|    1   N/A  N/A      8468      G   ...seed-version=20240711-180158.427000      206MiB |
|    1   N/A  N/A     29374      G   ...yOnDemand --variations-seed-version      145MiB |
|    1   N/A  N/A     63537      G   /opt/cisco/anyconnect/bin/acwebhelper         0MiB |
+---------------------------------------------------------------------------------------+

NVCC version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Wed_Nov_22_10:17:15_PST_2023
Cuda compilation tools, release 12.3, V12.3.107
Build cuda_12.3.r12.3/compiler.33567101_0
sleeepyjack commented 4 months ago

I think this is caused by the try_reserve_shmem function which uses the return value of cudaFuncSetAttribute programmatically. compute-sanitizer will flag any CUDA API call that is not successful as an error.

I think cudf ran into a similar issue some time ago (see rapidsai/cudf#15753). Their solution was to just skip the test.

Edit: Nevermind, I found a better solution to the problem (see #540)