facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

CUDA 11 issue with brute force KNN distance calculation #1385

Closed: dantegd closed this issue 4 years ago

dantegd commented 4 years ago

Summary

After issues we ran into in cuML's KNN regressor and classifier (which use FAISS underneath), we found a CUDA 11-specific issue that seems to stem from half-precision optimizations in cublasSgemmEx. The fix is very simple and I'll be glad to contribute a PR for it, but I wanted to create an issue to document it for future reference (and future CUDA versions).

The issue is that one step of the distance calculation shows numeric discrepancies compared to older CUDA versions, specifically this step:

https://github.com/facebookresearch/faiss/blob/c97f8906511dbb97079eaf44329db5aa3216470a/faiss/gpu/impl/Distance.cu#L193

In particular, the issue is easy to see when inspecting the distances returned by bfKnn under CUDA 11. I've attached a simple script along with data of sufficient size to notice the discrepancies (1000 rows, 30 columns, k=10), with the vectors and queries being identical. Here are the first 20 entries of the resulting distances:

0.0195312
22.7321
27.8344
27.9221
28.6282
29.1412
30.1401
30.9479
31.3218
33.1479
0.131836
29.1755
29.7333
33.4289
35.2295
35.2607
36.1398
36.3448
37.0358
37.0606

As opposed to what we would expect (and what CUDA 10.1/10.2 produce, as well as CUDA 11 with the fix):

0
22.7664
27.8207
27.9999
28.6924
29.0683
30.1964
30.9391
31.2164
33.1111
0
29.0369
29.7889
33.3521
35.2543
35.4309
36.0618
36.356
36.9778
37.0444

These numeric differences cause a larger number of discrepancies than we would expect from the GPU bfKnn in many scenarios. This is easily solved by changing CUDA 11's cublasSgemmEx math mode in https://github.com/facebookresearch/faiss/blob/c97f8906511dbb97079eaf44329db5aa3216470a/faiss/gpu/utils/MatrixMult-inl.cuh, with:

#if __CUDACC_VER_MAJOR__ >= 11
    cublasSetMathMode(handle, CUBLAS_MATH_DISALLOW_REDUCED_PRECISION_REDUCTION);
#endif
    return cublasSgemmEx(handle, transa, transb, m, n, k,
                         &fAlpha, A, cAT, lda,
                         B, cBT, ldb,

With that in place, the results are back to the expected values, and we've verified that we get correct results in our test suite.

Platform

OS: Linux/CUDA 11

Faiss version: 1.6.3 and the latest commit

Faiss compilation options: with CMake (changing the GPU arch depending on the system I run on):

cmake -B build . -DCMAKE_CUDA_ARCHITECTURES="75" -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DBUILD_TESTING=ON  

with make:

LIBS=-pthread
CPPFLAGS=-w
LDFLAGS=-L${CMAKE_INSTALL_PREFIX}/lib
      ${CMAKE_CURRENT_BINARY_DIR}/faiss/src/faiss/configure
  --prefix=${CMAKE_CURRENT_BINARY_DIR}/faiss
  --with-blas=${BLAS_LIBRARIES}
  --with-cuda=${CUDA_TOOLKIT_ROOT_DIR}
  --with-cuda-arch=70;75;80
  -v

We usually link OpenBLAS for --with-blas, but that should not affect these results since they are cuBLAS-based.

Running on: GPU

Interface: C++

Reproduction instructions

See the attached script. It is a simple C++ program that reads a set of vectors and calls bfKnn; for quick reference, these are the parameters used:

    args.metric = faiss::MetricType::METRIC_L2;
    args.metricArg = 0;
    args.k = 10;
    args.dims = 30;
    args.numVectors = 1000;
    args.numQueries = 1000;

faiss_repro.zip

wickedfoo commented 4 years ago

Oh wow. I have a machine with CUDA 11 and an A100 now, will look into this on Monday and send out a fix. Thanks for finding this!

wickedfoo commented 4 years ago

https://github.com/facebookresearch/faiss/pull/1388

wickedfoo commented 4 years ago

Implemented in #1388.