Summary
After issues we ran into in cuML's KNN regressor and classifier (which use FAISS underneath), we found a CUDA 11 specific issue that seems to stem from half-precision optimizations to `cublasSgemmEx`. The solution is very simple and I'll be glad to contribute a PR for it, but I wanted to create an issue to document it for future reference (and future CUDA versions).

The issue: a step in the distance calculations has numeric discrepancies when compared to older CUDA versions, particularly this step of the distance calculations:
https://github.com/facebookresearch/faiss/blob/c97f8906511dbb97079eaf44329db5aa3216470a/faiss/gpu/impl/Distance.cu#L193
In particular, the issue is easy to see when inspecting the resulting distances of `bfKnn` when run on CUDA 11. I've attached a simple script along with data of enough size to notice the discrepancies (1000 rows, 30 columns, and k=10), with the vectors and queries being the same. Here are the first 20 entries of the resulting distances:

As opposed to looking like what we would expect (and what CUDA 10.1/10.2 produce, as well as CUDA 11 with the fix):
These numeric differences cause a larger number of discrepancies than we would expect from the GPU `bfKnn` in many scenarios.

This is easily solved by changing the `cublasSgemmEx` math mode for CUDA 11 in https://github.com/facebookresearch/faiss/blob/c97f8906511dbb97079eaf44329db5aa3216470a/faiss/gpu/utils/MatrixMult-inl.cuh, with:

With that in place, the results are back to expected, and we've verified that we get correct results in our test suite.
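The exact diff was attached as an image; the change is along these lines (a sketch only — the handle name and the specific `cublasMath_t` value, e.g. `CUBLAS_PEDANTIC_MATH`, are assumptions here and should be taken from the actual PR and the cuBLAS 11 documentation):

```cpp
// Sketch: force full-FP32 computation for cublasSgemmEx on CUDA 11 by
// selecting a stricter math mode before issuing the GEMM, instead of the
// CUDA 11 default that permits reduced-precision paths.
#if CUDA_VERSION >= 11000
cublasSetMathMode(handle, CUBLAS_PEDANTIC_MATH); // assumed enum; verify
#else
cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);
#endif
```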
Platform
OS: Linux/CUDA 11
Faiss version: 1.6.3 and latest commit
Faiss compilation options: with cmake (changing GPU arch depending on the system I run on):
with make:
We usually link OpenBLAS for the `with-blas`, but that should not affect these results since they are cuBLAS based.

Running on:
Interface:
Reproduction instructions
See attached script. It is a simple C++ program that reads a set of vectors and calls `bfKnn`; for quick reference, these are the parameters used:

faiss_repro.zip
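For readers without the attachment, a hypothetical sketch of how such a reproduction calls `bfKnn` (field names per faiss/gpu/GpuDistance.h; sizes taken from the description above: 1000 vectors, 30 dims, k=10; the attached script remains the authoritative version):

```cpp
#include <vector>
#include <faiss/gpu/GpuDistance.h>
#include <faiss/gpu/StandardGpuResources.h>

// Hypothetical reproduction sketch. Assumes `vecs` holds 1000 x 30
// row-major floats; queries are the same buffer, as described above.
void runRepro(const float* vecs) {
    int numVecs = 1000, dims = 30, k = 10;
    std::vector<float> outDistances(numVecs * k);
    std::vector<faiss::Index::idx_t> outIndices(numVecs * k);

    faiss::gpu::StandardGpuResources res;
    faiss::gpu::GpuDistanceParams args;
    args.metric = faiss::MetricType::METRIC_L2;
    args.k = k;
    args.dims = dims;
    args.vectors = vecs;
    args.numVectors = numVecs;
    args.queries = vecs; // queries == vectors in the repro
    args.numQueries = numVecs;
    args.outDistances = outDistances.data();
    args.outIndices = outIndices.data();

    faiss::gpu::bfKnn(&res, args);
}
```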