Closed by DavidMChan 1 year ago
@kernfel - What version of CUDA/GCC are you using? Also, are you installing FAISS with the conda installation, or the from-scratch FAISS install?
CUDA toolkit: 11.3
GCC: I may have inadvertently used v10 here; it seems my update-alternatives weren't up to date.
FAISS: building from source.
I'm able to reproduce with 500,000 points using CUDA 11.2 and gcc 9.3, building both from source. Downgrading to a CPU index does seem to fix the problem, which suggests that the issue is with the FAISS GPU index and not with our downstream code.
For anyone at FAISS, the offending code is here:
const int32_t kNumCells = static_cast<int32_t>(
    std::sqrt(static_cast<float>(num_points)));
const int32_t kNumCellsToProbe = 20;

// Construct the CPU version of the index
faiss::IndexFlatL2 quantizer(num_dims);
faiss::IndexIVFFlat cpu_index(&quantizer, num_dims, kNumCells, faiss::METRIC_L2);
cpu_index.nprobe = kNumCellsToProbe;

if (num_near_neighbors < 1024)
{
    int ngpus = faiss::gpu::getNumDevices();
    std::vector<faiss::gpu::GpuResourcesProvider *> res;
    std::vector<int> devs;
    for (int i = 0; i < ngpus; i++)
    {
        res.push_back(new faiss::gpu::StandardGpuResources);
        devs.push_back(i);
    }

    // Convert the CPU index to GPU index
    faiss::Index *search_index = faiss::gpu::index_cpu_to_gpu_multiple(res, devs, &cpu_index);
    search_index->train(num_points, points);
    search_index->add(num_points, points);
    search_index->search(num_points, points, num_near_neighbors, distances, indices);

    delete search_index;
    for (int i = 0; i < ngpus; i++)
    {
        delete res[i];
    }
}
else
{
    // Construct the index table on the CPU (since the GPU
    // can only handle 1023 neighbors)
    cpu_index.train(num_points, points);
    cpu_index.add(num_points, points);

    // Perform the KNN query
    cpu_index.search(num_points, points, num_near_neighbors,
                     distances, indices);
}
The CPU path (if forced, even with fewer than 1024 neighbors) works, while the GPU path doesn't.
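To make the numbers concrete, here is a minimal stand-alone sketch (no FAISS dependency) of how the snippet above derives its IVF parameters and chooses between the GPU and CPU paths. `KnnPlan` and `plan_knn` are hypothetical names invented for illustration; the square-root rule, the probe count of 20, and the 1024 threshold are copied from the code quoted above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type, not part of tsne-cuda or FAISS.
struct KnnPlan {
    int32_t num_cells;   // number of IVF cells (passed to IndexIVFFlat)
    int32_t num_probes;  // cells scanned per query (nprobe)
    bool use_gpu;        // true when the snippet would take the GPU path
};

// Mirrors the parameter choices in the quoted snippet.
inline KnnPlan plan_knn(int64_t num_points, int32_t num_near_neighbors) {
    assert(num_points > 0 && num_near_neighbors > 0);
    KnnPlan plan;
    // Same rule as kNumCells: truncate sqrt(num_points) to an integer.
    plan.num_cells = static_cast<int32_t>(
        std::sqrt(static_cast<float>(num_points)));
    // Same constant as kNumCellsToProbe.
    plan.num_probes = 20;
    // The snippet only uses the GPU index below 1024 neighbors.
    plan.use_gpu = num_near_neighbors < 1024;
    return plan;
}
```

For the 500,000-point case, `plan_knn(500000, k)` yields 707 cells with 20 probes, and takes the GPU path for any `k` below 1024.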
Second update: It doesn't seem to be limited to the flat index. The IVFPQ index also seems to have the same error:
Starting TSNE calculation with 500000 points.
Initializing cuda handles... done.
KNN Computation... Faiss assertion 'err__ == cudaSuccess' failed in void faiss::gpu::runTransposeAny(faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, int, int, faiss::gpu::Tensor<OtherT, OtherDim, true, int, faiss::gpu::traits::DefaultPtrTraits>&, cudaStream_t) [with T = float; int Dim = 3; cudaStream_t = CUstream_st*] at /home/davidchan/Repos/faiss/faiss/gpu/utils/Transpose.cuh:218; details: CUDA error 9 invalid configuration argument
Aborted (core dumped)
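For context on the message: CUDA error 9 is `cudaErrorInvalidConfiguration`, which the runtime returns when a kernel launch requests grid or block dimensions the device cannot satisfy. Whether that is exactly what happens inside `runTransposeAny` is not established in this thread, so the following is only a sketch of the kind of check involved. `Dim3` and `is_valid_launch_config` are hypothetical names, and the limits are hard-coded assumptions for compute capability 3.0 and newer; real code should read them from `cudaDeviceProp`.

```cpp
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without the toolkit.
struct Dim3 {
    uint64_t x, y, z;
};

// Assumed limits (compute capability 3.0+); query cudaDeviceProp in real code.
constexpr uint64_t kMaxGridX = 2147483647ULL;  // 2^31 - 1
constexpr uint64_t kMaxGridYZ = 65535ULL;
constexpr uint64_t kMaxBlockXY = 1024ULL;
constexpr uint64_t kMaxBlockZ = 64ULL;
constexpr uint64_t kMaxThreadsPerBlock = 1024ULL;

// Returns true if a launch with these dimensions stays within the assumed limits.
inline bool is_valid_launch_config(const Dim3& grid, const Dim3& block) {
    // Zero-sized dimensions are rejected by the runtime.
    if (grid.x == 0 || grid.y == 0 || grid.z == 0) return false;
    if (block.x == 0 || block.y == 0 || block.z == 0) return false;
    // Per-dimension grid limits.
    if (grid.x > kMaxGridX || grid.y > kMaxGridYZ || grid.z > kMaxGridYZ) return false;
    // Per-dimension block limits.
    if (block.x > kMaxBlockXY || block.y > kMaxBlockXY || block.z > kMaxBlockZ) return false;
    // Total threads per block.
    return block.x * block.y * block.z <= kMaxThreadsPerBlock;
}
```

Under these assumptions, a grid of 500,000 blocks along x passes, while the same count along y or z fails because it exceeds 65,535. This is one way a launch sized directly from the number of points can succeed for small inputs and fail for large ones.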
Perhaps also related are: https://github.com/facebookresearch/faiss/issues/1835 https://github.com/facebookresearch/faiss/issues/1771
Got my build issues under control and can confirm that FAISS v1.6.5 does not have this issue.
Resolved in latest.
It seems like tsnecuda is experiencing the same issues as in https://github.com/facebookresearch/faiss/issues/1793. Running the code with
./tsne -k 500000
(500000 2D points drawn from a pair of Gaussians) gives:
Originally posted by @kernfel in https://github.com/CannyLab/tsne-cuda/issues/95#issuecomment-824528732