facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

Strange behavior while using RTX3090 #1850

Closed kehuantiantang closed 1 month ago

kehuantiantang commented 3 years ago

Summary

On the same machine I have two GPUs: a Titan Xp and an RTX 3090. The Titan Xp works fine, but the RTX 3090 fails with the following error:

Faiss assertion 'err == CUBLAS_STATUS_SUCCESS' failed in void faiss::gpu::runMatrixMult(faiss::gpu::Tensor<float, 2, true>&, bool, faiss::gpu::Tensor<T, 2, true>&, bool, faiss::gpu::Tensor<IndexType, 2, true>&, bool, float, float, cublasHandle_t, cudaStream_t) [with AT = float; BT = float; cublasHandle_t = cublasContext*; cudaStream_t = CUstream_st*] at /__w/faiss-wheels/faiss-wheels/faiss/faiss/gpu/utils/MatrixMult-inl.cuh:162; details: cublas failed (13): (512, 256) x (5, 256)' = (512, 5)
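For readers decoding the message: status 13 in the cuBLAS status enum is CUBLAS_STATUS_EXECUTION_FAILED, and the shapes suggest a batch of 512 query vectors multiplied by the transpose of the 5 centroids, all 256-dimensional (matching d = 256 and nmb_clusters = 5 in the code further down). The NumPy sketch below is only an illustration of how such a product can appear in an L2 nearest-centroid search, using the expansion ||q - c||^2 = ||q||^2 - 2 q·c + ||c||^2. It is not Faiss's GPU code; `l2_assign` and the random data are hypothetical names and inputs chosen for the example.

```python
import numpy as np


def l2_assign(queries, centroids):
    """Squared L2 distances via one matrix product, plus nearest-centroid ids.

    Hypothetical helper for illustration; not part of Faiss.
    """
    q_norms = (queries ** 2).sum(axis=1, keepdims=True)  # shape (n, 1)
    c_norms = (centroids ** 2).sum(axis=1)               # shape (k,)
    cross = queries @ centroids.T                        # (n, d) x (k, d)' = (n, k)
    dists = q_norms - 2.0 * cross + c_norms              # broadcast to (n, k)
    return dists.argmin(axis=1), dists


# Same shapes as in the error message: (512, 256) x (5, 256)' = (512, 5)
rng = np.random.default_rng(0)
queries = rng.random((512, 256), dtype=np.float32)
centroids = rng.random((5, 256), dtype=np.float32)
labels, dists = l2_assign(queries, centroids)
```

Running this on the CPU never touches cuBLAS, so it can serve as a small reference when checking whether results from a working GPU look plausible.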

Running on:

Interface:

Reproduction instructions

This is my code. I run the same code on both GPUs and only change the GPU id via os.environ['CUDA_VISIBLE_DEVICES'] = '0'. The RTX 3090 gives the error mentioned above, while the Titan Xp works fine.
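One detail that may matter when switching GPUs this way: CUDA_VISIBLE_DEVICES is read when CUDA is initialized, and by default CUDA may enumerate devices fastest-first rather than in the order nvidia-smi shows. A minimal sketch, assuming the variable is set before faiss or torch initialize CUDA; `select_gpu` is a hypothetical helper name, not something from the original script:

```python
import os


def select_gpu(device_id):
    """Expose a single physical GPU to CUDA (hypothetical helper).

    Call this before any library initializes CUDA, otherwise the
    setting has no effect on the running process.
    """
    # Make CUDA ids follow PCI bus order, which is what nvidia-smi lists.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]


select_gpu(0)  # pick the GPU under test, then import faiss / torch
```

With PCI_BUS_ID ordering, id 0 and id 1 refer to the same cards that nvidia-smi reports under those indices, which makes it easier to be sure which GPU produced the failure.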


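import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'  # GPU under test; set before CUDA is initialized

import faiss
import numpy as np
import torch
# Assumed source of common_functions (the post does not show this import)
from pytorch_metric_learning.utils import common_functions


def tp(x, nmb_clusters):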
    device = x.device
    x = common_functions.to_numpy(x).astype(np.float32)
    n_data, d = x.shape

    # faiss implementation of k-means
    clus = faiss.Clustering(d, nmb_clusters)
    clus.niter = 20
    clus.max_points_per_centroid = 10000000
    index = faiss.IndexFlatL2(d)
    if faiss.get_num_gpus() > 0:
        index = faiss.index_cpu_to_all_gpus(index)
    # perform the training
    clus.train(x, index)
    _, idxs = index.search(x, 1)

    return torch.tensor([int(n[0]) for n in idxs], dtype=int, device=device)

if __name__ == '__main__':
    a = torch.rand(1000, 256, dtype=torch.float32)
    tp(a, 5)
csldali commented 2 years ago

I got the same error with an RTX 3090.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 1 month ago

This issue was closed because it has been inactive for 7 days since being marked as stale.