Open zslzx opened 3 years ago
Would it be possible to simplify the test case? That would make it easier to investigate.
import faiss
import torch
import faiss.contrib.torch_utils  # lets Faiss indexes accept torch tensors directly


class faiss_KNN:
    def __init__(self, d):
        res = faiss.StandardGpuResources()
        self.index = faiss.GpuIndexFlat(res, d, faiss.METRIC_INNER_PRODUCT)

    def knn(self, f1, f2, k):
        # Inputs are (d, n); transpose to the (n, d) layout the index expects.
        f1 = f1.t().contiguous()
        f2 = f2.t().contiguous()
        self.index.reset()
        self.index.add(f2)
        d_np, I_np = self.index.search(f1, k)
        return d_np, I_np


knn_finder = faiss_KNN(256)
torch.manual_seed(123)
f1 = torch.rand((256, 100)).cuda().float()
f2 = torch.rand((256, 100)).cuda().float()
_, index = knn_finder.knn(f1, f2, 1)
index = index[:, 0]
original_index = torch.arange(100).cuda()
diff = (index - original_index).abs()
mask = (diff <= 10)
print(mask.size())
print(original_index.size())
# print(mask.sum())  # With this line commented out, the indexing below raises the error.
original_index = original_index[mask]
The code above produces the following error:
Traceback (most recent call last):
  File "reshow_simple.py", line 32, in <module>
    original_index = original_index[mask]
RuntimeError: invalid shape dimension -50
However, the error disappears after adding an otherwise meaningless mask.sum()
before the masked_select operation. Why is the behavior so unpredictable?
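One way to narrow this down is to compare the boolean-mask selection on the device against the same selection done on the CPU. The sketch below is only a diagnostic and not a fix: checked_mask_select is a hypothetical helper (it is not part of PyTorch or Faiss), and the even-index mask stands in for the mask built from the Faiss search result. It falls back to the CPU when no GPU is available.

```python
import torch


def checked_mask_select(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Index `values` with a boolean `mask` and compare against a CPU reference.

    Hypothetical diagnostic helper; not part of PyTorch or Faiss.
    """
    if mask.dtype != torch.bool:
        raise TypeError("mask must be a boolean tensor")
    if mask.shape != values.shape:
        raise ValueError("mask and values must have the same shape")

    # Wait for all queued GPU work to finish before selecting.
    if values.is_cuda:
        torch.cuda.synchronize(values.device)

    selected = values[mask]
    # Reference result computed entirely on the CPU.
    expected = values.cpu()[mask.cpu()]

    if selected.shape != expected.shape or not torch.equal(selected.cpu(), expected):
        raise RuntimeError(
            f"device result {tuple(selected.shape)} disagrees with "
            f"CPU reference {tuple(expected.shape)}"
        )
    return selected


# Stand-in data: in the report, the mask comes from the Faiss search result.
device = "cuda" if torch.cuda.is_available() else "cpu"
original_index = torch.arange(100, device=device)
mask = (original_index % 2 == 0)
kept = checked_mask_select(original_index, mask)
print(kept.shape)
```

If the helper raises only when it runs right after the Faiss search, that points at the interaction between the two libraries rather than at masked selection itself.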
I used Anaconda and installed the packages below, and it worked. If you use Docker, install Miniconda first.
conda install faiss-gpu cudatoolkit=11.1 -c pytorch-gpu
conda install -c anaconda pytorch-gpu
Summary
I tried to use Faiss inside a PyTorch model. It runs correctly on a GTX1080Ti with CUDA 10. However, when running on a machine with an RTX3090Ti GPU, it behaves abnormally: after a knn search with Faiss, the masked_select operation in PyTorch returns wrong results. Each part works without problems when used separately.
Platform
OS: Ubuntu 20.04.2 LTS
GPU: Nvidia RTX3090Ti
CUDA: V11.1.105
PyTorch version: 1.9.0
Faiss version: 1.7.0
Installed from: conda install faiss-gpu -c conda-forge
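The platform details above can be gathered with a short script, which makes it easier to compare a working machine against a failing one. This is a minimal sketch: environment_report is a hypothetical helper name, and it reports only what the installed packages expose (a missing package is listed as "not installed").

```python
import importlib


def environment_report() -> dict:
    """Collect version details relevant to a Faiss + PyTorch GPU setup."""
    report = {}
    for name in ("torch", "faiss"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "not installed"
            continue
        report[name] = getattr(module, "__version__", "unknown")

        if name == "torch":
            # CUDA version PyTorch was built against; None for CPU-only builds.
            report["torch_cuda_build"] = module.version.cuda
            if module.cuda.is_available():
                report["gpu_name"] = module.cuda.get_device_name(0)
                # (major, minor) compute capability of the first GPU.
                report["compute_capability"] = module.cuda.get_device_capability(0)
    return report


report = environment_report()
for key, value in report.items():
    print(f"{key}: {value}")
```

Comparing the CUDA version each package was built against with the GPU's compute capability is a reasonable first check when code works on an older card but misbehaves on a newer one.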
Running on:
Interface:
Reproduction instructions
The code for reproducing the bug:
The batch_index and coords0 tensors are selected with the same mask, yet the shapes of the results differ. For example, the output of one run is: