Closed sunhongmin225 closed 3 years ago
The kernels to do this exist in the Faiss library (e.g., concatenating the partial results, then running k-selection on the concatenated data), but they are not currently wired together to perform the merge exclusively on the GPUs. This is something we could do if enough users are interested in it, though.
The current implementation covers both CPU and GPU indices but performs all of the merge on the CPU, here:
https://github.com/facebookresearch/faiss/blob/master/faiss/IndexShards.cpp#L45
If IndexShards is used for GPU indices, then the data will be copied to the CPU and merged on the CPU using this function.
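Conceptually, that CPU-side merge re-selects the global top-k from the shards' partial top-k lists. Here is a minimal Python sketch of the idea (illustrative names only, not the actual Faiss code):

```python
import heapq

def merge_shard_results(shard_results, k):
    """Merge per-shard top-k lists into a global top-k.

    shard_results: one list of (distance, label) pairs per shard, each
    already sorted by ascending distance (L2-style metric).
    Returns the k globally smallest (distance, label) pairs.
    """
    # heapq.merge lazily combines the pre-sorted per-shard lists,
    # so we only need to pop the k smallest pairs overall.
    merged = heapq.merge(*shard_results)
    return [pair for _, pair in zip(range(k), merged)]

# Example: two shards each return their local top-3 for one query.
shard0 = [(0.1, 7), (0.4, 2), (0.9, 5)]
shard1 = [(0.2, 11), (0.3, 13), (0.8, 19)]
print(merge_shard_results([shard0, shard1], k=3))
# → [(0.1, 7), (0.2, 11), (0.3, 13)]
```

Because each shard's list is already sorted, the merge is cheap relative to the per-shard searches; the GPU k-select kernels mentioned above do the equivalent selection on device.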
Great, thanks a lot for your super clear explanation.
One additional question that I've written above, please: say I'm using 4 GPUs to handle the sift1m dataset. Is it correct that each GPU divides the workload according to the size of the dataset? I.e., does the first GPU handle the first 1M/4 = 250K rows of sift1m, the second the next 250K rows, and so on? Also, if this mechanism is correct, where can I find the code that actually divides the workload across multiple GPUs?
Best, Min.
This is done via IndexShards and IndexReplicas; see the doc here:
https://github.com/facebookresearch/faiss/wiki/Faiss-on-the-GPU#using-multiple-gpus
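For the 4-GPU SIFT1M case above, sharding amounts to splitting the database into near-equal consecutive slices, one per sub-index. A rough pure-Python sketch of such a split (illustrative only, not the Faiss IndexShards logic itself):

```python
def shard_slices(n_vectors, n_shards):
    """Split [0, n_vectors) into n_shards near-equal consecutive ranges.

    A sketch of how a database can be divided across shards; the actual
    assignment in Faiss is done by IndexShards when vectors are added.
    """
    base, extra = divmod(n_vectors, n_shards)
    slices, start = [], 0
    for i in range(n_shards):
        size = base + (1 if i < extra else 0)  # spread any remainder
        slices.append((start, start + size))
        start += size
    return slices

# SIFT1M across 4 GPUs: four consecutive 250K slices.
print(shard_slices(1_000_000, 4))
# → [(0, 250000), (250000, 500000), (500000, 750000), (750000, 1000000)]
```

At query time each shard searches only its own slice, and the partial results are merged as described above.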
Thanks a lot for your help, @wickedfoo and @mdouze. I sincerely appreciate it.
Best, Min.
Platform
Running on:
Interface:
Summary
Hi, while analyzing GPU searches, I came across the expressions below in `GpuIndex::search` in `GpuIndex.cu`:
`DeviceScope scope(config_.device);`
`auto stream = resources_->getDefaultStream(config_.device);`
`auto outDistances = toDeviceTemporary