facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.
https://faiss.ai
MIT License

faiss + torch.dist: unable to utilize multiple GPU #3530

Closed. ZexinYan closed this issue 6 days ago.

ZexinYan commented 3 weeks ago


Platform

OS: Ubuntu 22.04

Faiss version: e758973fa08164728eb9e136631fe6c57d7edf6c

Installed from: miniconda

Running on: GPU

Interface: Python

Reproduction instructions

I followed the instructions in (Faiss + PyTorch: interoperability for CPU and GPU), but found that Faiss can't utilize multiple GPUs when running under torch.distributed. I suspect the reason is that the current implementation moves all CUDA tensors to torch.cuda.current_device() (0 by default); see torch_utils. Is there any solution for this?
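For concreteness, here is a minimal sketch of the setup I mean. It assumes one process per GPU launched with torchrun; the script name, dimensions, and shapes are illustrative, not from my actual code:

```python
# repro.py -- run with: torchrun --nproc_per_node=<num_gpus> repro.py
import os
import torch
import torch.distributed as dist
import faiss
import faiss.contrib.torch_utils  # patches faiss indexes to accept torch tensors

local_rank = int(os.environ["LOCAL_RANK"])  # assigned by torchrun, one per process
dist.init_process_group("nccl")

d = 64
res = faiss.StandardGpuResources()
cfg = faiss.GpuIndexFlatConfig()
cfg.device = local_rank                      # place the index on this process' GPU
index = faiss.GpuIndexFlatL2(res, d, cfg)

xb = torch.rand(1000, d, device=f"cuda:{local_rank}")
index.add(xb)
xq = torch.rand(10, d, device=f"cuda:{local_rank}")
D, I = index.search(xq, 5)

# Expected: each process works entirely on its own GPU.
# Observed: the torch_utils wrappers allocate on torch.cuda.current_device(),
# which is 0 in every process unless explicitly changed, so all processes
# pile onto GPU 0.
```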

mdouze commented 3 weeks ago

Are you running with one thread per GPU or one process per GPU? I don't know if torch.cuda.current_device() is thread-specific.

ZexinYan commented 3 weeks ago

> Are you running with one thread per GPU or one process per GPU? I don't know if torch.cuda.current_device() is thread-specific.

One process per GPU by using torch.distributed.

mdouze commented 2 weeks ago

In that case you can just set the process' GPU as the current device, no?
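Something like this sketch, assuming a torchrun-style launcher that sets LOCAL_RANK:

```python
import os
import torch
import torch.distributed as dist

# Pin this process to its GPU before any faiss / torch_utils call, so that
# torch.cuda.current_device() returns the right device for this process.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
dist.init_process_group("nccl")

assert torch.cuda.current_device() == local_rank
```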

ZexinYan commented 2 weeks ago

Do you think torchx.distributed.local_rank() (assigned automatically when using torch.distributed) would be better than torch.cuda.current_device() (which has to be set manually)?
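To illustrate the distinction (hypothetical torchrun-style environment):

```python
import os
import torch

local_rank = int(os.environ["LOCAL_RANK"])  # assigned by the launcher, one per process
print(torch.cuda.current_device())          # 0 in every process until set manually
torch.cuda.set_device(local_rank)           # the manual step
print(torch.cuda.current_device())          # now matches local_rank
```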