Muennighoff / sgpt

SGPT: GPT Sentence Embeddings for Semantic Search
https://arxiv.org/abs/2202.08904
MIT License
851 stars · 52 forks

Can I use multiple GPUs? #31

Open magicleo opened 1 year ago

magicleo commented 1 year ago

I have 2 GPUs, each with 24 GB of memory. When I run the code below:

```python
model = SentenceTransformerSpecb(
    "bigscience/sgpt-bloom-7b1-msmarco",
    cache_folder="/mnt/storage/agtech/modelCache",
)
query_embeddings = model.encode(queries, is_query=True)
```

I get an OutOfMemoryError; it only uses the first GPU. Can it load the model on two GPUs?

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 22.03 GiB total capacity; 21.27 GiB already allocated; 50.94 MiB free; 21.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

Muennighoff commented 1 year ago

For inference, I think you can use accelerate for that; check https://github.com/huggingface/accelerate/issues/769

magicleo commented 1 year ago

@Muennighoff Thank you very much for your reply. I tried code like this:

```python
model = SentenceTransformerSpecb(
    "bigscience/sgpt-bloom-7b1-msmarco",
    cache_folder="/mnt/storage/agtech/modelCache",
)
accelerator = Accelerator()
model = accelerator.prepare(model)
```

When `model = accelerator.prepare(model)` runs, I get CUDA out of memory, and it still only uses the first GPU. Any suggestions?
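For context, the route described in the linked accelerate issue shards the checkpoint across devices at load time via `device_map="auto"`, rather than through `accelerator.prepare`, which targets training setups and does not split a model's layers across GPUs. Below is a minimal sketch of that approach, assuming `torch`, `transformers`, and `accelerate` are installed; the `pick_device_map` helper and the mean pooling are illustrative stand-ins, not SGPT's exact API or its position-weighted pooling.

```python
# Sketch: multi-GPU inference by sharding the checkpoint at load time
# with device_map="auto" (big model inference via accelerate), instead
# of accelerator.prepare(model), which does not split layers across GPUs.

def pick_device_map(num_gpus: int):
    """Shard across devices when more than one GPU is visible (illustrative helper)."""
    return "auto" if num_gpus > 1 else None

def embed_queries(queries):
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "bigscience/sgpt-bloom-7b1-msmarco"
    tokenizer = AutoTokenizer.from_pretrained(name)
    # device_map="auto" places layers on GPU 0, GPU 1, and CPU as needed,
    # rather than loading the full 7B model onto the first GPU.
    model = AutoModel.from_pretrained(
        name,
        device_map=pick_device_map(torch.cuda.device_count()),
        torch_dtype=torch.float16,  # halves memory vs. float32
    )
    model.eval()

    batch = tokenizer(queries, padding=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**batch)
    # Simple mean pooling over tokens as an illustrative embedding;
    # SGPT itself uses weighted mean pooling.
    return out.last_hidden_state.mean(dim=1)
```

With `device_map="auto"`, accelerate attaches hooks that move each input to the device holding the layer that consumes it, so the two 24 GB cards can jointly hold the 7B model in fp16.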