McGill-NLP / llm2vec

Code for 'LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders'
https://mcgill-nlp.github.io/llm2vec/
MIT License

Is it desired behaviour that _encode always returns embeddings on CPU, even though we pass a device argument to it? #100

Closed VProv closed 1 week ago

VProv commented 2 weeks ago


VProv commented 2 weeks ago

This behaviour is a bit confusing; maybe at least add a description.

vaibhavad commented 2 weeks ago

Hi @VProv,

Thanks for contributing via #101. Yes, this is the desired behaviour, to avoid OOM problems on GPU with large datasets. If you want, you can raise another PR that adds an argument to retain the embeddings on GPU. It would also be helpful to emit a warning in that case, as GPU memory will keep growing as more documents are encoded.
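A minimal sketch of what such an argument could look like. Note this is not llm2vec's actual API: the function name `move_embeddings` and the `device` parameter are hypothetical, illustrating the idea of optionally keeping encoder output on an accelerator while warning about the memory cost.

```python
import warnings

import torch


def move_embeddings(embeddings: torch.Tensor, device: str = "cpu") -> torch.Tensor:
    """Move encoded embeddings to the requested device.

    By default the embeddings stay on CPU (the library's current behaviour,
    which avoids OOM on large datasets). Requesting a non-CPU device triggers
    a warning, since accelerator memory then grows with every batch encoded.
    """
    if device != "cpu":
        warnings.warn(
            f"Retaining embeddings on {device}: memory use will grow "
            "with the number of documents encoded.",
            stacklevel=2,
        )
    return embeddings.to(device)
```

A caller could then do `move_embeddings(model.encode(texts), device="cuda")` to keep the result on GPU, accepting the memory trade-off the warning describes.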

vaibhavad commented 1 week ago

Closing as it is stale. Feel free to re-open if you have any more questions.