NVIDIA Text Embedding NIM Overview

NeMo Text Retriever NIM (Text Retriever NIM) APIs provide easy access to state-of-the-art models that are foundational building blocks for enterprise semantic search applications, delivering accurate answers quickly at scale. Developers can use these APIs to create robust copilots, chatbots, and AI assistants from start to finish. Text Retriever NIM models are built on the NVIDIA software platform, incorporating CUDA, TensorRT, and Triton to offer out-of-the-box GPU acceleration.

NeMo Retriever Text Embedding NIM: boosts text question-answering retrieval performance by providing high-quality embeddings that also serve many downstream NLP tasks.
NeMo Retriever Text Reranking NIM: further improves retrieval performance with a fine-tuned reranker that finds the most relevant passages to provide as context when querying an LLM. See the Text Reranking NIM documentation for more information.
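The two NIMs are typically chained. A cheap vector-similarity pass over passage embeddings shortlists candidates, then the more expensive reranker rescores only that shortlist. Below is a minimal sketch of that two-stage pattern. The hand-written 2-D vectors and the word-overlap `overlap_score` function are hypothetical stand-ins for what the Embedding and Reranking NIMs would actually return. No NIM endpoint is called, and the helper names (`retrieve`, `rerank`) are my own.

```python
import math
from typing import Callable, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec: Sequence[float],
             passage_vecs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Stage 1 (embedding search): indices of the k passages most similar to the query."""
    order = sorted(range(len(passage_vecs)),
                   key=lambda i: cosine(query_vec, passage_vecs[i]),
                   reverse=True)
    return order[:k]

def rerank(query: str, passages: Sequence[str], candidates: Sequence[int],
           rerank_score: Callable[[str, str], float], top_n: int) -> List[int]:
    """Stage 2 (reranking): rescore only the shortlisted candidates, keep the best top_n."""
    order = sorted(candidates, key=lambda i: rerank_score(query, passages[i]), reverse=True)
    return list(order[:top_n])

def overlap_score(q: str, p: str) -> float:
    """HYPOTHETICAL stand-in for a reranker model: count shared lowercase words."""
    return float(len(set(q.lower().split()) & set(p.lower().split())))

# Toy data: hand-written vectors stand in for real embedding-NIM output.
passages = ["GPUs accelerate inference", "Triton serves models", "Bananas are yellow"]
passage_vecs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
query = "how do GPUs accelerate inference"
query_vec = [0.7, 0.6]

shortlist = retrieve(query_vec, passage_vecs, k=2)                  # [1, 0]
best = rerank(query, passages, shortlist, overlap_score, top_n=1)   # [0]
```

In the toy run, vector search puts "Triton serves models" first, and the reranker then promotes "GPUs accelerate inference". This is the kind of correction the reranking stage is there to make. Only the `k` shortlisted passages reach the reranker, which keeps the costly stage bounded.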

This diagram shows how Text Retriever NIM APIs can help a question-answering RAG application find the most relevant data in an enterprise setting.
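The last step in that flow is handing the top-ranked passages to an LLM as context. Here is a sketch of that step. The prompt wording, the `build_rag_prompt` name and the character budget are my own assumptions, not part of the NIM APIs. A real application would budget in tokens for its specific LLM.

```python
from typing import Sequence

def build_rag_prompt(question: str, ranked_passages: Sequence[str],
                     max_chars: int = 2000) -> str:
    """Pack ranked passages (best first) into a context block, then append the question.

    max_chars is a rough budget on the numbered passage text only; the newlines
    between passages and the instruction text are not counted.
    """
    parts = []
    used = 0
    for i, passage in enumerate(ranked_passages, start=1):
        chunk = f"[{i}] {passage}"
        if used + len(chunk) > max_chars:
            break  # stop at the first passage that would overflow the budget
        parts.append(chunk)
        used += len(chunk)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Stopping at the first passage that overflows, instead of skipping it and trying shorter ones, keeps the reranker's ordering intact. The LLM therefore always sees a prefix of the ranked list.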

Language Models

NV-EmbedQA-E5-v5: a popular community base embedding model optimized for text question-answering retrieval.
NV-EmbedQA-Mistral7B-v2: a popular multilingual community base model fine-tuned as a text embedding model for high-accuracy question answering.
Snowflake’s Arctic-embed-l: an optimized community embedding model.
NV-RerankQA-Mistral4B-v3: a popular community base model fine-tuned for text reranking for high-accuracy question answering.
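Calling these models through a NIM comes down to building a small JSON request. The sketch below only builds the bodies and does not send them, so it runs offline. Several details are assumptions to verify against the API reference for the NIM version you deploy: the endpoint paths (`/v1/embeddings`, `/v1/ranking`), the field names (`input_type`, `query.text`, `passages[].text`) and the lowercase model IDs. They reflect my understanding of NVIDIA's OpenAI-compatible NIM API. The helper names `embedding_body` and `ranking_body` are hypothetical.

```python
import json
from typing import Sequence

# ASSUMPTION: model IDs as listed in NVIDIA's API catalog; check your deployment.
EMBEDDING_MODEL_IDS = {
    "NV-EmbedQA-E5-v5": "nvidia/nv-embedqa-e5-v5",
    "NV-EmbedQA-Mistral7B-v2": "nvidia/nv-embedqa-mistral-7b-v2",
    "Arctic-embed-l": "snowflake/arctic-embed-l",
}
RERANK_MODEL_ID = "nvidia/nv-rerankqa-mistral-4b-v3"  # ASSUMPTION, see above

def embedding_body(texts: Sequence[str], model: str, input_type: str) -> str:
    """JSON body for POST /v1/embeddings (ASSUMED schema: OpenAI-style plus input_type).

    QA embedding models encode queries and passages differently, so the
    caller states which kind of text it is sending.
    """
    if input_type not in ("query", "passage"):
        raise ValueError("input_type must be 'query' or 'passage'")
    return json.dumps({
        "model": EMBEDDING_MODEL_IDS[model],
        "input": list(texts),
        "input_type": input_type,
    })

def ranking_body(query: str, passages: Sequence[str]) -> str:
    """JSON body for POST /v1/ranking (ASSUMED schema for the reranking NIM)."""
    return json.dumps({
        "model": RERANK_MODEL_ID,
        "query": {"text": query},
        "passages": [{"text": p} for p in passages],
    })
```

As I understand it, the `input_type` split matters for the asymmetric QA models. Documents are embedded once with `"passage"` at index time, and each user question is embedded with `"query"` at search time. Any HTTP client can then POST the returned string.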
