NVIDIA / nv-ingest

NVIDIA Ingest is a set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
Apache License 2.0

[DOC]: Add "blueprint" diagram and explain #25

Open randerzander opened 3 weeks ago

randerzander commented 3 weeks ago

How would you describe the priority of this documentation request?

Significant improvement

Please provide a link or source to the relevant docs

README.md

Describe the problems in the documentation

The README does not explain how the parts of the architecture diagram fit together or how NVIDIA NIM microservices are used. We recommend adding a written explanation of the diagram.

Diagram: [architecture diagram image not reproduced]

(Optional) Propose a correction or improvement

No response

abeltre1 commented 5 days ago

@randerzander Further, we should add descriptions like the following to make it easier to understand how the architecture fits together:

PDF Ingestion NIM microservices

  1. nv-yolox-structured-image: A fine-tuned object detection model that detects charts, plots, and tables in PDFs.
  2. DePlot: A popular community Pix2Struct model that generates descriptions of charts.
  3. CACHED: An object detection model that identifies the various elements in graphs.
  4. PaddleOCR: An optical character recognition (OCR) model that transcribes text from tables and charts.

NVIDIA NeMo Retriever NIM microservices

  1. nv-embedqa-e5-v5: A popular community base embedding model optimized for text question-answering retrieval.
  2. nv-rerankqa-mistral4b-v3: A popular community base model fine-tuned for text reranking for high-accuracy question answering.

For more information, see An Easy Introduction to Multimodal Retrieval-Augmented Generation.
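It might also help to show the extraction data flow in code. The following is a rough illustration only, not nv-ingest's actual implementation: every function is a hypothetical stub standing in for one of the NIM microservices listed above, and the routing (tables go to PaddleOCR; charts go to DePlot, CACHED, and PaddleOCR) is an assumption inferred from the model descriptions rather than taken from the nv-ingest code.

```python
# Rough sketch of chaining the PDF extraction NIMs for one page.
# ASSUMPTION: every function here is a HYPOTHETICAL stub standing in for a
# NIM microservice; none of these names are part of the nv-ingest API.

Region = tuple[str, str]  # (kind, region_id), e.g. ("chart", "p1-c0")


def nv_yolox_detect(page_id: str) -> list[Region]:
    # Stand-in for nv-yolox-structured-image: locate tables and charts on a page.
    return [("table", f"{page_id}-t0"), ("chart", f"{page_id}-c0")]


def deplot_describe(region_id: str) -> str:
    # Stand-in for DePlot: generate a description of a chart.
    return f"[DePlot] description of {region_id}"


def cached_elements(region_id: str) -> str:
    # Stand-in for CACHED: identify the elements of a graph.
    return f"[CACHED] elements of {region_id}"


def paddle_ocr(region_id: str) -> str:
    # Stand-in for PaddleOCR: transcribe the text inside a table or chart.
    return f"[PaddleOCR] text of {region_id}"


def extract_page(page_id: str) -> list[str]:
    """Route each detected region to the stubs that handle its type."""
    chunks: list[str] = []
    for kind, region_id in nv_yolox_detect(page_id):
        if kind == "table":
            chunks.append(paddle_ocr(region_id))
        elif kind in ("chart", "plot"):  # ASSUMPTION: plots handled like charts
            chunks.append(deplot_describe(region_id))
            chunks.append(cached_elements(region_id))
            chunks.append(paddle_ocr(region_id))
    return chunks


if __name__ == "__main__":
    for chunk in extract_page("p1"):
        print(chunk)
```

In a full pipeline, the text chunks returned by `extract_page` would then be embedded with nv-embedqa-e5-v5 for indexing, and nv-rerankqa-mistral4b-v3 would reorder the retrieved candidates at query time; both of those steps are left out of this sketch.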