0xPlaygrounds / rig

A library for developing LLM-powered Rust applications.
https://rig.rs
MIT License

feat: Add PDF loader to document loaders #25

Closed. Tachikoma000 closed this 6 days ago

Tachikoma000 commented 6 days ago

Add PDF Loader to Document Loaders

This PR implements a PDF loader as part of the document loaders module in Rig. It allows users to easily load and process PDF documents for use in RAG systems and other NLP tasks.

Changes

Implementation Details

The PdfLoader uses the lopdf crate to parse PDF files and extract their text content. It reports failures such as a missing file or a parse error instead of panicking. The extracted text is converted into DocumentEmbeddings for further processing in Rig.
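
To make the error-handling path concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: `PdfLoaderError`, `check_pdf_header`, and `load_pdf_text` are hypothetical names, and the `extract` closure stands in for the lopdf-based text extraction, which is not reproduced here.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type; the PR's actual error type is not shown.
#[derive(Debug)]
pub enum PdfLoaderError {
    /// The file could not be read (for example, it does not exist).
    Io(io::Error),
    /// The bytes were read but could not be interpreted as a PDF.
    Parse(String),
}

impl fmt::Display for PdfLoaderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PdfLoaderError::Io(e) => write!(f, "I/O error: {}", e),
            PdfLoaderError::Parse(msg) => write!(f, "parse error: {}", msg),
        }
    }
}

impl std::error::Error for PdfLoaderError {}

impl From<io::Error> for PdfLoaderError {
    fn from(e: io::Error) -> Self {
        PdfLoaderError::Io(e)
    }
}

/// Cheap sanity check: a well-formed PDF begins with the "%PDF-" marker.
pub fn check_pdf_header(bytes: &[u8]) -> Result<(), PdfLoaderError> {
    if bytes.starts_with(b"%PDF-") {
        Ok(())
    } else {
        Err(PdfLoaderError::Parse("missing %PDF- header".to_string()))
    }
}

/// Reads `path`, validates the header, then hands the bytes to `extract`.
/// `extract` is an assumed stand-in for the real text-extraction step.
pub fn load_pdf_text<P, F>(path: P, extract: F) -> Result<String, PdfLoaderError>
where
    P: AsRef<Path>,
    F: FnOnce(&[u8]) -> Result<String, String>,
{
    let bytes = fs::read(path)?; // a missing file becomes PdfLoaderError::Io
    check_pdf_header(&bytes)?;
    extract(&bytes).map_err(PdfLoaderError::Parse)
}
```

Keeping I/O and parse failures as separate variants lets a caller tell "the file is not there" apart from "the file is not a usable PDF", which matters when loading many documents in a batch.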

Testing

Unit tests have been added to ensure the PdfLoader correctly loads PDF files and handles various edge cases. The tests cover:

Documentation

The main documentation now includes usage examples for the PdfLoader, showing how to initialize the loader and integrate it with the EmbeddingsBuilder.

Related Issue