Closed Tachikoma000 closed 6 days ago
This PR implements a PDF loader as part of the document loaders module in Rig. It allows users to easily load and process PDF documents for use in RAG systems and other NLP tasks.
- Added PdfLoader struct in src/document_loaders/pdf.rs
- Added PdfLoader to the document_loaders module
- Implemented DocumentLoader trait for PdfLoader
- Added lopdf crate for PDF parsing
- Updated Cargo.toml with the lopdf dependency
- Added unit tests for PdfLoader
- Added PdfLoader usage examples

The PdfLoader uses the lopdf crate to parse PDF files and extract text content. It handles potential errors such as a missing file or a parsing failure. The extracted text is converted into DocumentEmbeddings for further processing in Rig.
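
The error handling described above can be sketched roughly as follows. This is a std-only illustration, not the PR's code: the PdfLoadError type and the read_pdf_bytes function are hypothetical names, and the "%PDF-" header check is only a stand-in for the real parsing that lopdf performs.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type; the PR's actual error type is not shown in this description.
#[derive(Debug)]
pub enum PdfLoadError {
    /// No file exists at the given path.
    NotFound(String),
    /// Any other I/O failure while reading the file.
    Io(io::Error),
    /// The bytes could not be interpreted as a PDF.
    Parse(String),
}

impl fmt::Display for PdfLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PdfLoadError::NotFound(path) => write!(f, "PDF file not found: {}", path),
            PdfLoadError::Io(err) => write!(f, "I/O error while reading PDF: {}", err),
            PdfLoadError::Parse(msg) => write!(f, "failed to parse PDF: {}", msg),
        }
    }
}

impl std::error::Error for PdfLoadError {}

/// Reads the file and separates "file not found" from "not a parsable PDF".
/// Assumption: a real loader would hand the file to lopdf at the marked step.
pub fn read_pdf_bytes(path: &Path) -> Result<Vec<u8>, PdfLoadError> {
    let bytes = fs::read(path).map_err(|err| match err.kind() {
        io::ErrorKind::NotFound => PdfLoadError::NotFound(path.display().to_string()),
        _ => PdfLoadError::Io(err),
    })?;

    // Stand-in for parsing: a well-formed PDF begins with the "%PDF-" marker.
    if !bytes.starts_with(b"%PDF-") {
        return Err(PdfLoadError::Parse("missing %PDF- header".to_string()));
    }
    Ok(bytes)
}
```

Keeping the not-found case as its own variant lets callers report a bad path differently from a corrupt document, which matches the two failure modes named in the description.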
Unit tests have been added to ensure the PdfLoader correctly loads PDF files and handles various edge cases. The tests cover:
The main documentation has been updated to include usage examples for the PdfLoader. This includes how to initialize the loader and integrate it with the EmbeddingsBuilder.
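
The documentation examples themselves are not reproduced in this description. As a rough sketch of the step just before embedding, assuming the loader yields one string of text per page, the snippet below turns page text into id and text records that could then be passed to an embeddings builder. LoadedPage and pages_to_documents are hypothetical names, and no Rig API is called here.

```rust
/// Hypothetical record for one page of extracted text. The PR converts text
/// into Rig's DocumentEmbeddings, whose fields are not described here.
#[derive(Debug, Clone, PartialEq)]
pub struct LoadedPage {
    pub id: String,
    pub text: String,
}

/// Builds one record per non-empty page, with a stable id derived from the
/// source name and the 1-based page number.
pub fn pages_to_documents(source_name: &str, pages: &[String]) -> Vec<LoadedPage> {
    pages
        .iter()
        .enumerate()
        .filter_map(|(index, raw)| {
            let text = raw.trim();
            if text.is_empty() {
                // Pages with no extractable text (for example, scanned images) are skipped.
                return None;
            }
            Some(LoadedPage {
                id: format!("{}-page-{}", source_name, index + 1),
                text: text.to_string(),
            })
        })
        .collect()
}
```

Deriving ids from the page number keeps them stable across runs, so re-loading the same PDF produces the same document ids.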