This PR implements a CSV loader as part of the document loaders module in Rig. It allows users to easily load and process CSV documents for use in RAG systems and other document processing tasks.
Changes
Implemented CsvLoader struct in src/document_loaders/csv.rs
Added CsvLoader to the document_loaders module
Implemented DocumentLoader trait for CsvLoader
Used the csv crate for CSV parsing
Added error handling for file operations and CSV parsing
Updated Cargo.toml with the csv dependency
Updated documentation with CsvLoader usage examples
Implementation Details
The CsvLoader uses the csv crate to parse CSV files and extract content. It handles potential errors such as file not found or parsing errors. The extracted content is converted into a single DocumentEmbeddings object for further processing in Rig. Each row of the CSV is formatted as "header: value" pairs, separated by newlines.
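The row formatting described above can be sketched roughly as follows. This is a standard-library-only illustration, not the code in this PR: the function name `format_rows` is hypothetical, the split is naive (it does not handle quoted fields, which the csv crate does), and separating rows with a blank line is an assumption.

```rust
/// Formats delimited text as "header: value" lines, one block per data row.
/// Hypothetical helper for illustration; naive splitting, no quoted-field support.
fn format_rows(input: &str, delimiter: char) -> String {
    // Skip blank lines so trailing newlines do not produce empty rows.
    let mut lines = input.lines().filter(|l| !l.trim().is_empty());

    // The first non-empty line is treated as the header row.
    let headers: Vec<&str> = match lines.next() {
        Some(h) => h.split(delimiter).map(str::trim).collect(),
        None => return String::new(), // empty input yields empty output
    };

    // Pair each value with its header; extra values or headers are dropped by zip.
    let blocks: Vec<String> = lines
        .map(|line| {
            headers
                .iter()
                .zip(line.split(delimiter).map(str::trim))
                .map(|(h, v)| format!("{}: {}", h, v))
                .collect::<Vec<_>>()
                .join("\n")
        })
        .collect();

    // Assumption: rows are separated by a blank line in the combined text.
    blocks.join("\n\n")
}
```

Calling `format_rows("name,age\nAda,36", ',')` would give two lines, `name: Ada` and `age: 36`; the combined string is what would then be wrapped into a single document.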
Testing
Ran tests to ensure the CsvLoader correctly loads CSV files and handles various edge cases. The tests covered:
Loading a valid CSV file
Handling a non-existent file
Processing a CSV with multiple columns and rows
Dealing with empty CSV files
Handling CSV files with different delimiters
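As a rough illustration of the non-existent-file and empty-file cases, the sketch below uses only the standard library. The `CsvLoadError` type and `read_csv_text` function are hypothetical names, not the loader's actual API, and treating an empty file as an error is an assumption; the real loader may instead return an empty document.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical error type for illustration only.
#[derive(Debug)]
enum CsvLoadError {
    NotFound(String),
    Io(io::Error),
    Empty,
}

impl fmt::Display for CsvLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CsvLoadError::NotFound(p) => write!(f, "file not found: {}", p),
            CsvLoadError::Io(e) => write!(f, "I/O error: {}", e),
            CsvLoadError::Empty => write!(f, "CSV file is empty"),
        }
    }
}

/// Reads the raw CSV text, mapping a missing file to a dedicated variant.
fn read_csv_text(path: &Path) -> Result<String, CsvLoadError> {
    let text = fs::read_to_string(path).map_err(|e| match e.kind() {
        io::ErrorKind::NotFound => CsvLoadError::NotFound(path.display().to_string()),
        _ => CsvLoadError::Io(e),
    })?;

    // Assumption: a file with no content is reported as an error.
    if text.trim().is_empty() {
        return Err(CsvLoadError::Empty);
    }
    Ok(text)
}
```

Separating "not found" from other I/O failures lets callers give a clearer message for the most common mistake, a wrong path.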
Documentation
Code files are commented, and usage examples have been added to the documentation.
Related Issue
Closes #29
Checklist
[x] Code follows the project's coding style
[x] Tests have been added and all tests pass
[x] Documentation has been updated
[x] Commit messages are clear and descriptive
[x] Changes have been reviewed for potential performance impacts
Additional Notes
This implementation focuses on converting CSV data into a single document for embedding. Future enhancements could include options for creating separate embeddings for each row or handling more complex CSV structures.