Data Ingestion Module

This PR contains initial work on the data ingestion module of the RAG pipeline. The current version supports reading from a directory containing multiple PDFs: the text of each document is extracted and split into chunks according to the provided chunk_size. Multiple files can also be processed in parallel using multi-threading.
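For reviewers unfamiliar with the approach, here is a minimal sketch of the ingestion flow described above. It is not the code in this PR. All function names (`chunk_text`, `pdf_extract_text`, `ingest_directory`) are hypothetical. Chunking by a fixed number of characters is an assumption, since the module may instead split by tokens or words. PDF text extraction is assumed to use the third-party `pypdf` package, and the extractor is passed in as a parameter so it can be swapped out.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: chunk_size counts characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]


def pdf_extract_text(path: str) -> str:
    """Extract the text of every page of a PDF (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so the sketch loads without pypdf

    reader = PdfReader(path)
    # extract_text() can return an empty string for image-only pages
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def ingest_directory(
    directory: str,
    chunk_size: int,
    num_threads: int = 4,
    extract_text: Callable[[str], str] = pdf_extract_text,
) -> Dict[str, List[str]]:
    """Read every .pdf file in `directory` and chunk its text, one file per task."""
    paths = sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".pdf")
    )

    def process(path: str) -> List[str]:
        return chunk_text(extract_text(path), chunk_size)

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # executor.map yields results in input order, so they line up with paths
        results = list(executor.map(process, paths))
    return dict(zip(paths, results))
```

A call such as `ingest_directory("docs/", chunk_size=512, num_threads=8)` (hypothetical path and values) would return a mapping from each PDF path to its list of chunks. Threads mainly help when reading files is I/O-bound. Pure-Python text extraction is CPU-bound and constrained by the GIL, so a `ProcessPoolExecutor` may be worth comparing for large document sets.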

This PR also sets up the agrag command for using the AutoGluon RAG package. Further details are in the README.

I have run black and isort on the codebase.

All unit tests for the data ingestion module pass.

autogluon / autogluon-rag

Data Ingestion Module #1