aadarsh-ram closed this 10 months ago
What exactly do you mean by loading different file types? Do you want us to work on parsing different file types to text? Also, LangChain is kind of focused on SQL and the like. Do you want that level of input availability, or do we stick to files?
We'll stick to files (such as PDF, DOC, and TXT), and the backend must be able to parse these different file types.
@aadarsh-ram would something like this work, by extracting the file type from filePath:
import os

from langchain.document_loaders import Docx2txtLoader, PyPDFLoader, TextLoader

DOCS_DIR = "./docs/"
documents = []

for file in os.listdir(DOCS_DIR):
    path = os.path.join(DOCS_DIR, file)
    # Pick a loader based on the file extension.
    if file.endswith(".pdf"):
        loader = PyPDFLoader(path)
    elif file.endswith((".docx", ".doc")):
        # docx2txt targets .docx; legacy .doc files may not load.
        loader = Docx2txtLoader(path)
    elif file.endswith(".txt"):
        loader = TextLoader(path)
    else:
        continue  # unsupported file type
    documents.extend(loader.load())
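The same extension check could also be written as a lookup table, which might make it easier to add file types later. This is only a rough sketch: `load_all` and the `loaders` mapping are hypothetical names, not part of LangChain, and each loader is assumed to be a function that takes a path and returns a list of documents.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical type: a loader takes a file path and returns a list of documents.
LoaderFn = Callable[[str], List[object]]


def load_all(docs_dir: str, loaders: Dict[str, LoaderFn]) -> List[object]:
    """Load every supported file in docs_dir, skipping unknown extensions."""
    documents: List[object] = []
    for path in sorted(Path(docs_dir).iterdir()):
        if not path.is_file():
            continue
        # Lower-case the suffix so ".PDF" and ".pdf" pick the same loader.
        loader = loaders.get(path.suffix.lower())
        if loader is None:
            continue  # unsupported file type
        documents.extend(loader(str(path)))
    return documents
```

With the LangChain loaders, the mapping might look like `{".pdf": lambda p: PyPDFLoader(p).load(), ".docx": lambda p: Docx2txtLoader(p).load(), ".txt": lambda p: TextLoader(p).load()}`.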
Yep, that's what I thought. But after loading, we need to pass the content to our parser, which cleans some of the data out.
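Roughly, that hand-off could look like the sketch below. The project's actual parser isn't shown in this thread, so `clean_text` and `clean_documents` are hypothetical stand-ins and the cleanup rules (dropping control characters, collapsing whitespace) are only assumptions. The one thing taken from LangChain is that loaded documents expose their text as `page_content`.

```python
import re
from typing import Iterable, List


def clean_text(text: str) -> str:
    """Hypothetical cleanup: drop control characters and collapse whitespace."""
    # Remove control characters, keeping tab, newline and carriage return.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def clean_documents(documents: Iterable[object]) -> List[str]:
    """Return the cleaned text of each loaded document."""
    # Assumes each document has a `page_content` string attribute.
    return [clean_text(doc.page_content) for doc in documents]
```

Since every loader returns the same kind of document, the cleaning step shouldn't need to know which file type the text came from.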
Currently, only PDF parsing is supported. Use LangChain (maybe) for loading different file types.