Marker-Inc-Korea / AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
https://auto-rag.com/
Apache License 2.0

[Feature Request] Connect S3 as file loader #829

Open vkehfdl1 opened 1 month ago

vkehfdl1 commented 1 month ago

Is your feature request related to a problem? Please describe.
I want to connect Amazon S3 as a file loader so that its files can be parsed into the VectorDB.

Describe the solution you'd like
Use LangChain or LlamaIndex (or something better) to connect many document sources to parsing.

Describe alternatives you've considered
We could use another library, such as liteLLM, for getting documents.

vkehfdl1 commented 1 month ago

As an alternative, we can build an example Jupyter notebook.

vkehfdl1 commented 1 month ago

To support AWS well, I think it is better to use fsspec, which provides a unified interface for loading files. We currently support only PDF, so this would let us load PDF files from all kinds of file systems.
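To make the idea concrete, here is a minimal sketch of what an fsspec-based PDF loader could look like. The function name `load_pdfs` and the bucket in the comment are hypothetical and not part of AutoRAG; the sketch only assumes that `fsspec` is installed (plus the matching backend, for example `s3fs` for `s3://` URLs).

```python
from fsspec.core import url_to_fs


def load_pdfs(root_url, **storage_options):
    """Return {path: raw bytes} for every PDF found under root_url.

    Hypothetical helper for illustration only; not an AutoRAG API.
    storage_options are passed through to the fsspec backend
    (for example credentials for S3).
    """
    # Resolve the URL into a filesystem object and a backend-specific path.
    fs, root_path = url_to_fs(root_url, **storage_options)
    # Walk the tree recursively and keep only PDF files.
    pdf_paths = [p for p in fs.find(root_path) if p.lower().endswith(".pdf")]
    # Read each file fully into memory as bytes.
    return {path: fs.cat_file(path) for path in pdf_paths}


# Example call (hypothetical bucket name, requires s3fs and AWS credentials):
# pdfs = load_pdfs("s3://my-bucket/papers")
```

Because only the URL scheme changes, the same call should work for a local directory (`file://...`), Google Cloud Storage (`gs://...`), and the other protocols, provided the corresponding backend package is installed. The returned bytes could then be handed to whatever PDF parser is already in use.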

Below is the full list of protocols that fsspec supports.

It includes Dropbox, Google Drive, S3, and even Jupyter and GitHub!

['abfs',
 'adl',
 'arrow_hdfs',
 'asynclocal',
 'az',
 'blockcache',
 'box',
 'cached',
 'dask',
 'data',
 'dbfs',
 'dir',
 'dropbox',
 'dvc',
 'file',
 'filecache',
 'ftp',
 'gcs',
 'gdrive',
 'generic',
 'git',
 'github',
 'gs',
 'hdfs',
 'hf',
 'http',
 'https',
 'jlab',
 'jupyter',
 'lakefs',
 'libarchive',
 'local',
 'memory',
 'oci',
 'ocilake',
 'oss',
 'reference',
 'root',
 's3',
 's3a',
 'sftp',
 'simplecache',
 'smb',
 'ssh',
 'tar',
 'wandb',
 'webdav',
 'webhdfs',
 'zip']
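A list like the one above can be reproduced locally. The sketch below assumes a reasonably recent `fsspec` release, where `available_protocols` is provided by `fsspec.registry`; the exact entries depend on the installed version, so the output may differ from the list in this comment. The `wanted` names are just examples.

```python
from fsspec.registry import available_protocols

# All protocol names fsspec knows about, whether or not the backend
# package (s3fs, gcsfs, ...) is actually installed.
protocols = sorted(available_protocols())
print(protocols)

# Quick check for the sources mentioned in this issue (example names).
wanted = ["s3", "gcs", "dropbox"]
supported = {name: name in protocols for name in wanted}
print(supported)
```

Note that a protocol appearing in this list only means fsspec knows which package implements it; actually opening an `s3://` URL still needs the backend to be installed.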