Marker-Inc-Korea / AutoRAG

AutoML tool for RAG
https://auto-rag.com/
Apache License 2.0

[Feature Request] Connect S3 as file loader #829

Open vkehfdl1 opened 1 week ago

vkehfdl1 commented 1 week ago

Is your feature request related to a problem? Please describe.
I want to connect Amazon S3 as a file loader, so that files stored there can be parsed and loaded into the VectorDB.

Describe the solution you'd like
Use LangChain or LlamaIndex (or something better) to connect many document sources to parsing.

Describe alternatives you've considered
We could use another library, like liteLLM, for getting documents.

vkehfdl1 commented 1 week ago

As an alternative, we can build an example Jupyter notebook.

vkehfdl1 commented 1 week ago

To support AWS well, I think it is better to use fsspec. It gives us a unified interface for loading files! We currently support only PDF, so this would mean loading PDF files from all kinds of file systems.
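To make the idea concrete, here is a minimal sketch of reading raw PDF bytes through fsspec, so the same code path serves local paths, S3, and other backends. The helper names `protocol_of` and `iter_pdf_bytes` are hypothetical and not part of AutoRAG. The sketch assumes `fsspec` is installed, plus the matching backend package for remote protocols (for example `s3fs` for `s3://`).

```python
from typing import Iterator, Tuple


def protocol_of(url: str) -> str:
    """Return the protocol prefix of a URL, defaulting to 'file' for plain paths."""
    return url.split("://", 1)[0] if "://" in url else "file"


def iter_pdf_bytes(url_glob: str, **storage_options) -> Iterator[Tuple[str, bytes]]:
    """Yield (path, raw bytes) for every file matching url_glob.

    storage_options are forwarded to the backend file system
    (credentials and similar settings); which keys are valid
    depends on the backend.
    """
    # Imported lazily so that fsspec can stay an optional dependency.
    import fsspec

    # open_files expands the glob and selects the backend from the protocol.
    for open_file in fsspec.open_files(url_glob, mode="rb", **storage_options):
        with open_file as f:
            yield open_file.path, f.read()
```

Usage would look something like `for path, data in iter_pdf_bytes("s3://some-bucket/docs/*.pdf"): ...`, where the bucket name is a placeholder. The bytes could then be handed to whichever PDF parser we already use, so only the loading step changes.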

Below is the full list of protocols that fsspec supports.

It includes Dropbox, Google Drive, S3, and even Jupyter and GitHub!

['abfs',
 'adl',
 'arrow_hdfs',
 'asynclocal',
 'az',
 'blockcache',
 'box',
 'cached',
 'dask',
 'data',
 'dbfs',
 'dir',
 'dropbox',
 'dvc',
 'file',
 'filecache',
 'ftp',
 'gcs',
 'gdrive',
 'generic',
 'git',
 'github',
 'gs',
 'hdfs',
 'hf',
 'http',
 'https',
 'jlab',
 'jupyter',
 'lakefs',
 'libarchive',
 'local',
 'memory',
 'oci',
 'ocilake',
 'oss',
 'reference',
 'root',
 's3',
 's3a',
 'sftp',
 'simplecache',
 'smb',
 'ssh',
 'tar',
 'wandb',
 'webdav',
 'webhdfs',
 'zip']
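A list like the one above can be read from fsspec's registry. The sketch below is one way to do that and to check which protocols we care about are missing. `registered_protocols` and `missing_protocols` are hypothetical helper names, and I am assuming the list was produced by sorting the registry keys. Note that a protocol being registered does not mean its backend package is installed: `s3`, for example, still needs `s3fs` at runtime.

```python
from typing import Iterable, List


def registered_protocols() -> List[str]:
    """Return the protocol names known to fsspec's registry, sorted."""
    # Imported lazily so that fsspec can stay an optional dependency.
    from fsspec.registry import known_implementations

    return sorted(known_implementations)


def missing_protocols(wanted: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return the wanted protocols that are absent from available, sorted."""
    return sorted(set(wanted) - set(available))
```

For example, `missing_protocols(["s3", "gdrive"], registered_protocols())` returns an empty list when both names are registered.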