[Feature Request] Connect S3 as file loader

vkehfdl1 commented 1 month ago

Is your feature request related to a problem? Please describe. I want to connect Amazon S3 as a file loader, so that documents stored in S3 can be parsed and loaded into the VectorDB.

Describe the solution you'd like: Use LangChain or LlamaIndex (or something better) to connect many document sources to the parsing step.

Describe alternatives you've considered: We can use another library, like liteLLM, for getting documents.

vkehfdl1 commented 1 month ago

As an alternative, we can build an example Jupyter notebook.

vkehfdl1 commented 1 month ago

To support AWS well, I think it is better to use fsspec, which provides a unified interface for loading files. Since we currently support only PDF, the goal is to load PDF files from all kinds of file systems.
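A minimal sketch of the fsspec approach, assuming `fsspec` is installed together with the backend package for the chosen protocol (e.g. `s3fs` for `s3://`). `load_pdfs` is a hypothetical helper, not an existing AutoRAG function: the URL's protocol selects the file system, `find` walks it recursively, and every file ending in `.pdf` is read as bytes, ready to hand to the PDF parser.

```python
from __future__ import annotations

from fsspec.core import url_to_fs


def load_pdfs(url: str, **storage_options) -> dict[str, bytes]:
    """Read every PDF under `url` from any fsspec-supported file system.

    Hypothetical helper. `url` can be e.g. "s3://bucket/docs" or
    "file:///data/docs"; the protocol prefix picks the backend, and
    `storage_options` are passed to that backend's constructor.
    """
    # Resolve the protocol to a filesystem instance and a backend-specific path
    fs, root = url_to_fs(url, **storage_options)
    # find() lists all files (not directories) below root, recursively
    pdf_paths = [p for p in fs.find(root) if p.lower().endswith(".pdf")]
    # Read each PDF fully into memory, keyed by its path on the backend
    return {path: fs.cat_file(path) for path in pdf_paths}
```

For S3, credentials can come from the usual AWS environment variables or be passed as storage options, e.g. `load_pdfs("s3://my-bucket/docs", anon=False)`, where `my-bucket/docs` is a placeholder. Reading whole files into memory keeps the sketch short; large corpora would rather stream with `fs.open`.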

Below is the full list of protocols supported by fsspec.

It includes Dropbox, Google Drive, S3, and even Jupyter and GitHub!

['abfs',
 'adl',
 'arrow_hdfs',
 'asynclocal',
 'az',
 'blockcache',
 'box',
 'cached',
 'dask',
 'data',
 'dbfs',
 'dir',
 'dropbox',
 'dvc',
 'file',
 'filecache',
 'ftp',
 'gcs',
 'gdrive',
 'generic',
 'git',
 'github',
 'gs',
 'hdfs',
 'hf',
 'http',
 'https',
 'jlab',
 'jupyter',
 'lakefs',
 'libarchive',
 'local',
 'memory',
 'oci',
 'ocilake',
 'oss',
 'reference',
 'root',
 's3',
 's3a',
 'sftp',
 'simplecache',
 'smb',
 'ssh',
 'tar',
 'wandb',
 'webdav',
 'webhdfs',
 'zip']
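A list like the one above can be regenerated with `fsspec.available_protocols()`; its exact contents depend on the installed fsspec version. Being listed does not mean a protocol works out of the box, since most backends live in optional packages. The sketch below, built around a hypothetical helper `usable_protocols`, keeps only the protocols whose backend class can actually be imported in the current environment.

```python
from __future__ import annotations

import fsspec


def usable_protocols() -> list[str]:
    """Return the fsspec protocols whose backend can be imported here.

    Hypothetical helper: fsspec.available_protocols() names every protocol
    fsspec knows about, including ones whose optional backend package
    (e.g. s3fs for "s3", gcsfs for "gcs") is not installed.
    """
    usable = []
    for proto in sorted(fsspec.available_protocols()):
        try:
            # Imports the backend module; fails if its package is missing
            fsspec.get_filesystem_class(proto)
        except Exception:
            # Missing or broken optional dependency: skip this protocol
            continue
        usable.append(proto)
    return usable


if __name__ == "__main__":
    print(sorted(fsspec.available_protocols()))  # every protocol fsspec knows
    print(usable_protocols())  # only those importable in this environment
```

Catching `Exception` rather than only `ImportError` is deliberate in this sketch, because some optional backends can fail at import time for reasons other than a missing package.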

Marker-Inc-Korea / AutoRAG, issue #829