HKUDS / LightRAG

"LightRAG: Simple and Fast Retrieval-Augmented Generation"
https://arxiv.org/abs/2410.05779
MIT License
9.49k stars, 1.17k forks

Speed to process 11MB of text into vector database #321

Open GaryDean opened 2 days ago

GaryDean commented 2 days ago

I am creating a vector database using this hardware:

Hardware:
    LENOVO_MT_82WK_BU_idea_FM_Legion Pro 5 16IRX8 32GB
    NVIDIA GeForce RTX 4070 8GB

Text data is as follows:

Data Files:
   Total # text files: 3,482 files
       Total filesize: 11,626,546 bytes
     Average filesize: 3,339 bytes
      Median filesize: 2,908 bytes
    Smallest filesize: 36 bytes
     Largest filesize: 245,026 bytes
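
For anyone who wants to report comparable numbers for their own corpus, here is a minimal sketch of how a summary like the one above could be produced with the Python standard library. The function name `summarize_text_files` and the `*.txt` pattern are assumptions made for illustration; they are not part of LightRAG.

```python
import statistics
from pathlib import Path


def summarize_text_files(root, pattern="*.txt"):
    """Return count and size statistics (in bytes) for files under root.

    Hypothetical helper for illustration only; not part of LightRAG.
    """
    # Collect the on-disk size of every matching regular file, recursively.
    sizes = [p.stat().st_size for p in Path(root).rglob(pattern) if p.is_file()]
    if not sizes:
        raise ValueError(f"no files matching {pattern!r} under {root}")
    return {
        "count": len(sizes),
        "total": sum(sizes),
        "average": statistics.mean(sizes),
        "median": statistics.median(sizes),
        "smallest": min(sizes),
        "largest": max(sizes),
    }
```

Calling `summarize_text_files("path/to/corpus")` returns a dictionary whose keys correspond to the rows of the listing above.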

I am using default models to process this data.

Building the vector database from this data took ~32 hours.
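
To put that figure in perspective, the rough arithmetic below turns the numbers reported in this post into rates. It is only a back-of-the-envelope sketch: the 32 hours is approximate, and the function name `throughput` is made up for illustration. With these inputs it works out to about 101 bytes per second, or about 33 seconds per file.

```python
# Figures taken from the post above; the elapsed time is approximate.
TOTAL_BYTES = 11_626_546
NUM_FILES = 3_482
ELAPSED_HOURS = 32


def throughput(total_bytes, num_files, elapsed_hours):
    """Convert a total runtime into per-second, per-file and per-MB rates."""
    seconds = elapsed_hours * 3600
    return {
        "bytes_per_second": total_bytes / seconds,
        "seconds_per_file": seconds / num_files,
        # 1 MB is taken as 1,000,000 bytes here.
        "hours_per_mb": elapsed_hours / (total_bytes / 1_000_000),
    }


rates = throughput(TOTAL_BYTES, NUM_FILES, ELAPSED_HOURS)
print(f"{rates['bytes_per_second']:.0f} bytes/s")
print(f"{rates['seconds_per_file']:.1f} s per file")
print(f"{rates['hours_per_mb']:.2f} hours per MB")
```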

Am I "holding it wrong"?

fatehss commented 8 hours ago

It took me many hours to process a CSV of around 2 MB. We are also experiencing this issue.