[NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge across external documents. RAG + Knowledge Graphs + Personalized PageRank.
def chunk_corpus(corpus: list, chunk_size: int = 64) -> list:
    """
    Chunk the corpus into smaller parts. Run the following command to download the required nltk data:
    python -c "import nltk; nltk.download('punkt')"
    @param corpus: the formatted corpus, see README.md
    @param chunk_size: the size of each chunk, i.e., the number of words in each chunk
    @return: chunked corpus, a list
    """
The default chunk_size is 64. Is that the best practice? I tried 150: the entity count was the same as with 64, but I obtained 10% more relationships.
def chunk_corpus(corpus: list, chunk_size: int = 64) -> list: """ Chunk the corpus into smaller parts. Run the following command to download the required nltk data: python -c "import nltk; nltk.download('punkt')"
the default chunk_size is 64, is that the best practice? I tried with 150, and the entity count is the same as 64, but 10% more relationships were obtained.