getzep / graphiti

Build and query dynamic, temporally-aware Knowledge Graphs
https://help.getzep.com/graphiti
Apache License 2.0

Search returning empty list #112

Closed surajrav closed 1 month ago

surajrav commented 1 month ago

Setup Information

Python version: 3.12
graphiti-core version: 0.3.0
Neo4j version: 5.23.0 (using Neo4j Desktop for Mac)
OS: Mac OS Sonoma 14.6.1 (Intel Processor)

Code and Usage Example

Code Snippet

...
import asyncio
import os
from datetime import datetime

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.maintenance.graph_data_operations import clear_data
....

os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
graphiti = Graphiti(NEO4J_URI, NEO4j_USER, NEO4j_PASSWORD)

async def setup_graphiti():
    # CAREFUL: empties the graph
    await clear_data(graphiti.driver)
    # Initialize the graph database with graphiti's indices. This only needs to be done once
    await graphiti.build_indices_and_constraints()

async def graphiti_ingest():
    await setup_graphiti()
    episodes = [
        "Kamala Harris is the Attorney General of California. She was previously "
        "the district attorney for San Francisco.",
        "As AG, Harris was in office from January 3, 2011 – January 3, 2017",
    ]
    for i, episode in enumerate(episodes):
        await graphiti.add_episode(
            name=f"Freakonomics Radio {i}",
            episode_body=episode,
            source=EpisodeType.text,
            source_description="podcast",
            reference_time=datetime.now()
        )

async def graphiti_search(query: str):
    results = await graphiti.search(query)
    print("\n".join([edge.fact for edge in results]))
    return results

if __name__ == "__main__":
    asyncio.run(graphiti_ingest())
    asyncio.run(graphiti_search("Who was the California Attorney General?"))

Output

(venv) surajravi@Surajs-MBP:~/Documents/code_projects/migration-rag/src » ./ai_ingest.py

Neo4j Graph Visualization

[image]

Problem Description

At first, when search was returning empty results for my own data, I thought I was doing something wrong, so I switched to the example data from the quickstart guide: https://help.getzep.com/graphiti/graphiti/quick-start

But I'm still getting empty output with the quickstart example data, so something else may be going on.

Note that I do see credits being billed for each ingest run in my OpenAI billing console.

Maybe I'm missing something? If you could take a look and guide me, that would be highly appreciated.

Please do let me know if I can provide any additional data.

paul-paliychuk commented 1 month ago

@surajrav Thanks for raising the issue. I was able to reproduce it locally too. It happens when no group_id is passed when adding episodes. We will push a patch for it soon.

As a quick fix, you can pass a group_id argument when adding the episodes and also specify it when searching graphiti. The group_id is a developer-provided string that identifies an isolated region of the graph. We will highlight its usage in the docs soon!

for i, episode in enumerate(episodes):
    await graphiti.add_episode(
        group_id='group1',
        name=f'Freakonomics Radio {i}',
        episode_body=episode,
        source=EpisodeType.text,
        source_description='podcast',
        reference_time=datetime.now(),
    )
...
results = await graphiti.search(query, group_ids=['group1'])
...
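
To make the idea of group_id as an isolated region concrete, here is a toy in-memory sketch. It is not graphiti's implementation and it does not model the 0.3.0 bug: `ToyGraph`, `add_fact`, and its `search` are hypothetical names, and the substring match stands in for real retrieval. It only illustrates that a search sees just the groups it names.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory stand-in for a graph partitioned by group_id."""

    def __init__(self) -> None:
        # Maps each group_id to the facts stored under it.
        self._facts_by_group: dict[str, list[str]] = defaultdict(list)

    def add_fact(self, fact: str, group_id: str) -> None:
        self._facts_by_group[group_id].append(fact)

    def search(self, query: str, group_ids: list[str]) -> list[str]:
        # Only the named groups are scanned; every other group is invisible.
        needle = query.lower()
        return [
            fact
            for group_id in group_ids
            for fact in self._facts_by_group.get(group_id, [])
            if needle in fact.lower()
        ]


graph = ToyGraph()
graph.add_fact("Kamala Harris is the Attorney General of California.", "group1")
graph.add_fact(
    "Kamala Harris was previously the district attorney for San Francisco.",
    "group2",
)

only_group1 = graph.search("harris", group_ids=["group1"])
both_groups = graph.search("harris", group_ids=["group1", "group2"])
unknown_group = graph.search("harris", group_ids=["group3"])
```

In this sketch `only_group1` holds one fact, `both_groups` holds two, and `unknown_group` is empty, even though every search uses the same query.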

paul-paliychuk commented 1 month ago

@surajrav Fixed in v0.3.1
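
If you are unsure which version your environment actually has, a quick check along these lines may help. It is a minimal sketch using the standard library's `importlib.metadata`; the helpers `parse_version` and `has_fix` are hypothetical names introduced here, and the comparison looks only at the leading numeric part of the version string, so pre-release suffixes are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading numeric release part, e.g. '0.3.1rc1' -> (0, 3, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def has_fix(installed: str, fixed_in: str = "0.3.1") -> bool:
    """True if the installed release is at or after the release with the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


try:
    installed = version("graphiti-core")
    status = "has the fix" if has_fix(installed) else "predates the fix"
    print(f"graphiti-core {installed} {status}")
except PackageNotFoundError:
    print("graphiti-core is not installed in this environment")
```

If it reports a version before 0.3.1, upgrading with `pip install --upgrade graphiti-core` should pick up the patch.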

surajrav commented 1 month ago

@paul-paliychuk Confirmed that it's fixed! Thanks for the quick turnaround! I upgraded to 0.3.1 and the example now yields the right results!

(venv) surajravi@Surajs-MBP:~/Documents/code_projects/migration-rag/src » ./ai_ingest.py
Kamala Harris is the Attorney General of California.
Kamala Harris was previously the district attorney for San Francisco.