getzep / graphiti

Build and query dynamic, temporally-aware Knowledge Graphs
https://help.getzep.com/graphiti
Apache License 2.0

Invalidating Previous Nodes #139

Open fredngg opened 1 day ago

fredngg commented 1 day ago

My friend and I love the idea behind graphiti and were trying to test how we can invalidate facts using the episode functions, but each episode just adds a new fact. A new episode may be marked invalid from the start if we add a historical fact, but ingestion doesn't seem to look back at old facts for the same entity to invalidate them. Is this intended? Do we need to write our own logic that checks old facts for the same entity and invalidates them? Otherwise, search doesn't return the most fitting response at the top.

Love to get your thoughts. Thanks!

Question: What is Nicholas drinking now? **Correct answer: Coffee**

Nicholas is drinking green tea. r.invalid_at=None r.valid_at=datetime.datetime(2023, 9, 21, 10, 0, tzinfo=)

Nicholas is not drinking green tea at the moment. r.invalid_at=None r.valid_at=datetime.datetime(2024, 9, 21, 5, 32, 30, tzinfo=)

Nicholas started drinking coffee. r.invalid_at=None r.valid_at=datetime.datetime(2023, 9, 23, 0, 0, tzinfo=)

Nicholas stopped drinking green tea. r.invalid_at=None r.valid_at=datetime.datetime(2023, 9, 22, 0, 0, tzinfo=)

prasmussen15 commented 18 hours ago

Hey, thanks for the interest and questions!

First off, for the time being make sure to add a group_id to ingested episodes (they can all be the same, e.g. group_id='1'). We had a bug where deduplication wasn't occurring between edges with nil group ids; I thought we had resolved it, but it looks like it is still happening for nodes. We will have a bug fix for this out early next week.
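For reference, here is a minimal sketch of what that ingestion might look like with a shared group_id. The `add_episode` call follows the README at the time of writing; argument names such as `reference_time` and `group_id` may differ in your installed version, so treat this as an assumption rather than a fixed API.

```python
# Sketch: ingest episodes with a shared group_id as a workaround.
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType


async def main():
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")

    episodes = [
        ("Nicholas is drinking green tea.", datetime(2023, 9, 21, 10, 0, tzinfo=timezone.utc)),
        ("Nicholas stopped drinking green tea.", datetime(2023, 9, 22, tzinfo=timezone.utc)),
        ("Nicholas started drinking coffee.", datetime(2023, 9, 23, tzinfo=timezone.utc)),
    ]

    for i, (body, occurred_at) in enumerate(episodes):
        await graphiti.add_episode(
            name=f"episode-{i}",
            episode_body=body,
            source=EpisodeType.text,
            source_description="beverage log",
            reference_time=occurred_at,
            group_id="1",  # same group_id so dedup/invalidation candidates are found together
        )


asyncio.run(main())
```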

I think your question can be broken into two parts: (1) how do we invalidate facts? and (2) how do invalidated facts affect search?

For (1), we basically do a search on existing facts that serve as potential candidates for invalidation based on the new fact being added. If the LLM determines that the facts are in conflict with each other, then they will be invalidated based on timestamp-dependent logic. I ran through your examples and it looks like the LLM is determining that these statements aren't in contradiction with each other as they are temporally sequential. The invalidation prompt is something that we will improve over time with prompt engineering, and I could see us having some amount of custom invalidation logic in the future as use cases for that can be very different across domains.
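To make the timestamp-dependent part concrete, here is a rough, illustrative sketch of the kind of closure logic involved; it is not graphiti's internal implementation. The `Fact` dataclass and `invalidate_older` helper are hypothetical stand-ins for an edge with `valid_at`/`invalid_at` fields.

```python
# Illustrative sketch only -- not graphiti's internal invalidation logic.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Fact:
    # Hypothetical stand-in for an edge with temporal metadata.
    fact: str
    valid_at: Optional[datetime]
    invalid_at: Optional[datetime] = None


def invalidate_older(existing: Fact, new: Fact) -> Fact:
    """If a newer fact contradicts an existing one, close the existing fact
    out as of the newer fact's valid_at."""
    if (
        existing.invalid_at is None
        and existing.valid_at is not None
        and new.valid_at is not None
        and new.valid_at > existing.valid_at
    ):
        existing.invalid_at = new.valid_at
    return existing
```

Applied to the example above, "Nicholas is drinking green tea." would only get an invalid_at set once the green-tea and coffee facts are actually judged to be in conflict, which is the step the LLM is currently declining to take.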

For the time being, our perspective is that extracting the correct facts with the correct timestamps is the more important thing to get right consistently, as this lets us store the information in the graph in a non-lossy way. That means that when the information is retrieved and passed to an LLM, the LLM can understand the timeline and answer questions accordingly.
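As a sketch of what that looks like on the retrieval side, one can serialize the retrieved facts with their timestamps into the answering prompt, assuming edge objects with `fact`, `valid_at`, and `invalid_at` attributes as in the output above:

```python
# Sketch: render retrieved facts with their temporal metadata so the
# downstream LLM can reason over the timeline itself.
from datetime import datetime, timezone


def build_timeline_context(edges) -> str:
    epoch = datetime.min.replace(tzinfo=timezone.utc)
    lines = []
    for e in sorted(edges, key=lambda e: e.valid_at or epoch):
        window = f"valid from {e.valid_at}"
        window += f" until {e.invalid_at}" if e.invalid_at else " (no recorded end)"
        lines.append(f"- {e.fact} ({window})")
    return "FACTS (with validity windows):\n" + "\n".join(lines)
```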

For (2), we currently don't have a way to filter search on things like timestamps or other properties. This is something we have discussed internally and will be doing, but we want to make sure that we build the filter field in a safe, flexible, and extensible way. As such, we aren't actively working on filtering for the time being, with our focus on other high-priority tasks.
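Until such a filter field exists, a workable stopgap is to filter client-side after retrieval. The sketch below assumes `graphiti.search()` returns edges with `valid_at`/`invalid_at` attributes, as in the output earlier in this thread; adjust to whatever your version actually returns.

```python
# Sketch: post-filter search results to facts that were valid at a point in time.
from datetime import datetime, timezone


async def facts_as_of(graphiti, query: str, as_of: datetime):
    results = await graphiti.search(query)
    return [
        r for r in results
        if (r.valid_at is None or r.valid_at <= as_of)
        and (r.invalid_at is None or r.invalid_at > as_of)
    ]


# e.g. facts = await facts_as_of(graphiti, "What is Nicholas drinking now?",
#                                datetime.now(timezone.utc))
```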

Thanks again for the interest and let me know if you have any other questions!