Let's say I ask about the weather every day. The RAG will then retrieve all the earlier weather requests, even though they contribute nothing to the current day's query. Worse, they may push out and suppress other, more useful retrievals.
We could compute a classic string similarity (Jaro-Winkler or edit distance) and drop hits that are too similar
Exclude older hits by date
Unfortunately, for the main ANN retrieval query, these near-duplicates can still push more useful hits out of scope
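
The two filters above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Hit` record, the `filter_hits` function, and both thresholds are hypothetical names and values, not part of any existing system. It also uses the standard library's `difflib.SequenceMatcher` ratio as a stand-in for Jaro-Winkler or edit distance, since neither of those ships with Python.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from difflib import SequenceMatcher


@dataclass
class Hit:
    """Hypothetical shape of one retrieved item."""
    text: str     # the stored request or chunk
    day: date     # when it was stored
    score: float  # ANN similarity to the query (higher is better)


def filter_hits(hits, today, max_age_days=7, max_similarity=0.9):
    """Drop hits older than max_age_days, then drop near-duplicates."""
    cutoff = today - timedelta(days=max_age_days)
    kept = []
    # Visit the best-scoring hits first so the best copy of each
    # near-duplicate group is the one that survives.
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.day < cutoff:
            continue  # excluded by date
        is_duplicate = any(
            SequenceMatcher(None, hit.text, other.text).ratio() >= max_similarity
            for other in kept
        )
        if not is_duplicate:
            kept.append(hit)
    return kept


today = date(2024, 5, 20)
hits = [
    Hit("What is the weather today?", today, 0.95),
    Hit("What is the weather today", today - timedelta(days=1), 0.94),
    Hit("What is the weather today?", today - timedelta(days=30), 0.93),
    Hit("Pack an umbrella for the trip to Oslo", today - timedelta(days=2), 0.60),
]
kept = filter_hits(hits, today)
```

Note that this runs on whatever the ANN query already returned, so it only cleans up the result list. It does not bring back hits that the repeated requests pushed out of the retrieval in the first place.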