When I ask it to summarize certain posts, especially recent ones that are not in the database, it doesn't say "I don't know what that is." Instead, it infers what the content would be from the post's title and makes up an answer.
The example where this came up was the Alignment Forum post "A Primer On Chaos", published in March 2023. It generated this answer:
It doesn't happen every time, though. More often than not, it just says "I don't know what that is."
Potential solution suggested by Bionic: we could hardcode a rule so that if the query contains the word 'summari[s/z]e' and a URL, the URL is added to the search as a required url metadata constraint and a dedicated summarization prompt is used. The same goes for similar patterns: when a user says 'in <url>, what…', it could catch that and add the URL as a constraint in the search.
Generally, I agree with this approach. Maybe we should match some synonyms of 'summarize' as well, just to make sure we catch the intended behavior.
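A minimal sketch of the regex approach, assuming each indexed chunk stores its source URL in a url metadata field and that the search uses Pinecone-style filter syntax ($eq). The function name url_constraint, the "summarize"/"question" mode labels, and the synonym list are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical trigger list: "summarise"/"summarize" plus a few synonyms.
SUMMARY_TRIGGER = re.compile(r"\b(summari[sz]e|summary|tl;?dr|recap|overview)\b", re.IGNORECASE)
# Rough URL matcher; stops at whitespace, quotes, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")
# Questions of the form "in <url>, what ..."
IN_URL_PATTERN = re.compile(r"\s*in\s+https?://", re.IGNORECASE)


def url_constraint(query: str) -> Optional[Tuple[str, dict]]:
    """Return (mode, metadata_filter) if the query targets a specific URL, else None."""
    match = URL_PATTERN.search(query)
    if match is None:
        return None  # no URL, so nothing to pin the search to
    # Drop trailing punctuation picked up from the sentence, e.g. "...abc123?"
    url = match.group(0).rstrip(".,;:!?")
    metadata_filter = {"url": {"$eq": url}}
    if SUMMARY_TRIGGER.search(query):
        return "summarize", metadata_filter  # use the dedicated summarization prompt
    if IN_URL_PATTERN.match(query):
        return "question", metadata_filter  # normal prompt, search restricted to the URL
    return None
```

Requiring a URL in addition to the trigger word keeps queries that merely ask for a summary of a topic on the normal search path.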
As mentioned under prompt engineering, we can (and should) encourage the model to answer "I don't know".
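One way to encourage "I don't know" is to state it explicitly in the prompt and to tell the model when retrieval found nothing, so it has no gap to fill from the title alone. This is a hypothetical helper (build_prompt and the instruction wording are illustrative, not the project's actual prompt):

```python
from typing import List

# Illustrative instruction text; the real prompt wording would need tuning.
IDK_INSTRUCTION = (
    "Answer using only the sources below. If the sources do not contain the "
    "answer, or do not include the post the user is asking about, reply "
    "\"I don't know\" instead of guessing from the title."
)


def build_prompt(question: str, sources: List[str]) -> str:
    """Assemble a prompt that explicitly permits an "I don't know" answer."""
    if sources:
        # Number the retrieved chunks so the model can refer to them.
        context = "\n\n".join(f"[{i}] {s}" for i, s in enumerate(sources, start=1))
    else:
        # Say outright that nothing was retrieved, rather than sending an empty section.
        context = "(no matching sources were found)"
    return f"{IDK_INSTRUCTION}\n\nSources:\n{context}\n\nQuestion: {question}"
```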
Use the LLM (which would also catch synonyms) to generate a Pinecone metadata query, not only for summaries but also for authors, dates, and sources, possibly using function calling.
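A sketch of the function-calling variant: the LLM is given a JSON-schema function definition, and the arguments it returns are converted into a Pinecone metadata filter. The function name set_search_filters and the metadata field names (url, authors, source, date_published as a numeric Unix timestamp) are assumptions about how the index could be laid out; the actual LLM call is omitted.

```python
import json
from typing import Any, Dict, List

# Hypothetical function definition in the JSON-schema style used for LLM function calling.
SEARCH_FILTER_FUNCTION = {
    "name": "set_search_filters",
    "description": "Extract metadata constraints from the user's question.",
    "parameters": {
        "type": "object",
        "properties": {
            "url": {"type": "string"},
            "authors": {"type": "array", "items": {"type": "string"}},
            "source": {"type": "string"},
            "published_after": {"type": "integer", "description": "Unix timestamp"},
            "published_before": {"type": "integer", "description": "Unix timestamp"},
        },
    },
}


def to_pinecone_filter(arguments_json: str) -> Dict[str, Any]:
    """Convert the model's function-call arguments (a JSON string) into a metadata filter.

    Returns {} when there are no constraints, meaning "search without a filter".
    """
    args = json.loads(arguments_json)
    clauses: List[Dict[str, Any]] = []
    if args.get("url"):
        clauses.append({"url": {"$eq": args["url"]}})
    if args.get("authors"):
        clauses.append({"authors": {"$in": args["authors"]}})
    if args.get("source"):
        clauses.append({"source": {"$eq": args["source"]}})
    date_range: Dict[str, int] = {}
    if args.get("published_after") is not None:
        date_range["$gte"] = args["published_after"]
    if args.get("published_before") is not None:
        date_range["$lte"] = args["published_before"]
    if date_range:
        clauses.append({"date_published": date_range})
    if not clauses:
        return {}
    # A single clause needs no "$and" wrapper.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}
```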
If we use a regex as suggested, we'd need to be careful with questions like "Summarize the differences in opinion between Eliezer Yudkowsky and Paul Christiano.", which ask for a summary but don't refer to a single post. Matching URLs might also be problematic for items cross-posted on different forums.
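For the cross-posting problem, one option is to match any of several spellings of the URL with an $in filter instead of a single exact match. This sketch assumes that Alignment Forum posts are mirrored on LessWrong under the same /posts/<id>/<slug> path and that stored URLs are https with no trailing slash; both are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlsplit

# Assumption: Alignment Forum posts are mirrored on LessWrong under the same
# /posts/<id>/<slug> path, so either domain may be the one that got indexed.
MIRRORED_DOMAINS = [("alignmentforum.org", "lesswrong.com")]


def url_variants(url: str) -> List[str]:
    """Return plausible spellings of the same post URL.

    Assumes stored URLs use https and no trailing slash or query string;
    normalizing URLs the same way at ingestion time would make this reliable.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/")
    hosts = {host}
    for a, b in MIRRORED_DOMAINS:
        if host in (a, b):
            hosts.update((a, b))
    # Try each host with and without the "www." prefix.
    return sorted(f"https://{prefix}{h}{path}" for h in hosts for prefix in ("", "www."))


def cross_post_filter(url: str) -> Dict[str, dict]:
    """Pinecone-style metadata filter matching any variant of the URL."""
    return {"url": {"$in": url_variants(url)}}
```

A more robust alternative would be to store a canonical post ID in the metadata at ingestion time and filter on that instead of the URL.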
On a separate issue, I'm not even sure why the March 2023 post isn't in the ARD. It should be.