The underlying techniques are the same, but the goal is different. GPTcache sits as a cache between an application and an LLM service. It applies the same principles as the semantic search example in this repo: it generates an embedding for every user request to the model and stores it in a vector database. When a new request arrives, GPTcache performs a semantic search over the vector database to check whether a similar request has been seen before. If it finds a match, it returns the cached response instead of calling the LLM service again.
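To make the flow concrete, here is a minimal sketch of that lookup-or-call pattern. It is not GPTcache's actual API; the `SemanticCache` class, the toy bag-of-words `embed` function, and the `0.8` similarity threshold are all illustrative stand-ins (a real setup would use a learned embedding model and a proper vector database):

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: lowercase bag-of-words counts.
    # A real cache would use an embedding model instead.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def lookup(self, request):
        # Semantic search: return the cached response whose
        # request embedding is most similar, if above threshold.
        q = embed(request)
        best, best_sim = None, 0.0
        for emb, response in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best, best_sim = response, sim
        return best if best_sim >= self.threshold else None

    def store(self, request, response):
        self.entries.append((embed(request), response))

def answer(cache, request, call_llm):
    cached = cache.lookup(request)
    if cached is not None:
        return cached  # cache hit: skip the LLM call
    response = call_llm(request)
    cache.store(request, response)
    return response
```

A paraphrased repeat of an earlier question then lands above the similarity threshold and is served from the cache, so `call_llm` is invoked only once for the pair.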
Does that answer your question?