codefuse-ai / ModelCache

An LLM semantic caching system that aims to improve user experience by reducing response time through cached query-result pairs.

Fix problems with deleting logs #40

Closed. powerli2002 closed this issue 6 months ago.

powerli2002 commented 6 months ago

When I use MySQL and Milvus as the databases, failure information is written to the modelcache_query_log table in MySQL after a cache miss, but this information is not cleared by 'clear'. The same issue exists with SQLite and is also addressed here. I have submitted a pull request to resolve this.

If the log information is not supposed to be cleared by 'clear', please tell me when it should be cleared while the program is running; I have not found any code in the project that clears it.
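For reference, a minimal sketch of the kind of clean-up I mean, assuming a pymysql-style connection and a `model` column in the log table (as the `save_query_info` arguments suggest); names other than modelcache_query_log are illustrative, and the actual pull request follows the project's existing data-access style:

```python
import pymysql


def clear_query_log(conn: pymysql.connections.Connection, model: str) -> int:
    """Remove logged query records for one model when its cache is cleared.

    Illustrative sketch only; column and helper names are assumptions.
    """
    with conn.cursor() as cursor:
        # modelcache_query_log keeps accumulating rows after cache misses
        deleted = cursor.execute(
            "DELETE FROM modelcache_query_log WHERE model = %s", (model,)
        )
    conn.commit()
    return deleted
```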

Furthermore, according to the process flowchart you provided, a request should be made to the LLM after a cache miss, but the code does not call the LLM; instead, it ends the query process after logging. Why is this the case? The relevant code snippet from flask4modelcache.py is as follows:

```python
if request_type == 'query':
    if response is None:
        result = {"errorCode": 0, "errorDesc": '', "cacheHit": False, "delta_time": delta_time,
                  "hit_query": '', "answer": ''}

    delta_time_log = round(time.time() - start_time, 2)
    future = executor.submit(save_query_info, result, model, query, delta_time_log)
```
peng3307165 commented 6 months ago

On why the query_log was not deleted: in our actual work we want to retain all log data to facilitate investigation of historical data and to provide data support for algorithm optimization. For the open-source project, I think the delete logic is reasonable. You spotted this carefully, which deserves praise; the pull request has been merged.

On the question of calling LLMs: in real LLM product applications, the calling process involves many steps such as streaming output, security review, and problem troubleshooting. This process is also shown as modules in readme.md. Users want to avoid having the caching system handle model calls opaquely, so we adopt a decoupling strategy and leave the LLM call to the user. In this way ModelCache can serve as middleware and give users a more convenient experience.
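For illustration, a rough sketch of this usage pattern from the caller's side. The endpoint and payload fields are simplified and may differ from the demo service, and `call_llm` is only a placeholder for whatever model invocation you already have; treat these names as assumptions rather than the exact API (only the `cacheHit`, `answer`, and `hit_query` response fields come from the code above):

```python
import requests

CACHE_URL = "http://127.0.0.1:5000/modelcache"  # demo Flask service; adjust to your deployment


def call_llm(model: str, question: str) -> str:
    # Hypothetical stand-in for the user's own model invocation.
    raise NotImplementedError


def answer_with_cache(model: str, question: str) -> str:
    # 1. Ask ModelCache first (payload shape is illustrative).
    query_payload = {"type": "query", "scope": {"model": model},
                     "query": [{"role": "user", "content": question}]}
    hit = requests.post(CACHE_URL, json=query_payload).json()
    if hit.get("cacheHit"):
        return hit["answer"]

    # 2. Cache miss: the caller, not ModelCache, invokes the LLM, so streaming,
    #    security review, and troubleshooting stay under the caller's control.
    answer = call_llm(model, question)

    # 3. Write the result back so the next similar query hits the cache.
    insert_payload = {"type": "insert", "scope": {"model": model},
                      "chat_info": [{"query": [{"role": "user", "content": question}],
                                     "answer": answer}]}
    requests.post(CACHE_URL, json=insert_payload)
    return answer
```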

In the future, we will develop user adapters to further reduce integration costs and provide optional support for calling large models. You are welcome to follow our work and to participate.