ajndkr / lanarky

The web framework for building LLM microservices
https://lanarky.ajndkr.com/
MIT License
976 stars 74 forks

Fix async streaming callbacks when caching is enabled to return final answer. #115

Closed auxon closed 1 year ago

auxon commented 1 year ago

Description

Async streaming callbacks were not returning the final answer when `langchain.llm_cache` was set, and were not streaming tokens because the answer was cached. This PR fixes the issue and updates some tests to check that, when caching is enabled, the answer is still returned by the callbacks.

Fixes #133

Changelog:

auxon commented 1 year ago

Still working on this because, if the cache is enabled, the full answer is returned even if tokens were streamed (due to a cache miss). Need to figure out a good way to disable the final answer output when tokens were streamed (cache missed) without introducing unwanted dependencies.

ajndkr commented 1 year ago

> Still working on this because, if the cache is enabled, the full answer is returned even if tokens were streamed (due to a cache miss). Need to figure out a good way to disable the final answer output when tokens were streamed (cache missed) without introducing unwanted dependencies.

@auxon maybe something simple like a flag? If the `on_llm_new_token` method is hit, you toggle the flag. If the flag is true, you can disable the final answer.

auxon commented 1 year ago

>> Still working on this because, if the cache is enabled, the full answer is returned even if tokens were streamed (due to a cache miss). Need to figure out a good way to disable the final answer output when tokens were streamed (cache missed) without introducing unwanted dependencies.

> @auxon maybe something simple like a flag? If the `on_llm_new_token` method is hit, you toggle the flag. If the flag is true, you can disable the final answer.

Yes, I think this is the simplest and perhaps the only way to really solve it without introducing unnecessary dependencies, so I will try that ASAP today.

ajndkr commented 1 year ago

Cool! No rush!

auxon commented 1 year ago

Ran into an issue trying to add a field to the base AsyncLanarkyCallback, which seems due to pydantic 1.10.12 (it works fine with the latest pydantic). Here is a minimal repro:

```python
from pydantic import BaseModel

class Base:
    pass

class Derived(Base, BaseModel):

    def __init__(self, **data):
        super().__init__(**data)
        print("Derived.__init__() called")
        self._myVar = True  # raises ValueError under pydantic 1.10.12

    @property
    def myVar(self):
        return self._myVar

    @myVar.setter
    def myVar(self, value):
        self._myVar = value

class SubDerived(Derived):
    pass

if __name__ == "__main__":
    v = SubDerived()
    print(v.myVar)
```

If you use pydantic 1.10.12, which LangChain and Lanarky currently use, you get this exception:

```
ValueError: "SubDerived" object has no field "_myVar"
  File "/Users/rah/repos/pythonTests/inheritanceTest.py", line 15, in __init__
    self._myVar = True
    ^^^^^^^^^^^
  File "/Users/rah/repos/pythonTests/inheritanceTest.py", line 31, in <module>
    v = SubDerived()
        ^^^^^^^^^^^^
```

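For reference, pydantic v1's supported way to attach a non-field instance attribute is `PrivateAttr`; plain `self._myVar = True` assignments are intercepted by pydantic's `__setattr__` and rejected. A minimal sketch of the repro rewritten with it (an alternative workaround, not the fix this PR ended up using):

```python
from pydantic import BaseModel, PrivateAttr

class Derived(BaseModel):
    # Declared private attribute: pydantic allows reads and writes to
    # it, unlike the ad-hoc self._myVar assignment above.
    _my_var: bool = PrivateAttr(default=True)

class SubDerived(Derived):
    pass

v = SubDerived()
print(v._my_var)  # True
```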
auxon commented 1 year ago

Fixed the issue of adding a settable property by simply declaring it as a pydantic field:

```python
class AsyncLanarkyCallback(AsyncCallbackHandler, BaseModel):
    """Async Callback handler for FastAPI StreamingResponse."""

    llm_cache_used: bool = Field(default_factory=lambda: langchain.llm_cache is not None)
    ...
```

Unfortunately it took way too long to resolve. LOL
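Standalone, the pattern above looks like this. The base-class name is illustrative, and a module-level stand-in replaces `langchain.llm_cache` so the sketch runs without LangChain; the key point is that a regular field with `default_factory` works across subclasses, where the private-attribute assignment failed:

```python
from pydantic import BaseModel, Field

llm_cache = None  # stand-in for langchain.llm_cache

class AsyncCallbackBase(BaseModel):
    # default_factory is evaluated at instantiation time, so the flag
    # reflects whether a cache was configured when the callback is built.
    llm_cache_used: bool = Field(default_factory=lambda: llm_cache is not None)

class SubCallback(AsyncCallbackBase):
    pass

print(SubCallback().llm_cache_used)  # False: the stand-in cache is unset
```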

auxon commented 1 year ago

Closing this to make a new one.