Open spirali opened 11 months ago
@jas-ho @gavento Any opinion on this?
How about making this part of the language model (engine), e.g. as a wrapper?

Pros: Can work in any context, not just `query_engine` and other special methods.

Cons:

Also, it would be nice to have a verification that only cached queries are made, and in the right order. Checks that the query matches are already there.
I wanted to create a ReplayModel, but it has some caveats: you need `query_model` anyway, because otherwise the query is not stored in the context.

Do I understand correctly that `replay = Replay(context1)` and `with context2: query_model(..., replay=replay)` are two independent things @spirali?
Returning to this point by @gavento:

> Can work in any context, not just `query_engine` and other special methods.

Calls to `query_model` are currently made in the following modules: `simple_cot_actor`, `one_shot_llm_actor`, `json_examples`, `query_for_json`, `summarize` ...and this will likely expand in the future.

What are your thoughts about making replay easily available consistently across all methods which rely on LLM calls @spirali?

PS: I also wanted to say that I find the current implementation very clean and readable! I am just not sure what the best design will be in the longer run.
Major thought why not: separation of concerns. `query_model` is now doing one thing: calling a variety of models with a unified API and context logging. (The plan is to also get e.g. token limit to work as a parameter here, which is not trivial with LC models.)

I am against adding more functionality to it: I think it is bad design, the functionality does not seem core to me (or, at least, I am not convinced it is the right way to do it), and bloating one function for it seems bad design for any future.

> calls to query_model are currently made in the following modules: simple_cot_actor, one_shot_llm_actor, json_examples, query_for_json, summarize ...and this will likely expand in the future.

The prevalence of `query_model` should IMO be irrelevant here: currently you can also call models directly (for any reason at all, including extra functionality of the call etc.) and nothing will break (except for a missing log). With adding this to `query_model`, that becomes a fragile bottleneck.

> PS: also wanted to say that I find the current implementation very clean and readable! just not sure what the best design will be in the longer run

So how about adding it as a wrapper around LLMs individually?
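To make the "wrapper around LLMs individually" option concrete, here is a minimal sketch of what such a per-model caching wrapper could look like. The names (`CachedLLM`, `FakeLLM`) and the single-method interface are illustrative assumptions, not the project's actual API:

```python
class FakeLLM:
    """Stand-in for a real model; counts how often it is actually queried."""
    def __init__(self):
        self.calls = 0

    def generate(self, prompt):
        self.calls += 1
        return f"answer to: {prompt}"


class CachedLLM:
    """Hypothetical wrapper around one model instance that serves
    repeated queries from a cache instead of re-querying the model."""
    def __init__(self, inner):
        self.inner = inner
        self.cache = {}

    def generate(self, prompt):
        if prompt not in self.cache:
            self.cache[prompt] = self.inner.generate(prompt)
        return self.cache[prompt]


llm = CachedLLM(FakeLLM())
a = llm.generate("Tell me a joke.")
b = llm.generate("Tell me a joke.")  # served from the cache; inner model not called again
```

The point of this shape is that any code path can receive the wrapped instance in place of the raw model, without `query_model` knowing anything about caching.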
I am rewriting the replay functionality from scratch. I want to make this functionality completely generic (any function/method call can be replayed, not necessarily only LLMs). I have some PoC, but I would like a sanity check of the idea before finishing it. Here is how it should look in the end:
```python
def my_function(x, y):
    return x + y

replay = Replay("my-replay-stream-name")

# Storing replay
with Context("root") as root:
    replay(my_function)(1, 2)

# Using replay
replay = Replay("my-replay-stream-name", context=root)
with Context("root") as root:
    replay(my_function)(2, 3)  # Calls my_function as it is not in cache
    replay(my_function)(1, 2)  # Recalls from cache
    replay(my_function)(1, 2)  # Calls my_function as it is not in cache (2nd call)

# Wrapping any object; "self" is also stored in the replay log
from langchain.llms import OpenAI
llm = replay(OpenAI(...))
llm.generate(["Tell me a joke."])

# Partial matching
replay(my_function, ignore_args=["y"])(2, 3)  # Ignores argument "y" when trying to find a matching call in the cache
```
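To sanity-check the semantics above (miss on unseen arguments, replay of matching calls, a repeated call consuming one recorded entry per use), here is a minimal in-memory sketch. The real design stores calls in a Context; here a plain list of `(key, result)` pairs stands in for that storage, and all names (`Replay`, `recorded`, `log=`) are illustrative rather than the actual implementation:

```python
import functools
import inspect
from collections import defaultdict, deque


class Replay:
    """Sketch: records calls, and when given a previous log, replays
    matching calls in order, consuming each recorded result once."""

    def __init__(self, name, log=None):
        self.name = name
        self.recorded = []               # calls captured in this run
        self.cache = defaultdict(deque)  # recorded results, FIFO per call key
        for key, result in (log or []):
            self.cache[key].append(result)

    def __call__(self, fn, ignore_args=None):
        ignore = set(ignore_args or ())

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Normalize positional/keyword arguments so equivalent calls match
            bound = inspect.signature(fn).bind(*args, **kwargs)
            key = (fn.__name__, tuple(sorted(
                (k, v) for k, v in bound.arguments.items() if k not in ignore)))
            queue = self.cache.get(key)
            if queue:                      # matching recorded call: consume it
                return queue.popleft()
            result = fn(*args, **kwargs)   # cache miss: really call fn
            self.recorded.append((key, result))
            return result

        return wrapper


calls = {"n": 0}

def my_function(x, y):
    calls["n"] += 1
    return x + y

rec = Replay("my-replay-stream-name")
rec(my_function)(1, 2)                       # recorded

rep = Replay("my-replay-stream-name", log=rec.recorded)
r1 = rep(my_function)(2, 3)  # not recorded: calls my_function
r2 = rep(my_function)(1, 2)  # recalled from the record
r3 = rep(my_function)(1, 2)  # record already consumed: calls again (2nd call)
```

Binding arguments through `inspect.signature` also makes `ignore_args` work whether the ignored argument was passed positionally or by keyword.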
Any comments?
Thanks!
Three comments:

1. Why as a decorator/wrapper function? This way all code needs to be wrapped in `replay`, including internal querying, so either we use it everywhere, or you pass the wrapped function instead of an LLModel (but then you lose any LLM information).
2. Storage logic: how does the storage/retrieval in contexts work?
3. Cache misses and hits (esp. with repeated calls with the same values) seem possibly inconsistent or illogical now, the way I think about it as a user.

I also want to pitch games/situations with storable state (pickle is fine) and meaningful steps where the state can be stored/restored ;-)
No, you just pass the instance. Look at my example: I call `.generate()` there on the wrapped instance without any additional wrapping; it automatically passes through arguments and methods. So you just wrap the instance when it is created, and pass the instance in the other cases. We can add some LangChainWrapper, I am not against it, but it will be just a one-liner.
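The pass-through behaviour described here can be sketched with a small `__getattr__`-based proxy; the names (`ReplayProxy`, `TinyModel`) and the list-based call log are assumptions for illustration, not the actual mechanism:

```python
class ReplayProxy:
    """Sketch: forwards attribute and method access to the wrapped
    instance, recording method calls as it goes. Because the proxy is
    bound to one instance, "self" is implied by the wrapper itself."""

    def __init__(self, inner, record):
        self._inner = inner
        self._record = record  # list collecting (method, args, kwargs) entries

    def __getattr__(self, name):
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def wrapped(*args, **kwargs):
            self._record.append((name, args, kwargs))
            return attr(*args, **kwargs)

        return wrapped


class TinyModel:
    def generate(self, prompts):
        return [p.upper() for p in prompts]


log = []
llm = ReplayProxy(TinyModel(), log)
out = llm.generate(["Tell me a joke."])  # no extra wrapping needed at call sites
```

Call sites use the proxy exactly like the original object, which is why only the creation site needs to change.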
Right now, it always records calls. When you pass a context to the replay constructor, it uses the stored info when necessary. I wanted to create it like this to avoid having to change code when adding new code. I am still experimenting a bit with when it is actually stored.

I agree that we should offer a possibility to set a policy for what to do on a cache miss. What the default should be needs to be discussed.
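One way such a cache-miss policy could be made explicit is a small enum; the names (`MissPolicy`, `lookup`) are hypothetical, and the two variants correspond to the behaviours discussed above (fall through to the real call, or fail loudly to verify that only cached queries are made):

```python
from enum import Enum


class MissPolicy(Enum):
    CALL = "call"    # fall back to calling the real function (the current behaviour)
    ERROR = "error"  # raise, to verify that only recorded queries are made


def lookup(cache, key, fn, policy=MissPolicy.CALL):
    """Return the cached result for key, or handle the miss per policy."""
    if key in cache:
        return cache[key]
    if policy is MissPolicy.ERROR:
        raise KeyError(f"no recorded call for {key!r}")
    return fn()


fresh = lookup({}, "q1", lambda: "fresh answer")
cached = lookup({"q1": "cached answer"}, "q1", lambda: "fresh answer")
```

Which variant should be the default is exactly the open question here; the enum just keeps the choice visible at the call site.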
Simple query caching
Fixes #40