Open spirali opened 11 months ago
@jas-ho @gavento Any opinion on this?
How about making this part of the language model (engine), e.g. as a wrapper?

Pros: Can work in any context, not just `query_engine` and other special methods.

Cons:

Also, it would be nice to have a verification that only cached queries are made, and in the right order. Checks that the query matches are already there.
I wanted to create a ReplayModel, but it has some caveats: you need `query_model` anyway, because otherwise the query is not stored in the context.

Do I understand correctly that `replay = Replay(context1)` and `with context2: query_model(..., replay=replay)` are two independent things @spirali?
Returning to this point by @gavento:

> Can work in any context, not just `query_engine` and other special methods.

Calls to `query_model` are currently made in the following modules: `simple_cot_actor`, `one_shot_llm_actor`, `json_examples`, `query_for_json`, `summarize` ...and this will likely expand in the future.

What are your thoughts about making replay easily available consistently across all methods which rely on LLM calls @spirali?

PS: I also wanted to say that I find the current implementation very clean and readable! I am just not sure what the best design will be in the longer run.
Major thought why not: separation of concerns. `query_model` is now doing one thing: calling a variety of models with a unified API and context logging. (The plan is to also get e.g. token limit to work as a parameter here, which is not trivial with LC models.)

I am against adding more functionality to it: I think it is bad design, the functionality does not seem core to me (or, at least, I am not convinced it is the right way to do it), and bloating one function for it seems bad design for any future.

> calls to query_model are currently made in the following modules: simple_cot_actor, one_shot_llm_actor, json_examples, query_for_json, summarize ...and this will likely expand in the future.

The prevalence of `query_model` should IMO be irrelevant here: currently you can also call models directly (for any reason at all, including extra functionality of the call etc.) and nothing will break (except for a missing log). With adding this to `query_model`, that becomes a fragile bottleneck.

> PS: also wanted to say that I find the current implementation very clean and readable! just not sure what the best design will be in the longer run

So how about adding it as a wrapper around LLMs individually?
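To make the "wrapper around LLMs individually" option concrete, here is a minimal sketch of what such a per-model caching wrapper could look like. The names (`CachedLLM`, `FakeLLM`) and the single-method interface are illustrative assumptions, not the project's actual API:

```python
class FakeLLM:
    """Stand-in for a real model; counts how often it is actually queried."""
    def __init__(self):
        self.calls = 0

    def generate(self, prompt):
        self.calls += 1
        return f"answer to: {prompt}"


class CachedLLM:
    """Hypothetical wrapper around one model instance that serves
    repeated queries from a cache instead of re-querying the model."""
    def __init__(self, inner):
        self.inner = inner
        self.cache = {}

    def generate(self, prompt):
        if prompt not in self.cache:
            self.cache[prompt] = self.inner.generate(prompt)
        return self.cache[prompt]


llm = CachedLLM(FakeLLM())
a = llm.generate("Tell me a joke.")
b = llm.generate("Tell me a joke.")  # served from the cache; inner model not called again
```

The point of this shape is that any code path can receive the wrapped instance in place of the raw model, without `query_model` knowing anything about caching.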
I am rewriting the replay functionality from scratch. I want to make this functionality completely generic (any function/method call can be replayed, not necessarily only LLMs). I have some PoC, but I would like a sanity check of the idea before finishing it. Here is how it should look in the end:
```python
def my_function(x, y):
    return x + y

replay = Replay("my-replay-stream-name")

# Storing replay
with Context("root") as root:
    replay(my_function)(1, 2)

# Using replay
replay = Replay("my-replay-stream-name", context=root)
with Context("root") as root:
    replay(my_function)(2, 3)  # Calls my_function as it is not in cache
    replay(my_function)(1, 2)  # Recalls from cache
    replay(my_function)(1, 2)  # Calls my_function as it is not in cache (2nd call)

# Wrapping any object; "self" is also stored in the replay log
from langchain.llms import OpenAI
llm = replay(OpenAI(...))
llm.generate(["Tell me a joke."])

# Partial matching
replay(my_function, ignore_args=["y"])(2, 3)  # Ignores argument "y" when trying to find a matching call in the cache
```
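To sanity-check the semantics above (miss on unseen arguments, replay of matching calls, a repeated call consuming one recorded entry per use), here is a minimal in-memory sketch. The real design stores calls in a Context; here a plain list of `(key, result)` pairs stands in for that storage, and all names (`Replay`, `recorded`, `log=`) are illustrative rather than the actual implementation:

```python
import functools
import inspect
from collections import defaultdict, deque


class Replay:
    """Sketch: records calls, and when given a previous log, replays
    matching calls in order, consuming each recorded result once."""

    def __init__(self, name, log=None):
        self.name = name
        self.recorded = []               # calls captured in this run
        self.cache = defaultdict(deque)  # recorded results, FIFO per call key
        for key, result in (log or []):
            self.cache[key].append(result)

    def __call__(self, fn, ignore_args=None):
        ignore = set(ignore_args or ())

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Normalize positional/keyword arguments so equivalent calls match
            bound = inspect.signature(fn).bind(*args, **kwargs)
            key = (fn.__name__, tuple(sorted(
                (k, v) for k, v in bound.arguments.items() if k not in ignore)))
            queue = self.cache.get(key)
            if queue:                      # matching recorded call: consume it
                return queue.popleft()
            result = fn(*args, **kwargs)   # cache miss: really call fn
            self.recorded.append((key, result))
            return result

        return wrapper


calls = {"n": 0}

def my_function(x, y):
    calls["n"] += 1
    return x + y

rec = Replay("my-replay-stream-name")
rec(my_function)(1, 2)                       # recorded

rep = Replay("my-replay-stream-name", log=rec.recorded)
r1 = rep(my_function)(2, 3)  # not recorded: calls my_function
r2 = rep(my_function)(1, 2)  # recalled from the record
r3 = rep(my_function)(1, 2)  # record already consumed: calls again (2nd call)
```

Binding arguments through `inspect.signature` also makes `ignore_args` work whether the ignored argument was passed positionally or by keyword.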
Any comments?
Thanks!
Three comments:

1. Why as a decorator/wrapper function? This way all code needs to be wrapped in `replay`, including internal querying, so either we use it everywhere, or you pass the wrapped function instead of an LLModel (but then you lose any LLM information).
2. Storage logic: how does the storage/retrieval in contexts work?
3. Cache misses and hits (esp. with repeated calls with the same values) seem possibly inconsistent or illogical now, the way I think about it as a user.

I also want to pitch games/situations with storable state (pickle is fine) and meaningful steps where the state can be stored/restored ;-)
No, you just pass the instance. Look at my example: I call `.generate()` there on the wrapped instance without any additional wrapping; it automatically passes through arguments and methods. So you just wrap the instance when it is created, and pass the instance in the other cases. We can add some LangChainWrapper, I am not against it, but it will be just a one-liner.
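The pass-through behaviour described here can be sketched with a small `__getattr__`-based proxy; the names (`ReplayProxy`, `TinyModel`) and the list-based call log are assumptions for illustration, not the actual mechanism:

```python
class ReplayProxy:
    """Sketch: forwards attribute and method access to the wrapped
    instance, recording method calls as it goes. Because the proxy is
    bound to one instance, "self" is implied by the wrapper itself."""

    def __init__(self, inner, record):
        self._inner = inner
        self._record = record  # list collecting (method, args, kwargs) entries

    def __getattr__(self, name):
        attr = getattr(self._inner, name)
        if not callable(attr):
            return attr

        def wrapped(*args, **kwargs):
            self._record.append((name, args, kwargs))
            return attr(*args, **kwargs)

        return wrapped


class TinyModel:
    def generate(self, prompts):
        return [p.upper() for p in prompts]


log = []
llm = ReplayProxy(TinyModel(), log)
out = llm.generate(["Tell me a joke."])  # no extra wrapping needed at call sites
```

Call sites use the proxy exactly like the original object, which is why only the creation site needs to change.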
Right now, it always records calls. When you pass a context to the replay constructor, it uses the stored info when necessary. I wanted to create it like this to avoid having to change code when adding new code. I am still experimenting a bit with when it is actually stored.

I agree that we should offer a possibility to set a policy for what to do on a cache miss. What the default should be needs to be discussed.
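One way such a cache-miss policy could be made explicit is a small enum; the names (`MissPolicy`, `lookup`) are hypothetical, and the two variants correspond to the behaviours discussed above (fall through to the real call, or fail loudly to verify that only cached queries are made):

```python
from enum import Enum


class MissPolicy(Enum):
    CALL = "call"    # fall back to calling the real function (the current behaviour)
    ERROR = "error"  # raise, to verify that only recorded queries are made


def lookup(cache, key, fn, policy=MissPolicy.CALL):
    """Return the cached result for key, or handle the miss per policy."""
    if key in cache:
        return cache[key]
    if policy is MissPolicy.ERROR:
        raise KeyError(f"no recorded call for {key!r}")
    return fn()


fresh = lookup({}, "q1", lambda: "fresh answer")
cached = lookup({"q1": "cached answer"}, "q1", lambda: "fresh answer")
```

Which variant should be the default is exactly the open question here; the enum just keeps the choice visible at the call site.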
Simple query caching
Fixes #40