Docs for cache behaviour

eth-sri / lmql

A language for constraint-guided and efficient LLM programming.

Apache License 2.0

3.48k stars 191 forks source link

I'm trying to understand how/when LLM calls get cached, especially when using the OpenAI API. I've looked in the docs, but can't find details.

Ideally, in development, I'd like to be able to cache/memoize calls to the API. For example, if one uses a LMQL programe which requests multiple completions, and changes the later part of the programme but leave the early phase unchanged. In this case it seems like the early requests to the API could be cached? This is especially the case if passing a seed which is now supported by the API.

eth-sri / lmql

Docs for cache behaviour #342