Closed willbakst closed 3 months ago
Yah, I think the main impetus for this feedback is:

> …`call` internally). This pattern has felt gross to me because 1/ it requires setting internal state for things that really aren't stateful, just so they get passed into the prompt template, and 2/ the `call` function is still part of the public interface of the class, and depending on whether or not you've called my wrapper function before, the `call` function will either fail (because it's using empty-string initialized private vars) or it will "succeed" with whatever the last args were.
>
> Alternatively, you could push things towards a world where `call` is meant to be used once and discarded (therefore stateless), in which case the current pattern works. But I think the examples in your docs that leverage the fact that calls can be stateful should then change, and you'd probably want to come up with some new patterns that wrap calls and manage state.
>
> So it sort of depends on where, philosophically, you view call classes sitting in the longer-term vision for the library. Are agents / chatbots / more stateful things subclasses of calls? Or are they wrappers around calls?
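A minimal sketch of the pattern being described (all names here are hypothetical, not Mirascope's actual API): a wrapper method stashes its argument in a private field just so the prompt template can read it, which makes `call` itself order-dependent.

```python
class BookRecommender:
    """Illustrates the stateful-call antipattern: a wrapper sets private
    state that call() silently depends on."""

    def __init__(self) -> None:
        self._genre = ""  # empty-string initialized private var

    @property
    def prompt(self) -> str:
        return f"Recommend a {self._genre} book."

    def call(self) -> str:
        # Pretend this hits an LLM API; here we just echo the prompt.
        return self.prompt

    def recommend(self, genre: str) -> str:
        self._genre = genre  # internal state set just to feed the template
        return self.call()


recommender = BookRecommender()
print(recommender.call())                # broken prompt: genre is still ""
print(recommender.recommend("fantasy"))  # "Recommend a fantasy book."
print(recommender.call())                # reuses whatever the last args were
```

Calling `call` before `recommend` produces a prompt with a hole in it; calling it afterwards silently reuses stale state.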
Inside of Mirascope I want to make it really easy to implement an agent while maintaining flexibility and extensibility. I view this as first making it as easy as possible to implement an agent as a subclass of a call class in a way that doesn't smell. I think separating call arguments from state and enabling overriding `call`, `stream`, `call_async`, and `stream_async` is the best way to do this that I can currently think of. We should also do this for extractors.
Thank you for the feedback and suggestions here!
Of course, this will result in a lot of repetitive code across agents if you as the end user need to define each of these methods for each agent. This is where I think something like `BaseAgent` with subclasses e.g. `OpenAIAgent` would come in as a second step. These classes would have methods like `step`/`run` or something that handle a lot of the duplicated code. The interface here has to maintain maximum extensibility / flexibility for writing more custom flows. I imagine something like a step being a single call that handles state updates around history and exposes handlers for content and tools, and then just running the step in a loop until the agent says the loop is done.
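That step/run loop could be sketched like this (purely illustrative; `Agent`, `step`, and `run` here are hypothetical stand-ins, with a fake LLM in place of a provider call):

```python
from typing import Callable


class Agent:
    """Hypothetical sketch of a step/run agent loop."""

    def __init__(self, llm: Callable[[list[str]], str]) -> None:
        self.llm = llm            # stand-in for a provider API call
        self.history: list[str] = []

    def step(self, user_input: str) -> tuple[str, bool]:
        """One call: update history, return content and a done flag."""
        self.history.append(f"user: {user_input}")
        content = self.llm(self.history)
        self.history.append(f"assistant: {content}")
        return content, content.endswith("DONE")

    def run(self, user_input: str) -> str:
        """Run steps in a loop until the agent says it's done."""
        content, done = self.step(user_input)
        while not done:
            content, done = self.step("continue")
        return content


# Fake LLM that finishes on the second step.
responses = iter(["thinking...", "answer DONE"])
agent = Agent(lambda history: next(responses))
print(agent.run("solve this"))  # "answer DONE"
print(len(agent.history))       # 4 messages recorded
```

The point is that `step` owns the history update and termination signal, so subclasses only customize the content/tool handling.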
The agent stuff definitely needs more thought and design work. Lots of examples to see how to provide the best interface.
I'm also interested in implementing more of an agent framework as a separate library built on top of Mirascope with a lot of the same philosophy, except that library will have more fleshed out workflows like ReAct or Query Planning as pre-built execution flows (making them a little less extensible). Ideally this framework would be easy to build because `BaseAgent` classes provide a fantastic interface for easily implementing all the different agent workflows that pop up every day. This would also make it great for researchers to implement new research using Mirascope, so long as we provide detailed examples and great docs.
@willbakst
I certainly feel it is essential to distinguish between the Arguments and the State.
I have thought about how to realize this idea.
Perhaps there is no problem with doing call argument validation at runtime. However, we need to think a bit about static type checking.

> I want things to be properly typed so the consumer knows to pass in the genre when calling a call.

I certainly agree with this approach. The user should be able to understand explicitly, from the IDE, etc., which arguments to pass.
However, I don't think the existing Python specification allows full static type checking to be applied to the `call` method's signature with your idea.
> Not 100% how to best implement this yet... Likely some version of paramspec update with generics on base and subclass calls and ...

As you stated, it certainly feels like it could be achieved using `ParamSpec` and `Concatenate`.
However, the current Python specification does not allow you to expand constructor signatures from Pydantic models. To explain the root cause a little more: it is impossible to expand parameters from a callable because Python has no construct like `ParametersOf`, which was considered and rejected in the ParamSpec PEP. Currently, `ParamSpec` assumes cases where the arguments of the callable are explicitly specified, such as decorators, so I don't think it can be used in this use case.
As I don't have time today, I will think about the `CallArgs` handover method again tomorrow.
> As I don't have time today, I will think about the CallArgs handover method again tomorrow.
I investigated and considered good approaches for defining types to realize your code example. At least, I couldn't find a good way to define types that work with Pyright. Even if it could be defined in a very complex and tricky way, I don't think it would be possible to achieve completion and validation across all type checkers and IDEs, so I thought it might be better to implement it a bit more explicitly. After all, I felt it was not intuitive because there is no definition of the call signature in the code. (Pydantic also uses magic in the signature of `__init__`, but features like dataclasses and attrs, as well as similar data-class functionality in Kotlin and other languages, are well established; developers understand them and IDEs and type checkers support them, so it does not seem to be a problem.)
Supplement: In my understanding, Pyright supports fairly up-to-date PEPs. It also handles complex type analysis. (Last month I wrote a decorator using `ParamSpec` and `Concatenate`, but type checking didn't work correctly with mypy.) Also, since many projects still use mypy, even if it could be expressed using complex generic type definitions utilizing the latest typing features, I thought it wouldn't be checked correctly in many users' environments. Regarding IDEs, I thought the same problem would occur due to differences in the implementations of each LSP and PyCharm.
I considered using `CallArgs` as an argument to `call()` as a way to overcome the above issues. This approach is quite simple, and the solution with generics is clear. Users must pass a `CallArgs` instance to `call()`. Regarding Pydantic, there are no issues with completion or type checking, and even without support from editors or type checkers, the necessary `CallArgs` can be confirmed, thus improving readability. The drawback is that it is necessary to import and create the corresponding `CallArgs` model each time, which might seem cumbersome, but I think it is within an acceptable range. It clarifies the necessary arguments, and I believe it will be seen positively by users.
What do you think about this idea?
Here is example code:

```python
from abc import ABC
from typing import Generic, TypeVar

from pydantic import BaseModel


class CallArgs(BaseModel):
    ...


CallArgsT = TypeVar("CallArgsT", bound=CallArgs)


class OpenAICall(BaseModel, Generic[CallArgsT], ABC):
    def call(self, *, call_args: CallArgsT, **kwargs) -> ...:
        ...

    @property
    def call_args(self) -> CallArgsT:
        ...


class BookRecommenderArgs(CallArgs):
    genre: str


class AuthorSelector(OpenAICall[BookRecommenderArgs]):
    prompt_template = "Recommend an author that writes {genre} books."


class BookRecommender(OpenAICall[BookRecommenderArgs]):
    prompt_template = "Recommend a book written by {author}."

    author_selector: AuthorSelector = AuthorSelector()

    @property
    def author(self) -> str:
        return self.author_selector.call(
            call_args=BookRecommenderArgs(genre=self.call_args.genre)
        )


recommender = BookRecommender()
response = recommender.call(call_args=BookRecommenderArgs(genre="fantasy"))
print(response.content)
```
If this is the best solution possible / we can come up with, I'm not necessarily opposed.
It achieves the separation of arguments and state. We just need to also make sure that the call arguments can be optional, and we need to make sure the user can override `call` and other such methods, including even the return type.
However, I don't think this is ideal.
Could we implement it as a decorator to get the param spec? Something like:
```python
from mirascope.base import call_args
from mirascope.openai import OpenAICall


@call_args(genre=str)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book"


recommender = BookRecommender()
response = recommender.call(genre="fantasy")
print(response.content)
```
Realizing we'll also likely need to bubble the functionality all the way down to `BasePrompt` no matter what, so we should also think through what the flow for `BasePrompt` looks like.
Top of mind is something like this:
```python
from mirascope.base import BasePrompt, call_args


@call_args(genre=str)
class BookRecommendationPrompt(BasePrompt):
    prompt_template = "Recommend a {genre} book"


prompt = BookRecommendationPrompt()
messages = prompt.messages(genre="fantasy")
print(messages)
#> [{"role": "user", "content": "Recommend a fantasy book"}]
```
I agree with your opinion.
I have considered the feasibility of the new proposal.
Currently, I think it is difficult for this approach to work correctly with static type checking. I apologize for not being able to realize your idea, but current Python lacks sufficient support for complex type concepts. Specifically, class decorators do not work correctly with type checkers like mypy. To be more specific, since Python does not support `Intersection`, it is difficult to use class decorators to modify the arguments of the `call` method.
https://github.com/python/typing/issues/213
Additionally, it is challenging to dynamically generate method signatures from type information defined in arguments like `@call_args(genre=str)` for use in static type checking. Even though `pydantic.create_model()` generates a Pydantic model from arguments, it does not provide type information usable for static analysis.
https://github.com/pydantic/pydantic/blob/d93a0460e89f4de67bae1af829c9d028b02e4964/pydantic/main.py#L1459-L1471
To work around this issue, I added a feature to the PyCharm Pydantic plugin to analyze `create_model`. I created this plugin because Pydantic constructors could not be type-checked. With the introduction of PEP 681 (`@dataclass_transform`), static type analysis for data-like model constructors such as Pydantic's is finally supported in mypy and Pyright (the PEP was proposed to support Pyright and VS Code).
https://peps.python.org/pep-0681
I also considered defining `call_args` with a `TypedDict` and unpacking it as the type hint for the `call` method, but this is also not supported.
https://github.com/python/typing/issues/1399
In other words, the only way to dynamically change method arguments while supporting type checkers is to use decorators on methods.
Here is an example. Although mypy and pyright will work correctly, there may not be many benefits for users when implementing it.
```python
class BookRecommenderArgs(CallArgs):
    genre: str


class BookRecommender(OpenAICall):
    prompt_template = "Recommend a book written by {author}."

    @call_args(BookRecommenderArgs)
    def call(self, *args, **kwargs) -> ...:
        response = super().call(*args, **kwargs)
        # do something with response
        return response


BookRecommender().call(genre="fantasy", xyz=123)
# mypy main.py
# >>> ... error: Unexpected keyword argument "xyz" for "call" of "BookRecommender"  [call-arg]
```
Instead of this, it might be better to dynamically validate and map the signature of the user-defined `call` method within `super().call(*args, **kwargs)`.
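That runtime validate-and-map step could look roughly like this (a stdlib-only sketch with hypothetical names, not Mirascope code): the base `call` checks the incoming kwargs against a declared args spec and raises on unknown or missing names before doing any work.

```python
class CallArgsError(TypeError):
    pass


class BaseCall:
    """Sketch: validate kwargs at runtime against a declared args spec."""

    call_arg_types: dict[str, type] = {}

    def call(self, **kwargs):
        unexpected = set(kwargs) - set(self.call_arg_types)
        if unexpected:
            raise CallArgsError(f"unexpected call args: {sorted(unexpected)}")
        missing = set(self.call_arg_types) - set(kwargs)
        if missing:
            raise CallArgsError(f"missing call args: {sorted(missing)}")
        return kwargs  # a real implementation would format the prompt here


class BookRecommender(BaseCall):
    call_arg_types = {"genre": str}


recommender = BookRecommender()
print(recommender.call(genre="fantasy"))  # {'genre': 'fantasy'}
try:
    recommender.call(genre="fantasy", xyz=123)
except CallArgsError as e:
    print(e)  # unexpected call args: ['xyz']
```

The error surfaces only when `call` actually runs, which is exactly the usability trade-off discussed below.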
> Realizing we'll also likely need to bubble the functionality all the way down to BasePrompt no matter what, so we should also think through what the flow for BasePrompt looks like.
I think this can also be implemented in the same way as OpenAICall. Are there any specific concerns?
> It achieves the separation of arguments and state. We just need to also make sure that the call arguments can be optional, and we need to make sure the user can override the call and other such methods, including even the return type.
I have one concern with my proposal: how to distinguish between required and optional call_args. It seems challenging to differentiate between required and optional solely using generics.
> I think this can also be implemented in the same way as OpenAICall. Are there any specific concerns?
I just want to make sure as we design/implement this that we are mindful of making it possible for `BasePrompt` as well. Since `BaseCall` and `BaseExtractor` subclass `BasePrompt`, this will inevitably impact how we should implement things in the library. We also need to make sure that we can separate the call arguments provided by the user for the prompt template from the call parameters the user can provide through `**kwargs` that get passed into the API call to e.g. OpenAI.
> Instead of this, it might be better to dynamically validate and map the signature of the user-defined call method within super().call(*args, **kwargs).
I think my key desires here are:

1. The user can override `call` and other such methods as described.
2. There's a level of type hinting and editor support so that e.g. `recommender.call(genre="fantasy")` shows you that you need to provide `genre`.
3. Ideally some level of type hint for the user when overriding the method (with some form of mapping like you've mentioned?). Something like `call(self, *, genre: str, **kwargs)` and `super().call(genre=genre, **kwargs)` instead of just `*args` and `**kwargs`.
4. Make it as easy as possible to provide call arguments without overriding every function.
For (4), I imagine something like this:
```python
class BookRecommenderArgs(CallArgs):
    genre: str


@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."


recommender = BookRecommender()

# all of the below should fail with unexpected keyword arg "xyz"
recommender.call(genre="fantasy", xyz=123)
recommender.call_async(genre="fantasy", xyz=123)
recommender.stream(genre="fantasy", xyz=123)
recommender.stream_async(genre="fantasy", xyz=123)
```
But it sounds like you're saying this isn't possible because even though we could wrap every method inside of `BookRecommender`, the static type checker won't catch it?
> I have one concern with my proposal: how to distinguish between required and optional call_args. It seems challenging to differentiate between required and optional solely using generics.
For call arguments here, I don't think there's really any reason to support "optional" so much as just optional through default. The reason is that the prompt template will be dependent on the call args, so we'll need some value to work with. The point is to require these call arguments for the user of the class. In this case, we should be able to just define fields with defaults? For example:
```python
class BookRecommenderArgs(CallArgs):
    genre: str = "fantasy"


@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."


BookRecommender().call()  # genre is optional and defaults to "fantasy"
```
> I apologize for not being able to realize your idea, but the current Python lacks sufficient support for complex type concepts.
Not your fault Python doesn't support it! Makes me wonder if we're thinking too boxed in to the current design. Maybe there's a different way we could approach this to make it possible with existing python?
For example, what if we just accepted the limitations of Python and implemented the feature without optimal static type checking (and add better type checking when possible re: issues you linked / maybe a VSCode + PyCharm extension for it?).
Consider the following:
```python
class BookRecommenderArgs(CallArgs):
    genre: str


@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."


recommender = BookRecommender()
response = recommender.call(genre="fantasy")  # no type hints but fails if genre isn't provided
print(response.content)
```
We can make the above possible; it's just not great with respect to static type checking. Then we can add better static type checking if/when Python makes it possible. I wouldn't suggest this if there seemed to be a better route, but unfortunately it seems like this is the best option?

A nice benefit of doing it this way is that overriding the call method should be easy, meaning that implemented methods that are statically typed would look something like this:
```python
@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."

    def call(self, *, genre: str, **kwargs) -> OpenAICallResponse:
        return super().call(genre=genre, **kwargs)
        ...

    def stream(self, *, genre: str, **kwargs) -> Generator[OpenAICallResponseChunk, None, None]:
        yield from super().stream(genre=genre, **kwargs)
        ...
```
Of course, this isn't ideal, but it's better I think than anything else so far?
Thank you for your suggestions.
> The user can override the call and other such methods as described.
I think this is not a problem.
> There's a level of type hinting and editor support so that e.g. recommender.call(genre="fantasy") shows you that you need to provide genre.
As you understand, this is not supported due to the current limitations of Python. I will provide future solutions for this later.
> Ideally some level of type hint for the user when overriding the method (with some form of mapping like you've mentioned?) Something like `call(self, *, genre: str, **kwargs)` and `super().call(genre=genre, **kwargs)` instead of just `*args` and `**kwargs`.
This cannot provide completion or warnings in editors or type checkers. However, at runtime we can dynamically check the method and its signature during class definition, so mistakes can be noticed easily when running locally with unittest, etc. At that point, appropriate correction information can be displayed to the user as an error message.
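That definition-time check can be sketched with `__init_subclass__` and `inspect.signature` (a hypothetical, stdlib-only illustration): a subclass that overrides `call` without declaring the required call args fails the moment the class body executes, not when `call` is first invoked.

```python
import inspect


class BaseCall:
    required_call_args: tuple[str, ...] = ()

    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # validate any overridden call() as soon as the class is defined
        if "call" in cls.__dict__ and cls.required_call_args:
            params = inspect.signature(cls.__dict__["call"]).parameters
            missing = [a for a in cls.required_call_args if a not in params]
            if missing:
                raise TypeError(
                    f"{cls.__name__}.call is missing call args: {missing}"
                )

    def call(self, **kwargs):
        ...


class BookRecommender(BaseCall):
    required_call_args = ("genre",)

    def call(self, *, genre: str, **kwargs):  # OK: declares genre
        return f"Recommend a {genre} book."


try:
    class BadRecommender(BookRecommender):
        def call(self, *, topic: str, **kwargs):  # missing genre
            ...
except TypeError as e:
    print(e)  # BadRecommender.call is missing call args: ['genre']
```

The error fires during import/class definition, so it surfaces in any local run or test, matching the "noticed easily with unittest" point above.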
> Make it as easy as possible to provide call arguments without overriding every function.
As you mentioned, this does not benefit from type checkers. It also cannot be checked at runtime until `call()` is actually executed, which may result in lower usability during implementation and delayed discovery of argument errors.
> But it sounds like you're saying this isn't possible because even though we could wrap every method inside of BookRecommender, the static type checker won't catch it?
Yes, my concern is that unless users implement unittests locally with proper mocking of external API calls, it will be difficult to discover issues. We wouldn't want incidents after deploying to production.
> In this case, we should be able to just define fields with defaults? For example:
I agree with this approach. I don't see any problems with it.
> For example, what if we just accepted the limitations of Python and implemented the feature without optimal static type checking (and add better type checking when possible re: issues you linked / maybe a VSCode + PyCharm extension for it?).
In the end, our discussion has highlighted the limitations of Python's type checking at the language level. While hoping these issues will be resolved in the future, I propose some workarounds here (of course, creating a PEP to improve the language specification in parallel would be ideal).

To cover everything, here are the conclusions: if Ruff supported plugins, it would be the best option, since it is used in many projects and I already provide the plugin for PyCharm. Although Ruff does not support completion, it can issue warnings and is expected to provide type checking in the future, which aligns with our goals. (I heard from a member of the development team at PyCon US that Ruff will be offering a type checker in a few months.)
> But unfortunately it seems like this is the best option?
>
> Of course, this isn't ideal, but it's better I think than anything else so far?
I agree with your proposal, but I have one concern.
> Make it as easy as possible to provide call arguments without overriding every function.
As I mentioned earlier, I am a little worried about unexpected errors due to wrong arguments when calling.
> As I mentioned earlier, I am a little worried about unexpected errors due to wrong arguments when calling.
I agree. I think if we go down this path, the recommendation when using call args would be to do the following:
```python
@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."

    def call(self, *, genre: str, **kwargs) -> OpenAICallResponse:
        return super().call(genre=genre, **kwargs)

    async def call_async(self, *, genre: str, **kwargs) -> OpenAICallResponse:
        return await super().call_async(genre=genre, **kwargs)

    def stream(self, *, genre: str, **kwargs) -> Generator[OpenAICallResponseChunk, None, None]:
        yield from super().stream(genre=genre, **kwargs)

    async def stream_async(self, *, genre: str, **kwargs) -> AsyncGenerator[OpenAICallResponseChunk, None]:
        async for chunk in super().stream_async(genre=genre, **kwargs):
            yield chunk
```
This provides type checker and runtime support requiring `genre`. However, it's a bit verbose / inconvenient to have to write this every single time I want to use call arguments.
With the above implementation, the following would still work regardless:
```python
@call_args(BookRecommenderArgs)
class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."


recommender = BookRecommender()
response = recommender.call(genre="fantasy")  # no type checking here since user didn't define it
```
Can you think of a way to provide a more convenient way of implementing the first snippet so we still get type checking but it's not as verbose / inconvenient to write for every call?
Random thought: could we do something like `ClassVar` or `PrivateAttr` to make a field a call argument instead? I'm thinking something like this could be really cool (but not possible??):
```python
from mirascope.base import CallArg
from mirascope.openai import OpenAICall


class BookRecommender(OpenAICall):
    prompt_template = "Recommend a {genre} book."

    genre: CallArg[str]


recommender = BookRecommender()
response = recommender.call(genre="fantasy")
print(response.content)
```
Could we somehow then get type hints on the `OpenAICall` methods typed by all `CallArg` fields?
> Can you think of a way to provide a more convenient way of implementing the first snippet so we still get type checking but it's not as verbose / inconvenient to write for every call?
I'll think about if there's a good way to do this.
> Could we somehow then get type hints on the OpenAICall methods typed by all CallArg fields?
This ultimately becomes the same type checking that Pydantic originally aimed to achieve. In other words, it is not possible to collect the types of specific attributes.
> This ultimately becomes the same type checking that Pydantic originally aimed to achieve. In other words, it is not possible to collect the types of specific attributes.
This is very unfortunate. Regardless, we should still try to implement this to the best of our ability given current Python typing limitations.
I'm not upset with the most recent iteration, and if we can find a good shorthand even better.
However, I'm thinking suggestions like #312 might lend themselves better to this desired outcome since we'll have functions that we can properly type given current limitations.
> I'll think about if there's a good way to do this.
I think it would be most appropriate to include usage and snippets in the documentation of the inherited method, as is the case with many libraries.
> However, I'm thinking suggestions like https://github.com/Mirascope/mirascope/issues/312 might lend themselves better to this desired outcome since we'll have functions that we can properly type given current limitations.
This seems to be a good approach that works correctly from the type checker's point of view.
The only problem is that it defines a function with no actual processing. This can be a bit confusing for new users. However, this is a simple problem that can be solved with better documentation. Perhaps we need a warning if actual processing is written within the function, since we only use the function's argument and return types.
Also, in order to maintain state such as history, perhaps the response should be a class based on pydantic model that can store such information.
I'm going to call this not possible and instead move work on the principles here to #322
Description
Right now, everything is state, including all template variables. For certain variables this makes sense. Things you want the consumer of the class to provide at construction time. But sometimes the expectation is that the consumer will create an instance of the class and make multiple calls, changing the state between calls. This feels somewhat wrong.
Consider the following example:
In this case, `genre` shouldn't be state. If it were supposed to be state, we would use it more like this:

The idea here is that if `genre` is state then it likely won't be changed often (or at all) between calls. But for use-cases where the consumer changes `genre` every call, this flow makes less sense. A better flow would be something like this:

Of course, the `prompt_template` may still rely on internal state, and that internal state should be able to access the current call arguments. This is necessary for chaining:

The above is extremely rough in terms of the interface design, but the idea is there.
- Require `genre` when calling `call`.
- Support this in both the `BaseCall` and `BaseExtractor` classes.

Not 100% how to best implement this yet. Likely some version of a ParamSpec update with generics on the base and subclass `call` and other methods to accept the arguments, plus something like a `call_args` class variable that we can set at the very beginning of `call` so that when we access the properties they will have access to the correct call arguments. `call_args` could also then be set at definition time to provide default values.

A benefit of doing this is that now users can choose between what is state vs. not state.
If they want `history` to be state and create an agent, great. If they want to create a separate agent class that passes history into a separate call class as an argument, great. This provides maximum flexibility.

Furthermore, ideally the user would be able to better write an agent by overriding the `call`, `stream`, etc. methods so that the expectation is to use these base methods and not some new method (unless the creator wants to write that new method, which they can). For example:

The above now means that a consumer of `Mathematician` would be expected to use the `call` method rather than some custom method like `solve_problem`. To me this makes more sense. Again, not sure this is the right implementation of the interface, but it's in the right direction.

Note also that the agent example's `call` internals assume a lot of other feature requests / bug fixes that are in the works, but that shouldn't affect this request.
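A hedged sketch of the `Mathematician` idea (a stand-in base class with no real API call, not the elided original example): the consumer uses the overridden `call` rather than a bespoke `solve_problem`, and any extra pre/post-processing lives inside the override.

```python
class OpenAICall:
    """Stand-in base class; a real one would hit the OpenAI API."""

    prompt_template = ""

    def call(self, **call_args: str) -> str:
        return self.prompt_template.format(**call_args)


class Mathematician(OpenAICall):
    prompt_template = "Solve the following math problem: {problem}"

    def call(self, *, problem: str, **kwargs) -> str:
        # consumers use the standard `call`, not a custom `solve_problem`;
        # custom pre/post-processing can happen around the super() call
        response = super().call(problem=problem, **kwargs)
        return response


print(Mathematician().call(problem="2 + 2"))
# Solve the following math problem: 2 + 2
```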