DAGWorks-Inc / burr

Build applications that make decisions (chatbots, agents, simulations, etc...). Monitor, trace, persist, and execute on your own infrastructure.
https://burr.dagworks.io
BSD 3-Clause Clear License

State typing #139

Open elijahbenizzy opened 5 months ago

elijahbenizzy commented 5 months ago

Currently state is untyped. Ideally this should be able to leverage a pydantic model, and possibly a typed dict/whatnot. We'll need the proper abstractions however.

Some requirements:

  1. IDE-friendly -- we should be able to use the typing system to get the state objects
  2. Subsetting -- we should be able to define state on a per-action basis (which can come instead of writes/reads potentially). Then we have different actions that can be compiled together. We should also be able to do this centrally.
  3. Validation -- we should be able to use this information to "validate" the graph
  4. Backwards-compatible/optional -- Burr does not support types currently. This should be backwards compatible, meaning that if no types are included we do not validate (or perhaps treat everything as Any, which is compatible with every other type in both directions).
  5. Transactional updates -- this could be really hard if we rely on pydantic models for more than just the spec...
  6. Allows optionals (maybe?)

Ideas:

pydantic

class MyState(TypedState): # extends pydanticModel
    foo: int
    bar: str

@action(reads=["foo"], writes=["bar"])
def my_action(state: MyState) -> MyState:
    return {...}, state.update(bar=str(state["foo"]))

@action(reads=["foo"], writes=["bar"])
def my_action(state: MyState) -> MyState:
    return {...}, state.update(bar=["foo"]) # fails validation

- hard to do transactional updates
+ IDE integration is easy
+ easy integration with web-services/FastAPI
~ subsetting is a bit of work, but we can bypass that by using the whole state

in the decorator/class

@action(reads={"foo": int}, writes={"bar": str})
def my_action(state: State) -> State:
    return {...}, state.update(bar=str(state["foo"]))

- no free IDE integration (without a plugin)
+ simple, loosely coupled, easy to inspect
~ duplicated between readers and writers (can't decide if this is bad?)
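
A minimal sketch of how the decorator-level declaration could be checked at runtime, assuming reads and writes are both dicts mapping a key to its type. `typed_action` and the plain-dict state are hypothetical stand-ins for illustration, not Burr's `@action` or `State`:

```python
from functools import wraps
from typing import Any, Callable, Dict


def typed_action(reads: Dict[str, type], writes: Dict[str, type]) -> Callable:
    """Hypothetical decorator: check declared key types before and after the call."""

    def decorator(fn: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Callable:
        @wraps(fn)
        def wrapper(state: Dict[str, Any]) -> Dict[str, Any]:
            # Validate every declared read before running the action.
            for key, expected in reads.items():
                if not isinstance(state[key], expected):
                    raise TypeError(f"read {key!r}: expected {expected.__name__}")
            update = fn(state)
            # Validate every written value, and reject undeclared writes.
            for key, value in update.items():
                if key not in writes:
                    raise KeyError(f"undeclared write {key!r}")
                if not isinstance(value, writes[key]):
                    raise TypeError(f"write {key!r}: expected {writes[key].__name__}")
            # Return a new dict rather than mutating the input state.
            return {**state, **update}

        # Keep the declarations inspectable, e.g. for graph-level validation.
        wrapper.reads, wrapper.writes = reads, writes
        return wrapper

    return decorator


@typed_action(reads={"foo": int}, writes={"bar": str})
def good_action(state):
    return {"bar": str(state["foo"])}


@typed_action(reads={"foo": int}, writes={"bar": str})
def bad_action(state):
    return {"bar": [state["foo"]]}  # a list, not a str, so the wrapper raises
```

Because the declarations live on the wrapper, they stay loosely coupled and easy to inspect, but nothing here helps the IDE inside the function body.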

zilto commented 3 weeks ago

Adding to the discussion, I think the "pydantic" and the "decorator/class" approaches could be dubbed "centralized" vs "decentralized" state model.

I'll be focusing on the benefits of "centralized" state model, which could be slightly different than the above "typed state" benefits. The simplest integration would be to subclass State and BaseModel to add some basic functionalities.

from pydantic import BaseModel
from burr.core import State

class BurrState(BaseModel, State):
  foo: int
  bar: str

Then, the model is passed to the ApplicationBuilder

app = (
  ApplicationBuilder()
  .with_actions(...)
  .with_transition(...)
  .with_state(model=BurrState())
  .with_entrypoint(...)
  .build()
)

Graph structure validation

We can ensure that every field listed in writes and reads is present on the BurrState. If we also support "decentralized" type annotations on the @action functions, we could ensure that both match.
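
A rough sketch of that check, using only the standard library. The plain annotated class stands in for the Pydantic model, and `missing_fields` plus the `actions` mapping are hypothetical names chosen for illustration:

```python
from typing import Dict, List, Tuple, get_type_hints


class BurrState:
    """Stand-in for the central model; only the annotations matter here."""
    foo: int
    bar: str


def missing_fields(
    model: type, actions: Dict[str, Tuple[List[str], List[str]]]
) -> Dict[str, List[str]]:
    """Return, per action, the reads/writes keys that the model does not declare."""
    declared = set(get_type_hints(model))
    problems = {}
    for name, (reads, writes) in actions.items():
        unknown = [key for key in [*reads, *writes] if key not in declared]
        if unknown:
            problems[name] = unknown
    return problems


# Each entry maps an action name to its (reads, writes) declarations.
actions = {
    "my_action": (["foo"], ["bar"]),
    "typo_action": (["foo"], ["baz"]),  # "baz" is not a field on BurrState
}
print(missing_fields(BurrState, actions))  # {'typo_action': ['baz']}
```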

Default state

Instead of passing values field-by-field to the ApplicationBuilder via .with_state(), you can set default values on the BurrState or specify which fields are Optional or required before starting the application. Using Pydantic models also lets you subclass and nest models when required to manage complexity. You can also instantiate multiple objects for different configs (dev vs. prod, overrides, debugging) or test cases.
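
To illustrate the idea without depending on Pydantic, here is a small sketch that swaps in a frozen standard-library dataclass. The `notes` and `history` fields and the default values are made up for the example:

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class BurrState:
    foo: int                     # required: no default, must be passed in
    bar: str = "unset"           # default value
    notes: Optional[str] = None  # explicitly optional
    history: List[str] = field(default_factory=list)


# One object per configuration, instead of field-by-field setup.
dev_state = BurrState(foo=0)
test_state = replace(dev_state, foo=42, bar="fixture")
```

Constructing `BurrState()` without `foo` raises a `TypeError`, so a missing required field is caught before the application starts.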

Data validation

Pydantic has many validation features, including the validate_assignment setting, which could trigger validation of specific fields on state.update() or state.append(). Broadly speaking, it seems a reasonable development approach to manage state centrally and define "what's legal". As your application grows in complexity and the state machine goes brrrr, it's best to have "one source of truth" for validation.
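
A sketch of per-field validation on update, written against the standard library rather than Pydantic so the mechanics are visible. The `update` method here is a hypothetical illustration, not Burr's `State.update`, and the `isinstance` check only handles plain classes such as `int` and `str`:

```python
from dataclasses import dataclass, replace
from typing import Any, get_type_hints


@dataclass(frozen=True)
class BurrState:
    foo: int = 0
    bar: str = ""

    def update(self, **changes: Any) -> "BurrState":
        """Return a new state, validating only the fields being changed."""
        hints = get_type_hints(type(self))
        for key, value in changes.items():
            if key not in hints:
                raise KeyError(f"unknown field {key!r}")
            if not isinstance(value, hints[key]):
                raise TypeError(f"{key!r} expects {hints[key].__name__}")
        # The original object is untouched, so a failed update changes nothing.
        return replace(self, **changes)


state = BurrState(foo=1)
new_state = state.update(bar=str(state.foo))
```

Since the central class is the only place types are declared, every action that calls `update` is checked against the same definition.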

Integrations

Many LLM tools leverage Pydantic. For instance, a BurrState would allow defining schemas for LanceDB and automatically enabling embedding fields for "agent" memory. Another interesting avenue is FastUI, which would allow automatically building UI components for inputs and for displaying State fields.

elijahbenizzy commented 3 weeks ago

Good overview. Some other considerations:

  1. What does the State API look like? How much can we override pydantic? Burr state is normally immutable, but a pydantic model is not necessarily.
  2. How does IDE integration work? There are (at least) two benefits to typed state -- one is generating/validating downstream consumers, and the other is making the fn implementation easier to reason about. The problem with central state is that we want to be able to type-check within the function. Pydantic specifically doesn't have a subset feature, so auto-completion in the IDE would likely require a custom plugin (might be feasible). Decentralized solves this. You might have a better idea of the API -- mind writing out a function or two using the "centralized" state?
  3. How about extending it? What if you want to add functions/build a version of the application that has a different state item? You'd have to make two changes, and have a conditional.

elijahbenizzy commented 2 weeks ago

OK, API decision, this is up next on implementation. Will support a few different ways to do it -- key is that it all compiles.

We support centralized and decentralized. Inputs are typed as normal.

Defining state types

As long as we have a spec of types, it's pretty easy:

# stdlib
OverallState = TypedState[{"a": int, "b": int, "c": int, "d": int}]
OverallState = TypedState[ABCDDataclass]

# with pydantic plugin
OverallState = TypedState[ABCDPydanticModel]
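
One way the subscript syntax could normalize different specs into a single key-to-type dict is sketched below. The `type_spec` attribute and this `TypedState` implementation are assumptions for illustration; only the dict and dataclass cases are covered:

```python
import dataclasses
from typing import Any, Dict, get_type_hints


class TypedState:
    """Hypothetical: TypedState[spec] returns a subclass carrying a key -> type dict."""

    type_spec: Dict[str, Any] = {}

    def __class_getitem__(cls, spec):
        if isinstance(spec, dict):
            types = dict(spec)
        elif dataclasses.is_dataclass(spec):
            # Field annotations of the dataclass become the spec.
            types = get_type_hints(spec)
        else:
            raise TypeError(f"unsupported state spec: {spec!r}")
        return type("TypedStateSpec", (cls,), {"type_spec": types})


@dataclasses.dataclass
class ABCDDataclass:
    a: int
    b: int
    c: int
    d: int


FromDict = TypedState[{"a": int, "b": int, "c": int, "d": int}]
FromDataclass = TypedState[ABCDDataclass]
```

Both forms end up with the same `type_spec`, which is what lets the rest of the framework "compile" against one representation.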

Defining actions

Then we can use:

@action(reads=["a", "b"], writes=["c", "d"])
def foo(state: OverallState) -> OverallState:
    pass

Note you can also define this anonymously. Probably going to require the reads/writes still, but if you think about it it's technically optional...

@action(reads=["a", "b"], writes=["c", "d"])
def foo(state: State[{"a": int, "b": int}]) -> State[{"c": int, "d": int}]:
    pass
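
The remark that reads/writes are technically optional can be sketched as follows: if the annotations carry the key types, the key lists can be derived from them. This `State` class, its `key_types` attribute, and `infer_reads_writes` are hypothetical and exist only for the example:

```python
from typing import Any, Callable, Dict, List, Tuple


class State:
    """Hypothetical stand-in: State[{...}] yields a subclass that records the key types."""

    key_types: Dict[str, Any] = {}

    def __class_getitem__(cls, spec: Dict[str, Any]):
        return type("SubsetState", (cls,), {"key_types": dict(spec)})


def infer_reads_writes(fn: Callable) -> Tuple[List[str], List[str]]:
    """Derive reads from the `state` annotation and writes from the return annotation."""
    annotations = fn.__annotations__
    reads = list(annotations["state"].key_types)
    writes = list(annotations["return"].key_types)
    return reads, writes


def foo(state: State[{"a": int, "b": int}]) -> State[{"c": int, "d": int}]:
    pass


print(infer_reads_writes(foo))  # (['a', 'b'], ['c', 'd'])
```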

integrating into app -- optional:

graph = GraphBuilder()....with_typing(TypedState) # or on the application builder

Note this will work with or without the above -- more likely one would do the other

burr.typing.get_type_dict(graph)
burr.typing.get_action_input_dict(graph, action)
burr.typing.get_action_state_input_dict(graph, action)
burr.typing.get_action_state_output_dict(graph, action)
b_pydantic.get_type_model(graph, exclude=..., include=...)
b_pydantic.get_action_input_model(graph, action)
b_pydantic.get_action_state_input_model(graph, action)
b_pydantic.get_action_state_output_model(graph, action)
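
As a guess at what the dict-returning helpers might do internally, the sketch below subsets an overall type dict by an action's declared keys. `subset_type_dict` is a hypothetical helper, not one of the `burr.typing` functions listed above:

```python
from typing import Any, Dict, Iterable


def subset_type_dict(type_dict: Dict[str, Any], keys: Iterable[str]) -> Dict[str, Any]:
    """Pick the declared keys out of the overall state types, failing on unknown keys."""
    keys = list(keys)
    missing = [key for key in keys if key not in type_dict]
    if missing:
        raise KeyError(f"keys not declared in state types: {missing}")
    return {key: type_dict[key] for key in keys}


overall = {"a": int, "b": int, "c": int, "d": int}
state_inputs = subset_type_dict(overall, ["a", "b"])   # from the action's reads
state_outputs = subset_type_dict(overall, ["c", "d"])  # from the action's writes
```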

elijahbenizzy commented 2 weeks ago

MyState = PydanticState[MyModel]

class MyState:
   substate: 

@action(reads=["a", "b"], writes=["a", "b"])
def my_action(state: MyState) -> MyState:
   state.c = fn(state.a, state.b)
   state.d =...
   return state

@action(reads=["a", "b"], writes=["c", "d"])
def my_action(state: MyState) -> MyState:
   state.model.c = fn(state.a, state.b)
   state.model.d =...
   return state

class PydanticState:
    a: Optional[AModel]
    b: Optional[BModel]

@action(reads=["a", "b"], writes=["c", "d"])
# the state model is the dynamically subsetted one
def my_action(state: PydanticState) -> PydanticState:
   state.model.c = fn(state.a, state.b)
   state.model.d = ... 
   state.model.e # throw an error (not allowed to read from e, doesn't exist/not declared)
   state.model.a = "foo" # throw an error, because you didn't declare it
   state.model.d =...
   return state
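
The "dynamically subsetted" behavior described in the comments above could be approximated with a small proxy object. `SubsetView` is a hypothetical name, and this sketch enforces the rules at runtime only, so it does nothing for IDE auto-completion:

```python
from typing import Any, Dict, Iterable


class SubsetView:
    """Hypothetical proxy: only declared keys can be read, only declared writes can be set."""

    def __init__(self, data: Dict[str, Any], reads: Iterable[str], writes: Iterable[str]):
        # Use object.__setattr__ to bypass our own __setattr__ guard.
        object.__setattr__(self, "_data", dict(data))
        object.__setattr__(self, "_reads", frozenset(reads))
        object.__setattr__(self, "_writes", frozenset(writes))

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for state keys.
        if name not in (self._reads | self._writes) or name not in self._data:
            raise AttributeError(f"{name!r} is not declared or not set")
        return self._data[name]

    def __setattr__(self, name: str, value: Any) -> None:
        if name not in self._writes:
            raise AttributeError(f"{name!r} is not declared in writes")
        self._data[name] = value


view = SubsetView({"a": 1, "b": 2, "e": 5}, reads=["a", "b"], writes=["c", "d"])
view.c = view.a + view.b  # allowed: reads a and b, writes c
```

Reading `view.e` or assigning `view.a = "foo"` raises `AttributeError`, matching the two error cases in the snippet above.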

@action(reads=["*"], writes=[...])
# subset everything?
def my_action(state: PydanticState) -> PydanticState:
   # do the kitchen sink -- whatever you want with the whole state
   return state ...

## Idea -- give everything state
@action(reads="*", writes=["a", "b"])
def my_action(state: MyState) -> MyState:
   state.c = fn(state.a, state.b)
   state.d =...
   return state

Application[MyState]

builder.with_state(MyState).build()
application.state # IDE should know MyState
state, ... = application.run(...) # IDE should know MyState
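
For the IDE to know the state type, the application would presumably be generic over it. A minimal sketch with `typing.Generic` follows; these `Application`, `ApplicationBuilder`, and `TypedBuilder` classes are simplified hypothetical stand-ins, and `with_state` takes an instance here rather than the class:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

StateT = TypeVar("StateT")


class Application(Generic[StateT]):
    def __init__(self, state: StateT) -> None:
        self._state = state

    @property
    def state(self) -> StateT:
        # A type checker resolves this to the concrete state type.
        return self._state


class TypedBuilder(Generic[StateT]):
    def __init__(self, state: StateT) -> None:
        self._state = state

    def build(self) -> Application[StateT]:
        return Application(self._state)


class ApplicationBuilder:
    def with_state(self, state: StateT) -> TypedBuilder[StateT]:
        # The state type is captured here and carried through to build().
        return TypedBuilder(state)


@dataclass(frozen=True)
class MyState:
    a: int = 0
    b: int = 0


application = ApplicationBuilder().with_state(MyState(a=1, b=2)).build()
total = application.state.a + application.state.b  # attributes are known to the IDE
```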

elijahbenizzy commented 5 days ago

See #350