breadboard-ai / breadboard

A library for prototyping generative AI applications.
Apache License 2.0
84 stars 16 forks source link

[FR] Prevent wiring of incompatible types #1729

Open paullewis opened 3 weeks ago

paullewis commented 3 weeks ago

Per an earlier in-person discussion, we should limit your ability to connect – say – number ports to string ports. FWIW I think we should incorporate behaviors and formats here, too.

So, for example, object to object wouldn't work if it were object -> LLM Content -> Any to object -> [no behavior]. But things could be interesting if you did object -> LLM Content -> image-webcam to object -> LLM Content -> Any, because the latter is a superset of the former.

Same probably goes for array, where the item type, behavior(s) and formats should probably match.

If we do this at the API levels I can do a follow-up on the UI side to prevent folks from accidentally trying to wire things that are incompatible.

dglazkov commented 3 weeks ago

I am going to rely on @aomarks' work in @breadboard-ai/build to guide me.

dglazkov commented 3 weeks ago

Should we do automatic type coercion, like allow "number" -> "string"?

dglazkov commented 3 weeks ago

I am thinking something like adding an InspectablePortType with a method that takes another InspectablePortType as argument and returns true if it can connect to it or false if it can not. This method will be directional: port a -> port b may be ok, but port b -> port not.

paullewis commented 3 weeks ago

Should we do automatic type coercion, like allow "number" -> "string"?

My feeling is no, at least not initially. I could see it leading to a class of bugs that are hard to debug.

aomarks commented 3 weeks ago

Can we use JSON schema validation for all runtime type checking?

And agree, no automatic coercion. I think nodes that want to accept many types and do their own conversion can (e.g. template). We could also make a coerce node, or a library of them, to let users coerce.

dglazkov commented 3 weeks ago

JSON schema validation

Yes, as much as possible.

I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".

There's a spectrum here, since the LLM Content can be broader (accept all types) and narrower (accept only some).

Wires can go from narrower to broader, but not the other way around.

Would love your thoughts on that.

aomarks commented 3 weeks ago

I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".

Yep. I think JSON schema handles this out of the box, and it's compatible with how TypeScript does validation. A narrow value will validate with a compatible broader schema (e.g. 123 will validate against "string" | "number").

Our "unknown" is expanded out to "null" | "boolean" | "object" | "array" | "number" | "integer" | "string"; so any JSON serializable value.

So, I think there shouldn't be anything extra we need to handle. Standard JSON schema validation + good schemas should do all the work. I suspect if there's a case where some value should be able to flow to some port, but JSON schema says it can't, then that probably just means that the receiving port's schema is too strict.

dglazkov commented 3 weeks ago

The only quirk we have is that we deviated from JSON schema with behavior tags.

aomarks commented 3 weeks ago

The only quirk we have is that we deviated from JSON schema with behavior tags.

Mm, right. Do any behaviors affect validation?

If not, the only problem might be the JSON validator being upset about an unexpected behavior keyword, in which case we can just pre-process the schemas to remove them before passing to validation.

If they do affect validation in some cases, we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?

dglazkov commented 3 weeks ago

we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?

That's probably the path forward, yup.

Oh also! This bug is about design-time validation. We do need a bug for runtime validation, too.

dglazkov commented 2 weeks ago

There's now anInspectablePort.type property that returns an InspectablePortType that allows you to check if you can connect a node to another type with canConnect method.

https://breadboard-ai.github.io/breadboard/docs/inspector/graph/#ports

See if this works, and please let me know. I want to integrate it into the editor API as well, so that canAddEdge or canChangeEdge error out.

paullewis commented 2 weeks ago

I focused on ad-hoc wires today, but I'm hoping to look at this tomorrow

paullewis commented 2 weeks ago

Okay, so I've had a look at the data coming through to the graph and all the InspectablePorts I get through have undefined .type properties. Is there something I need to call to get the data populated?

paullewis commented 2 weeks ago

A quick copy-paste of what DevTools emits for one of our ports:

{
    "name": "schema",
    "title": "schema",
    "configured": true,
    "value": {
        "type": "object",
        "properties": {
            "context": {
                "type": "string",
                "title": "out"
            }
        }
    },
    "star": false,
    "edges": [],
    "status": "connected",
    "schema": {
        "type": "object",
        "behavior": [
            "json-schema",
            "ports-spec",
            "config"
        ]
    }
}
dglazkov commented 2 weeks ago

Oh. That's because I haven't connected it 🤦🏻

dglazkov commented 2 weeks ago

1838 should make type shine.

paullewis commented 2 weeks ago

I was going to land the first pass of this, but then I realized that it breaks for the specialists. That seems to be because the context in is an array of LLM Content...

{
    "name": "in",
    "title": "Context in",
    "configured": false,
    "star": false,
    "type": {
        "schema": {
            "title": "Context in",
            "description": "The source material for the worker",
            "type": "array",
            "items": {
                "type": "object",
                "behavior": [
                    "llm-content"
                ]
            },
            "examples": [
            ]
        }
    }
}

But the context out is presenting as a string:

{
    "name": "out",
    "title": "Context out",
    "configured": false,
    "star": false,
    "type": {
        "schema": {
            "title": "Context out",
            "type": "string"
        }
    }
}

I didn't want to land #1844 with that as the case because I figured, while correct, it would be frustrating.