Open paullewis opened 3 weeks ago
I am going to rely on @aomarks' work in @breadboard-ai/build
to guide me.
Should we do automatic type coercion, like allow "number" -> "string"?
I am thinking something like adding an InspectablePortType
with a method that takes another InspectablePortType
as argument and returns true if it can connect to it or false if it can not. This method will be directional: port a -> port b may be ok, but port b -> port not.
Should we do automatic type coercion, like allow "number" -> "string"?
My feeling is no, at least not initially. I could see it leading to a class of bugs that are hard to debug.
Can we use JSON schema validation for all runtime type checking?
And agree, no automatic coercion. I think nodes that want to accept many types and do their own conversion can (e.g. template). We could also make a coerce node, or a library of them, to let users coerce.
JSON schema validation
Yes, as much as possible.
I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".
There's a spectrum here, since the LLM Content can be broader (accept all types) and narrower (accept only some).
Wires can go from narrower to broader, but not the other way around.
Would love your thoughts on that.
I was thinking about this concept of "type breadth". Some types are very broad, like "unknown". Some types are very narrow, like "number".
Yep. I think JSON schema handles this out of the box, and it's compatible with how TypeScript does validation. A narrow value will validate with a compatible broader schema (e.g. 123
will validate against "string" | "number"
).
Our "unknown"
is expanded out to "null" | "boolean" | "object" | "array" | "number" | "integer" | "string"
; so any JSON serializable value.
So, I think there shouldn't be anything extra we need to handle. Standard JSON schema validation + good schemas should do all the work. I suspect if there's a case where some value should be able to flow to some port, but JSON schema says it can't, then that probably just means that the receiving port's schema is too strict.
The only quirk we have is that we deviated from JSON schema with behavior
tags.
The only quirk we have is that we deviated from JSON schema with
behavior
tags.
Mm, right. Do any behaviors affect validation?
If not, the only problem might be the JSON validator being upset about an unexpected behavior
keyword, in which case we can just pre-process the schemas to remove them before passing to validation.
If they do affect validation in some cases, we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?
we should think about how to encode the constraint with JSON schema, or if it's too special/weird, we could have a secondary validation step that's just for our extra concepts?
That's probably the path forward, yup.
Oh also! This bug is about design-time validation. We do need a bug for runtime validation, too.
There's now anInspectablePort.type
property that returns an InspectablePortType
that allows you to check if you can connect a node to another type with canConnect
method.
https://breadboard-ai.github.io/breadboard/docs/inspector/graph/#ports
See if this works, and please let me know. I want to integrate it into the editor API as well, so that canAddEdge
or canChangeEdge
error out.
I focused on ad-hoc wires today, but I'm hoping to look at this tomorrow
Okay, so I've had a look at the data coming through to the graph and all the InspectablePort
s I get through have undefined
.type
properties. Is there something I need to call to get the data populated?
A quick copy-paste of what DevTools emits for one of our ports:
{
"name": "schema",
"title": "schema",
"configured": true,
"value": {
"type": "object",
"properties": {
"context": {
"type": "string",
"title": "out"
}
}
},
"star": false,
"edges": [],
"status": "connected",
"schema": {
"type": "object",
"behavior": [
"json-schema",
"ports-spec",
"config"
]
}
}
Oh. That's because I haven't connected it 🤦🏻
type
shine.I was going to land the first pass of this, but then I realized that it breaks for the specialists. That seems to be because the context in is an array of LLM Content...
{
"name": "in",
"title": "Context in",
"configured": false,
"star": false,
"type": {
"schema": {
"title": "Context in",
"description": "The source material for the worker",
"type": "array",
"items": {
"type": "object",
"behavior": [
"llm-content"
]
},
"examples": [
]
}
}
}
But the context out is presenting as a string:
{
"name": "out",
"title": "Context out",
"configured": false,
"star": false,
"type": {
"schema": {
"title": "Context out",
"type": "string"
}
}
}
I didn't want to land #1844 with that as the case because I figured, while correct, it would be frustrating.
Per an earlier in-person discussion, we should limit your ability to connect – say –
number
ports tostring
ports. FWIW I think we should incorporatebehavior
s andformat
s here, too.So, for example,
object
toobject
wouldn't work if it wereobject -> LLM Content -> Any
toobject -> [no behavior]
. But things could be interesting if you didobject -> LLM Content -> image-webcam
toobject -> LLM Content -> Any
, because the latter is a superset of the former.Same probably goes for
array
, where the itemtype
,behavior
(s) andformat
s should probably match.If we do this at the API levels I can do a follow-up on the UI side to prevent folks from accidentally trying to wire things that are incompatible.