sahewat opened this issue 1 year ago
Thank you for opening an issue! It looks like something we should be able to support. Do you mind pasting the JSON schema for these 3 models here?
In all cases, the schema is available through TaskWrapperxxx.model_json_schema(). I'll post an example of each here as well.
TaskWrapperOptional
{
  "$defs": {
    "Task": {
      "properties": {
        "name": {
          "title": "Name",
          "type": "string"
        },
        "subtasks": {
          "default": [],
          "items": {
            "$ref": "#/$defs/Task"
          },
          "title": "Subtasks",
          "type": "array"
        }
      },
      "required": [
        "name"
      ],
      "title": "Task",
      "type": "object"
    },
    "TaskOptional": {
      "properties": {
        "subtask": {
          "anyOf": [
            {
              "$ref": "#/$defs/Task"
            },
            {
              "type": "null"
            }
          ]
        }
      },
      "required": [
        "subtask"
      ],
      "title": "TaskOptional",
      "type": "object"
    }
  },
  "properties": {
    "task": {
      "$ref": "#/$defs/TaskOptional"
    }
  },
  "required": [
    "task"
  ],
  "title": "TaskWrapperOptional",
  "type": "object"
}
TaskWrapperUnion
{
  "$defs": {
    "TaskUnion": {
      "properties": {
        "subtask": {
          "anyOf": [
            {
              "$ref": "#/$defs/TaskUnion"
            },
            {
              "type": "null"
            }
          ]
        }
      },
      "required": [
        "subtask"
      ],
      "title": "TaskUnion",
      "type": "object"
    }
  },
  "properties": {
    "task": {
      "$ref": "#/$defs/TaskUnion"
    }
  },
  "required": [
    "task"
  ],
  "title": "TaskWrapperUnion",
  "type": "object"
}
TaskWrapperList
{
  "$defs": {
    "TaskList": {
      "properties": {
        "subtask": {
          "items": {
            "$ref": "#/$defs/TaskList"
          },
          "title": "Subtask",
          "type": "array"
        }
      },
      "required": [
        "subtask"
      ],
      "title": "TaskList",
      "type": "object"
    }
  },
  "properties": {
    "task": {
      "$ref": "#/$defs/TaskList"
    }
  },
  "required": [
    "task"
  ],
  "title": "TaskWrapperList",
  "type": "object"
}
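Anyone who is interested in this feature should know that CFG-structured generation is required to truly support it: a recursive schema permits arbitrarily deep nesting, which no single regular expression can describe.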
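Whether a schema needs a grammar at all comes down to whether its $ref graph contains a cycle. A minimal sketch, using a hypothetical helper recursive_defs (not part of outlines) that only follows local "#/$defs/..." references, which is an assumption that holds for the Pydantic v2 schemas above:

```python
# Hypothetical helper, not an outlines API.
# Assumption: every $ref is local, of the form "#/$defs/<Name>".

def ref_targets(node):
    """Yield every local $defs name referenced anywhere inside a schema node."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/$defs/"):
            yield ref[len("#/$defs/"):]
        for value in node.values():
            yield from ref_targets(value)
    elif isinstance(node, list):
        for item in node:
            yield from ref_targets(item)


def recursive_defs(schema):
    """Return the sorted names of $defs entries that can reach themselves via $ref."""
    defs = schema.get("$defs", {})
    graph = {name: set(ref_targets(body)) for name, body in defs.items()}

    def reaches(start, goal):
        # Iterative depth-first search over the $ref graph.
        seen, stack = set(), list(graph.get(start, ()))
        while stack:
            name = stack.pop()
            if name == goal:
                return True
            if name not in seen:
                seen.add(name)
                stack.extend(graph.get(name, ()))
        return False

    return sorted(name for name in graph if reaches(name, name))


# Condensed copy of the TaskWrapperOptional schema posted above.
task_wrapper_optional = {
    "$defs": {
        "Task": {
            "properties": {
                "name": {"title": "Name", "type": "string"},
                "subtasks": {"default": [], "items": {"$ref": "#/$defs/Task"},
                             "title": "Subtasks", "type": "array"},
            },
            "required": ["name"], "title": "Task", "type": "object",
        },
        "TaskOptional": {
            "properties": {
                "subtask": {"anyOf": [{"$ref": "#/$defs/Task"}, {"type": "null"}]},
            },
            "required": ["subtask"], "title": "TaskOptional", "type": "object",
        },
    },
    "properties": {"task": {"$ref": "#/$defs/TaskOptional"}},
    "required": ["task"], "title": "TaskWrapperOptional", "type": "object",
}

print(recursive_defs(task_wrapper_optional))  # ['Task']
```

In TaskWrapperOptional the cycle sits in Task (through subtasks), not in TaskOptional itself, so only Task would need grammar-level treatment.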
I saw there is a pull request that implements a beta version of CFG-guided generation, which is amazing. However, the PR failed its checks due to a regression in a performance benchmark, I believe in a measurement that didn't really exist before. Is that PR everything that's necessary to make this functionality available, and is there anything I can do to test or help get this recursive-field functionality over the line? I really need recursive models and am considering switching to a different structured generation library (https://github.com/noamgat/lm-format-enforcer); however, that one seems much less mature, and I do intend to use this for production use cases.
Anyway, if there is anything I can do to help, please let me know.
@hugocool To have stable CFG-based JSON generation we need
There may be other paths forward, but this is the approach immediately obvious to me.
This isn't an area of focus of mine at the moment, but if you're interested in tackling either issue, please let me know what questions you have!
Okay, I am willing to pick this up, so I have some questions. What is the current state of the Lark grammar generation? Which elements of the PR (https://github.com/lapp0/outlines/pull/85), or of any other attempt for that matter, can I build upon? Are you aware of lm-format-enforcer and their approach to solving this problem? Which of their learnings should we incorporate?
I think I would start by generating a Lark grammar for my specific use case, which is a specific recursive JSON model. If that works, we can see how to generalize it to any arbitrary JSON schema. I am assuming I should build off of these examples:
Are there any more resources I should be aware of?
Lastly, I assume the second issue you mentioned would come into play once we want to generalize the solution to JSON more broadly, right?
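The shape a grammar for one specific recursive model must take can be prototyped before touching Lark. Below is a stdlib-only sketch of a hand-written recursive-descent recognizer for TaskWrapperUnion instances; it is not Lark and not outlines code, and it assumes the keys appear exactly as in the schema, with whitespace allowed between tokens. Each method mirrors one grammar rule, and because task() calls itself, any nesting depth is accepted:

```python
import re

# Grammar being recognized (EBNF-style sketch, whitespace allowed between tokens):
#   wrapper : "{" "\"task\"" ":" task "}"
#   task    : "{" "\"subtask\"" ":" (task | "null") "}"

_WS = re.compile(r"\s*")


class Recognizer:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def skip_ws(self):
        self.pos = _WS.match(self.text, self.pos).end()

    def expect(self, token):
        """Consume `token` (after optional whitespace) or raise ValueError."""
        self.skip_ws()
        if not self.text.startswith(token, self.pos):
            raise ValueError(f"expected {token!r} at offset {self.pos}")
        self.pos += len(token)

    def peek(self, token):
        """Return True if `token` comes next, without consuming the token."""
        self.skip_ws()
        return self.text.startswith(token, self.pos)

    def wrapper(self):
        self.expect("{")
        self.expect('"task"')
        self.expect(":")
        self.task()
        self.expect("}")

    def task(self):
        self.expect("{")
        self.expect('"subtask"')
        self.expect(":")
        if self.peek("null"):
            self.expect("null")
        else:
            self.task()  # the recursive case: a nested TaskUnion object
        self.expect("}")


def accepts(text):
    """True if `text` is a complete TaskWrapperUnion instance under this sketch."""
    r = Recognizer(text)
    try:
        r.wrapper()
        r.skip_ws()
        return r.pos == len(r.text)
    except ValueError:
        return False


print(accepts('{"task": {"subtask": {"subtask": null}}}'))  # True
```

The two methods translate directly into two Lark rules; the harder part is emitting such rules automatically for arbitrary JSON schemas.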
Recursive Pydantic definitions seem unsupported for lists, unions, and optionals. As far as I understand, these are the basic use cases.
A reproducible example is provided below:
I'd be interested in adding this functionality, but I'm unsure what an "unrolled" recursive definition would look like in terms of the generated regex.
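One possible reading of "unrolling", assuming a fixed maximum nesting depth chosen by the user (max_depth is a hypothetical parameter, not an existing outlines option): expand the recursive $ref a fixed number of times and replace the innermost reference with a base case. For TaskList, the natural base case is an empty subtask array. This is a sketch of the idea only, with a deliberately loose whitespace rule, and not how outlines builds its regexes:

```python
import re

WS = r"\s*"  # simplified whitespace rule (assumption); real generators are usually stricter


def task_list_pattern(max_depth):
    """Regex for a TaskList object nested at most `max_depth` levels deep.

    Depth 0 allows only an empty "subtask" array, which is where recursion stops.
    """
    inner = None  # pattern for a TaskList one level shallower
    for _ in range(max_depth + 1):
        if inner is None:
            items = ""  # base case: no nested tasks allowed
        else:
            # Zero or more comma-separated copies of the shallower pattern.
            items = f"(?:{inner}(?:{WS},{WS}{inner})*)?"
        inner = (
            r"\{" + WS + r'"subtask"' + WS + ":" + WS
            + r"\[" + WS + items + WS + r"\]" + WS + r"\}"
        )
    return inner


def task_wrapper_list_pattern(max_depth):
    """Regex for a whole TaskWrapperList instance with the given depth limit."""
    return r"\{" + WS + r'"task"' + WS + ":" + WS + task_list_pattern(max_depth) + WS + r"\}"


pattern = re.compile(task_wrapper_list_pattern(1))
print(bool(pattern.fullmatch('{"task": {"subtask": [{"subtask": []}]}}')))  # True
```

Each level embeds the previous level's pattern twice (once for the first item and once for the repeated ", item" part), so the pattern length more than doubles per level. Unrolling is therefore only a stopgap for shallow limits. For TaskUnion the base case would be null instead of an empty array.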