dottxt-ai / outlines

Structured Text Generation
https://dottxt-ai.github.io/outlines/
Apache License 2.0

Add multi-label conditional choice generation example #233

Open davidberenstein1957 opened 1 year ago

davidberenstein1957 commented 1 year ago

I was working on a tutorial for adding computational feedback to our data labelling platform and noticed that, in some situations, it might be useful to support multi-label conditional choice generation.

I would love to tackle this in a PR if you feel this would be a nice addition.

davidberenstein1957 commented 1 year ago

Also, when my tutorial is done, it might be a nice applied example for your website on how to use it to work towards training a model?

davidberenstein1957 commented 1 year ago

I skimmed the paper and realized that the regex-like generation will not work, but I played around with the JSON generation, which proved useful for this use case. Do you think it makes sense to add an example to your README via a PR?

Multiple choices (multi-label)

from pydantic import BaseModel

import outlines.models as models
import outlines.text.generate as generate

model = models.transformers("gpt2")

class Topic(BaseModel):
    new_card: bool = False
    mortgage: bool = False
    application: bool = False
    payments: bool = False

sequence = generate.json(model, Topic)("I want to a new card bank card at my bank")
# {
#   "new_card": true,
#   "mortgage": false,
#   "application": true,
#   "payments": false
# }
rlouf commented 1 year ago

Of course, any contribution that improves the documentation is greatly appreciated!

rlouf commented 1 year ago

@davidberenstein1957 do you need help on this?

davidberenstein1957 commented 1 year ago

Hi Remi, programmatically, no. But in terms of usability, yes. I'll share a brief example later, but the approach does not seem to work properly in empirical evaluation. Reading the paper, it might not be the correct approach. What do you think?

rlouf commented 12 months ago

Please share and I'll take a look!

chris-aeviator commented 6 months ago

I skimmed the paper and realized that the regex-like generation will not work, but I played around with the JSON generation, which proved useful for this use case. Do you think it makes sense to add an example to your README via a PR?

Multiple choices (multi-label)

from pydantic import BaseModel

import outlines.models as models
import outlines.text.generate as generate

model = models.transformers("gpt2")

class Topic(BaseModel):
    new_card: bool = False
    mortgage: bool = False
    application: bool = False
    payments: bool = False

sequence = generate.json(model, Topic)("I want to a new card bank card at my bank")
# {
#   "new_card": true,
#   "mortgage": false,
#   "application": true,
#   "payments": false
# }

Trying this reveals that the model is not forced to pick anything: it can reply with all choices set to false. Can we somehow require it to answer with at least one option?
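One possible workaround, sketched here under assumptions rather than as a confirmed feature of the library: instead of four independent booleans, enumerate every non-empty combination of labels and constrain the model to pick exactly one of those strings, for example through a multiple-choice style of generation if it accepts a plain list of strings. The helper names `non_empty_label_sets` and `to_flags` are hypothetical and not part of outlines; the label names are taken from the `Topic` model above.

```python
from itertools import combinations

# Label names taken from the Topic model in the example above.
LABELS = ["new_card", "mortgage", "application", "payments"]


def non_empty_label_sets(labels):
    """Return every non-empty combination of labels as a comma-joined string.

    Hypothetical helper: the empty combination is never produced, so any
    answer picked from this list selects at least one label.
    """
    choices = []
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            choices.append(",".join(combo))
    return choices


def to_flags(choice, labels):
    """Convert a chosen string back into the boolean mapping used by Topic."""
    selected = set(choice.split(","))
    return {label: label in selected for label in labels}


choices = non_empty_label_sets(LABELS)
print(len(choices))  # 2**4 - 1 = 15 candidate answers
print(to_flags("new_card,application", LABELS))
```

The list grows as 2^n - 1, so this only stays practical for a small number of labels. With many labels, validating the generated JSON afterwards and retrying on an all-false result may be the simpler option.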